Google DeepMind Decoupled DiLoCo: 235× lower network bandwidth for AI training across geographically distributed datacenters
Why it matters
Google DeepMind has introduced Decoupled DiLoCo, a distributed architecture for training AI models. It reduces the required network bandwidth from 198 Gbps to 0.84 Gbps across 8 datacenters and achieves 88% goodput compared to 27% with conventional methods.
Google DeepMind published Decoupled DiLoCo on April 23, 2026 — a new iteration of its distributed architecture for training AI models. The headline result: required network bandwidth between datacenters drops from 198 Gbps to 0.84 Gbps for an 8-datacenter configuration, while simultaneously goodput improves from 27% to 88% in a high-failure-rate scenario.
What is DiLoCo and why was it needed?
DiLoCo (Distributed Low-Communication) is a method DeepMind introduced in 2023 and refined throughout 2024. It addresses a fundamental problem in modern AI training — the disparity in network bandwidth within and between datacenters.
Within a single datacenter, GPUs are connected by ultra-fast links (NVLink, InfiniBand) delivering hundreds of Gbps per node. But when training is distributed across multiple geographically separated datacenters, the bandwidth between them is 10 to 100 times lower and the latency significantly higher.
Classical data-parallel algorithms synchronize gradients after every step, demanding bandwidth that exists inside a datacenter but not between datacenters. DiLoCo addresses this with many local optimization steps executed without any synchronization, exchanging accumulated updates only occasionally.
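The pattern can be sketched in a few lines of Python with NumPy. This is a minimal illustration of the local-step/periodic-exchange idea from the original DiLoCo paper, not DeepMind's implementation: the inner optimizer here is plain SGD, the worker gradient functions are placeholders, and the paper's outer Nesterov momentum is omitted for brevity.

```python
import numpy as np

def diloco_round(worker_grads, shared_params, inner_steps,
                 lr_inner=0.1, lr_outer=0.7):
    """One DiLoCo outer round (simplified sketch).

    Each worker starts from the shared parameter snapshot, runs
    `inner_steps` local steps with zero communication, and reports its
    parameter delta (a "pseudo-gradient"). A single averaging exchange
    per round replaces the per-step gradient all-reduce of classical
    data parallelism.
    """
    deltas = []
    for grad_fn in worker_grads:                  # one entry per datacenter
        params = shared_params.copy()
        for _ in range(inner_steps):              # local steps, no network traffic
            params -= lr_inner * grad_fn(params)
        deltas.append(shared_params - params)     # pseudo-gradient
    outer_grad = np.mean(deltas, axis=0)          # the only communication step
    return shared_params - lr_outer * outer_grad  # outer update (momentum omitted)
```

With hundreds of inner steps per round, the per-step exchanges of conventional data parallelism collapse into one exchange per round, which is where the bandwidth savings come from.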
What is the “decoupled” innovation?
The iteration published on April 23 introduces the concept of asynchronous islands of computation. Instead of all datacenters performing the same step at the same moment, individual “islands” advance independently and communicate only at key checkpoints.
This decoupling of the computational and communication flows dramatically reduces pressure on inter-datacenter networks. According to DeepMind’s published figures, required bandwidth drops from 198 Gbps to 0.84 Gbps — a reduction of approximately 235 times.
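The goodput gap has a simple intuition: under lockstep synchronization every island waits for the slowest one at every step, while decoupled islands keep computing and pay only a small merge cost at checkpoints. A toy model with illustrative numbers (not DeepMind's simulator, and the merge overhead is an arbitrary assumption):

```python
def lockstep_goodput(step_times):
    """Synchronous training: at every step, all islands wait for the
    slowest one, so only mean(t) / max(t) of wall-clock time is useful."""
    return sum(step_times) / (len(step_times) * max(step_times))

def decoupled_goodput(step_times, merge_overhead=0.05):
    """Decoupled islands never block on one another; they lose only a
    small fraction of time merging at checkpoints (value is illustrative)."""
    return 1.0 - merge_overhead

# One straggling island (e.g. recovering from a hardware failure) runs 4× slower:
times = [1.0, 1.0, 1.0, 4.0]
print(lockstep_goodput(times))   # 0.4375: most wall-clock time is spent waiting
print(decoupled_goodput(times))  # 0.95: islands keep computing through the straggler
```

The more islands and the higher the failure rate, the worse the lockstep fraction gets, which is consistent with the direction of the 27% vs 88% result.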
What are the key numbers?
DeepMind published three key metrics:
- Bandwidth: 198 Gbps → 0.84 Gbps across 8 datacenters
- Goodput (effective useful work throughput): 88% with Decoupled DiLoCo vs 27% with conventional methods, measured in a simulation of 1.2 million chips under high failure rates
- Accuracy: 64.1% with the new method vs 64.4% baseline — degradation of 0.3 percentage points
The third figure is the most important. Historically, distributed methods brought large communication gains but at the cost of significant drops in model quality. Decoupled DiLoCo practically eliminates that trade-off — the network savings come at minimal cost.
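As a quick sanity check on the published figures:

```python
# Bandwidth reduction reported for the 8-datacenter configuration
print(198.0 / 0.84)  # ≈ 235.7, matching the ~235× claim

# Goodput improvement in the high-failure-rate simulation
print(0.88 / 0.27)   # ≈ 3.26× more useful work per unit of wall-clock time
```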
What does this mean in practice?
The implications are far-reaching. Training trillion-parameter models has until now required ultra-connected mega-datacenters or commercial clouds with specially AI-optimized fabric networks. Decoupled DiLoCo shows that the same work can be done across geographically distributed infrastructure — even infrastructure with modest network bandwidth between locations.
For the open-source AI community and smaller labs, this reduces the “compute moat” currently held by Google, Microsoft, and Meta. Projects that have access to several mid-sized GPU clusters (not necessarily co-located) can now realistically consider training competitive models.
Relation to the competition
Others are exploring similar approaches. Meta's FLocal attempts to optimize distributed training through a parallel pipeline, while Anthropic's TurboTrain focuses on throughput optimization within Anthropic's own infrastructure. Based on the published numbers, Decoupled DiLoCo appears the most aggressive in reducing network requirements.
Although this is a research publication rather than open code, Google has a track record of open-sourcing such methods through the JAX ecosystem. If that happens this time as well, open researchers will gain a powerful new tool.
This article was generated using artificial intelligence from primary sources.