Google DeepMind Decoupled DiLoCo: 235× lower network bandwidth for AI training across geographically distributed datacenters
Why it matters
Google DeepMind has introduced Decoupled DiLoCo, a distributed architecture for training AI models. It reduces the required network bandwidth from 198 Gbps to 0.84 Gbps across 8 datacenters and achieves 88% goodput compared to 27% with conventional methods.
Google DeepMind published Decoupled DiLoCo on April 23, 2026 — a new iteration of its distributed architecture for training AI models. The headline result: required network bandwidth between datacenters drops from 198 Gbps to 0.84 Gbps for an 8-datacenter configuration, while simultaneously goodput improves from 27% to 88% in a high-failure-rate scenario.
What is DiLoCo and why was it needed?
DiLoCo (Distributed Low-Communication) is a method DeepMind introduced in 2023 and refined throughout 2024. It addresses a fundamental problem in modern AI training — the disparity in network bandwidth within and between datacenters.
Within a single datacenter, GPUs are connected by ultra-fast links (NVLink, InfiniBand) delivering hundreds of Gbps per node. But when training is distributed across multiple geographically separated datacenters, the bandwidth between them is 10 to 100 times lower and the latency significantly higher.
Classical data-parallel algorithms synchronize gradients after every step, demanding bandwidth that exists inside a datacenter but not between datacenters. DiLoCo addresses this with many local optimization steps executed without any synchronization, exchanging accumulated updates only occasionally.
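The pattern can be sketched in a few lines of Python with NumPy. This is a minimal illustration of the local-step/periodic-exchange idea from the original DiLoCo paper, not DeepMind's implementation: the inner optimizer here is plain SGD, the worker gradient functions are placeholders, and the paper's outer Nesterov momentum is omitted for brevity.

```python
import numpy as np

def diloco_round(worker_grads, shared_params, inner_steps,
                 lr_inner=0.1, lr_outer=0.7):
    """One DiLoCo outer round (simplified sketch).

    Each worker starts from the shared parameter snapshot, runs
    `inner_steps` local steps with zero communication, and reports its
    parameter delta (a "pseudo-gradient"). A single averaging exchange
    per round replaces the per-step gradient all-reduce of classical
    data parallelism.
    """
    deltas = []
    for grad_fn in worker_grads:                  # one entry per datacenter
        params = shared_params.copy()
        for _ in range(inner_steps):              # local steps, no network traffic
            params -= lr_inner * grad_fn(params)
        deltas.append(shared_params - params)     # pseudo-gradient
    outer_grad = np.mean(deltas, axis=0)          # the only communication step
    return shared_params - lr_outer * outer_grad  # outer update (momentum omitted)
```

With hundreds of inner steps per round, the per-step exchanges of conventional data parallelism collapse into one exchange per round, which is where the bandwidth savings come from.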
What is the “decoupled” innovation?
The iteration published on April 23 introduces the concept of asynchronous islands of computation. Instead of all datacenters performing the same step at the same moment, individual “islands” advance independently and communicate only at key checkpoints.
This decoupling of the computational and communication flows dramatically reduces pressure on inter-datacenter networks. According to DeepMind’s published figures, required bandwidth drops from 198 Gbps to 0.84 Gbps — a reduction of approximately 235 times.
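The goodput gap has a simple intuition: under lockstep synchronization every island waits for the slowest one at every step, while decoupled islands keep computing and pay only a small merge cost at checkpoints. A toy model with illustrative numbers (not DeepMind's simulator, and the merge overhead is an arbitrary assumption):

```python
def lockstep_goodput(step_times):
    """Synchronous training: at every step, all islands wait for the
    slowest one, so only mean(t) / max(t) of wall-clock time is useful."""
    return sum(step_times) / (len(step_times) * max(step_times))

def decoupled_goodput(step_times, merge_overhead=0.05):
    """Decoupled islands never block on one another; they lose only a
    small fraction of time merging at checkpoints (value is illustrative)."""
    return 1.0 - merge_overhead

# One straggling island (e.g. recovering from a hardware failure) runs 4× slower:
times = [1.0, 1.0, 1.0, 4.0]
print(lockstep_goodput(times))   # 0.4375: most wall-clock time is spent waiting
print(decoupled_goodput(times))  # 0.95: islands keep computing through the straggler
```

The more islands and the higher the failure rate, the worse the lockstep fraction gets, which is consistent with the direction of the 27% vs 88% result.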
What are the key numbers?
DeepMind published three key metrics:
- Bandwidth: 198 Gbps → 0.84 Gbps across 8 datacenters
- Goodput (effective useful work throughput): 88% with Decoupled DiLoCo vs 27% with conventional methods, measured in a simulation of 1.2 million chips under high failure rates
- Accuracy: 64.1% with the new method vs 64.4% baseline — degradation of 0.3 percentage points
The third figure is the most important. Historically, distributed methods brought large communication gains but at the cost of significant drops in model quality. Decoupled DiLoCo practically eliminates that trade-off — the network savings come at minimal cost.
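As a quick sanity check on the published figures:

```python
# Bandwidth reduction reported for the 8-datacenter configuration
print(198.0 / 0.84)  # ≈ 235.7, matching the ~235× claim

# Goodput improvement in the high-failure-rate simulation
print(0.88 / 0.27)   # ≈ 3.26× more useful work per unit of wall-clock time
```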
What does this mean in practice?
The implications are far-reaching. Training trillion-parameter models has until now required ultra-connected mega-datacenters or commercial clouds with specially AI-optimized fabric networks. Decoupled DiLoCo shows that the same work can be done across geographically distributed infrastructure — even infrastructure with modest network bandwidth between locations.
For the open-source AI community and smaller labs, this reduces the “compute moat” currently held by Google, Microsoft, and Meta. Projects that have access to several mid-sized GPU clusters (not necessarily co-located) can now realistically consider training competitive models.
Relation to the competition
Others are exploring similar approaches. Meta's FLocal attempts to optimize distributed training through a parallel pipeline, while Anthropic's TurboTrain focuses on throughput optimization within Anthropic's own infrastructure. Based on the published numbers, Decoupled DiLoCo appears the most aggressive in reducing network requirements.
Although this is a research publication rather than open code, Google has a track record of open-sourcing such methods through the JAX ecosystem. If that happens this time as well, open researchers will gain a powerful new tool.
This article was generated using artificial intelligence from primary sources.