TPU (Tensor Processing Unit)

A TPU (Tensor Processing Unit) is an application-specific integrated circuit (ASIC) that Google developed specifically to accelerate machine learning workloads. Unlike a general-purpose GPU, a TPU is engineered solely for the massive matrix operations that form the core of neural network computation.

Its architecture centers on a systolic array of processing elements that perform matrix multiplication at low precision (for example 8-bit, or FP4/BF16 in newer parts), achieving high throughput and strong energy efficiency. Google has used TPUs internally since 2015 and opened them to external customers via Google Cloud in 2018. The chips are co-designed with Broadcom and fabricated by TSMC.

TPUs are central to Google’s AI infrastructure, powering both training and inference for models such as the Gemini family. Recent generations — Trillium (v6), Ironwood (v7), and the bifurcated training/inference parts announced for 2026 — keep the TPU as Google’s primary answer to Nvidia’s GPU dominance in deep learning.

Sources

See also