Hardware

AI accelerator (NPU/TPU)

Specialized chip for AI workloads — NPUs in phones, Google TPUs, AWS Trainium — often faster and more cost-efficient than GPUs for these workloads.

An AI accelerator is a chip purpose-built for neural-network workloads. Unlike GPUs, which evolved from graphics hardware, accelerators are designed from the start around matrix multiplication, low-precision numerics (FP8, INT8, INT4), and the memory-access patterns typical of tensor operations.
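The low-precision arithmetic mentioned above can be illustrated with a toy sketch: symmetric INT8 quantization of a dot product, where floats are mapped to 8-bit integers via a per-vector scale, accumulated in integer arithmetic, and dequantized at the end. This is a simplified illustration, not how any particular chip implements it; function names here are invented for the example.

```python
def quantize_int8(xs):
    """Map floats to int8 [-127, 127] with a symmetric scale (toy illustration)."""
    scale = max(abs(x) for x in xs) / 127.0 or 1.0  # avoid zero scale for all-zero input
    return [round(x / scale) for x in xs], scale

def int8_dot(a, b):
    """Dot product computed in integer arithmetic, dequantized at the end."""
    qa, sa = quantize_int8(a)
    qb, sb = quantize_int8(b)
    acc = sum(x * y for x, y in zip(qa, qb))  # integer accumulate (INT32 on real hardware)
    return acc * sa * sb  # rescale back to float

a = [0.12, -0.50, 0.33]
b = [0.80, 0.05, -0.20]
print(int8_dot(a, b))              # close to the exact float dot product
print(sum(x * y for x, y in zip(a, b)))  # exact result for comparison
```

The quantized result differs from the float result only by a small rounding error, while the inner loop uses 8-bit multiplies and integer accumulation — the operation accelerators implement in hardware at massive scale.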

Main categories:

  • TPU (Tensor Processing Unit) — Google’s chip, used to train Gemini internally and exposed via Google Cloud; current generations are TPU v5p and TPU v6e (Trillium)
  • NPU (Neural Processing Unit) — the term for on-device accelerators in phones, laptops, and edge devices; Apple Neural Engine, Qualcomm Hexagon NPU, Intel/AMD NPUs in Copilot+ PCs
  • AWS Trainium / Inferentia — Amazon’s training and inference chips, offered on AWS at aggressive prices relative to NVIDIA GPUs
  • Specialized LLM chips — Groq LPU, Cerebras WSE, SambaNova RDU, all designed for extreme inference throughput

The market logic is clear: GPUs are expensive, scarce, and (until recently) supplied almost entirely by NVIDIA. Hyperscalers (Google, Amazon, Meta, Microsoft) build their own accelerators to reduce that dependence and the margins they pay NVIDIA. On-device, an NPU in every modern phone and laptop makes it possible to run small language models locally without sending data to the cloud.

The bottleneck is software: CUDA and NVIDIA’s ecosystem remain the gold standard, while alternative stacks are still maturing.
