Hardware

AI accelerator (NPU/TPU)

Specialized chip for AI workloads — NPUs in phones, Google TPUs, AWS Trainium — often faster and more cost-efficient than GPUs for these workloads.

An AI accelerator is a chip purpose-built for neural-network workloads. Unlike GPUs, which evolved from graphics hardware, accelerators are designed from the start around matrix multiplication, low-precision numerics (FP8, INT8, INT4), and the memory-access patterns typical of tensor operations.
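The low-precision arithmetic mentioned above can be illustrated with a toy sketch: symmetric INT8 quantization of a dot product, where floats are mapped to 8-bit integers via a per-vector scale, accumulated in integer arithmetic, and dequantized at the end. This is a simplified illustration, not how any particular chip implements it; function names here are invented for the example.

```python
def quantize_int8(xs):
    """Map floats to int8 [-127, 127] with a symmetric scale (toy illustration)."""
    scale = max(abs(x) for x in xs) / 127.0 or 1.0  # avoid zero scale for all-zero input
    return [round(x / scale) for x in xs], scale

def int8_dot(a, b):
    """Dot product computed in integer arithmetic, dequantized at the end."""
    qa, sa = quantize_int8(a)
    qb, sb = quantize_int8(b)
    acc = sum(x * y for x, y in zip(qa, qb))  # integer accumulate (INT32 on real hardware)
    return acc * sa * sb  # rescale back to float

a = [0.12, -0.50, 0.33]
b = [0.80, 0.05, -0.20]
print(int8_dot(a, b))              # close to the exact float dot product
print(sum(x * y for x, y in zip(a, b)))  # exact result for comparison
```

The quantized result differs from the float result only by a small rounding error, while the inner loop uses 8-bit multiplies and integer accumulation — the operation accelerators implement in hardware at massive scale.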

Main categories:

  • TPU (Tensor Processing Unit) — Google’s chip, used to train Gemini internally and exposed via Google Cloud; current generations are TPU v5p and TPU v6e (Trillium)
  • NPU (Neural Processing Unit) — the term for on-device accelerators in phones, laptops, and edge devices; Apple Neural Engine, Qualcomm Hexagon NPU, Intel/AMD NPUs in Copilot+ PCs
  • AWS Trainium / Inferentia — Amazon’s training and inference chips, offered on AWS at aggressive prices relative to NVIDIA GPUs
  • Specialized LLM chips — Groq LPU, Cerebras WSE, SambaNova RDU, all designed for extreme inference throughput

The market logic is clear: GPUs are expensive, scarce, and (until recently) supplied almost entirely by NVIDIA. Hyperscalers (Google, Amazon, Meta, Microsoft) build their own accelerators to reduce that dependence and the margins they pay NVIDIA. On-device, an NPU in every modern phone and laptop makes it possible to run small language models locally without sending data to the cloud.

The bottleneck is software: CUDA and NVIDIA’s ecosystem remain the gold standard, while alternative stacks are still maturing.
