Hardware
AI accelerator (NPU/TPU)
Specialized chip for AI workloads (NPUs in phones, Google TPUs, AWS Trainium), often delivering more performance per dollar than GPUs.
An AI accelerator is a chip purpose-built for neural networks. Unlike GPUs, which evolved from graphics hardware, accelerators are designed from the start around dense matrix multiplication, low-precision numerics (FP8, INT8, INT4), and the predictable, streaming memory access patterns of tensor workloads.
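To make the low-precision point concrete, the sketch below quantizes an FP32 matrix multiply to INT8 with INT32 accumulation, which is the style of arithmetic these chips implement in silicon. It is a minimal NumPy illustration; the shapes, random data, and symmetric per-tensor scaling are assumptions made for the example, not a description of any particular accelerator.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor quantization: map the FP32 range onto [-127, 127]."""
    scale = np.max(np.abs(x)) / 127.0
    q = np.round(x / scale).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
a = rng.standard_normal((64, 128)).astype(np.float32)
b = rng.standard_normal((128, 32)).astype(np.float32)

a_q, a_scale = quantize_int8(a)
b_q, b_scale = quantize_int8(b)

# Integer matmul with 32-bit accumulation (what matrix units do in hardware),
# followed by a single dequantization step back to FP32.
acc = a_q.astype(np.int32) @ b_q.astype(np.int32)
approx = acc.astype(np.float32) * (a_scale * b_scale)

exact = a @ b
print("max abs error vs. FP32:", np.max(np.abs(approx - exact)))
```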
Main categories:
- TPU (Tensor Processing Unit) — Google’s chip, used to train Gemini internally and exposed via Google Cloud (see the JAX sketch after this list); current generations are TPU v5p and TPU v6e (Trillium)
- NPU (Neural Processing Unit) — the term for on-device accelerators in phones, laptops, and edge devices; Apple Neural Engine, Qualcomm Hexagon NPU, Intel/AMD NPUs in Copilot+ PCs
- AWS Trainium / Inferentia — Amazon’s training and inference chips, priced aggressively against NVIDIA-based instances on AWS
- Specialized LLM chips — Groq LPU, Cerebras WSE, SambaNova RDU, all designed for extreme inference throughput
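In practice these chips are reached through ML frameworks rather than programmed directly. As a rough illustration, the JAX snippet below runs the same matrix multiply on whatever backend is available; it is a minimal sketch assuming JAX is installed with an accelerator backend (for example jax[tpu] on a Cloud TPU VM), and the shapes and bfloat16 dtype are arbitrary choices for the example.

```python
import jax
import jax.numpy as jnp

# List the devices JAX can see (CPU, GPU, or TPU, depending on the
# installed backend and the machine this runs on).
print(jax.devices())

# bfloat16 is the native matmul precision on TPUs; the same code also
# runs unchanged on CPU and GPU.
x = jnp.ones((1024, 1024), dtype=jnp.bfloat16)
y = jnp.ones((1024, 1024), dtype=jnp.bfloat16)

# The matrix multiply is dispatched by the XLA compiler to the
# accelerator's matrix units (the MXU on a TPU).
z = jnp.matmul(x, y)
print(z.shape, z.dtype)
```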
The market logic is clear: GPUs are expensive, scarce, and (until recently) nearly 100% NVIDIA. Hyperscalers (Google, Amazon, Meta, Microsoft) build their own accelerators to reduce both their dependence on NVIDIA and the margins they pay it. On-device, the NPU in every modern phone and laptop makes it possible to run small language models locally without sending data to the cloud.
The bottleneck is software: CUDA and NVIDIA’s ecosystem remain the gold standard, while alternative stacks are still maturing.