🔧 Hardware · Friday, April 24, 2026 · 3 min read

Google unveils TPU 8i and TPU 8t at Cloud Next '26: specialized chips for agentic AI computing

Editorial illustration: Google TPU 8i and 8t — specialized AI chips

Why it matters

At Cloud Next '26, Google unveiled two new TPU chips: TPU 8i for AI agent inference and TPU 8t for training the most complex models. The move formalizes the split of Google's TPU line into two specialized branches for the “agentic era” of computing.

At Google Cloud Next '26, the company unveiled two new TPU chips, TPU 8i and TPU 8t, formally splitting its line of specialized AI processors into two parallel branches. TPU 8i targets inference for AI agents, while TPU 8t is dedicated to training the most complex models.

The announcement lands at a moment when the industry increasingly speaks of an “agentic era” of computing, in which AI systems don’t merely answer queries but execute long-running, multi-step tasks on behalf of users. That mode of operation calls for different hardware optimizations than the classic chatbot model.

What exactly does TPU 8i do?

TPU 8i is an inference chip — designed for fast execution of already-trained models in production. Google positions it specifically as hardware for agents that must perform reasoning, planning, and multi-step workflows without noticeable wait times for the user.

Unlike classic inference, where a model responds once and is done, agentic flows generate dozens or hundreds of model calls within a single user session. Each millisecond of latency is multiplied by the number of steps, so TPU 8i aims for maximum throughput at the lowest energy cost per inference.
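
To make that multiplication concrete, here is a minimal sketch. The per-call latencies and step counts are illustrative assumptions for the example, not figures from Google's announcement:

```python
# Illustrative arithmetic only: the latencies and step counts below are
# assumptions for the example, not figures from Google's announcement.

def session_latency_ms(model_calls: int, latency_per_call_ms: float) -> float:
    """Total model-serving time for one task, assuming serial calls."""
    return model_calls * latency_per_call_ms

# A classic chatbot turn: a single call.
print(session_latency_ms(1, 300))    # 300.0 ms -> feels instant

# An agentic task: say 80 serial calls (plan, act, check, repeat).
print(session_latency_ms(80, 300))   # 24000.0 ms -> 24 s of model time alone
print(session_latency_ms(80, 150))   # 12000.0 ms -> halving per-call latency
                                     # halves the whole session
```

The takeaway: in an agentic session, shaving per-call latency pays off once per step, which is why inference-side efficiency is the headline metric for a chip like TPU 8i.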

Google does not provide concrete numbers in the announcement, but emphasizes that the chip is part of a “full-stack” architecture — spanning the network, data centers, and energy-efficient operation — whose goal is “responsive agentic AI available to the masses.”

Why is a dedicated chip needed for training?

TPU 8t is optimized for training the most complex models — Google specifically highlights the ability to run “even the most complex models on a single, large unified memory pool.” This is critical because modern frontier models (hundreds of billions to trillions of parameters) no longer fit in the memory of a single accelerator and require complex distribution techniques that slow down training.
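
As a back-of-the-envelope sketch of the problem (the model size, precision, optimizer overhead, and per-chip memory below are assumptions chosen for illustration, not announced specs):

```python
# Back-of-the-envelope memory math: why frontier models must be sharded.
# Every figure below is an illustrative assumption, not an announced spec.

params = 1_000_000_000_000        # a 1-trillion-parameter model
bytes_per_param = 2               # bf16 weights
optimizer_multiplier = 3          # rough extra state (gradients, optimizer)

weights_gb = params * bytes_per_param / 1e9                   # ~2,000 GB
training_state_gb = weights_gb * (1 + optimizer_multiplier)   # ~8,000 GB

hbm_per_chip_gb = 192             # assumed memory of a single accelerator

chips_needed = training_state_gb / hbm_per_chip_gb
print(f"weights alone:    {weights_gb:,.0f} GB")
print(f"training state:   {training_state_gb:,.0f} GB")
print(f"chips to hold it: {chips_needed:.0f}")   # ~42 chips, hence sharding
```

Any state that spills across dozens of chips has to be synchronized over the interconnect on every training step, which is exactly the overhead a large unified memory pool is meant to shrink.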

A large unified memory space per chip means less inter-chip communication during training, which in practice reduces the time and cost of training the largest models. For Google, this is also a competitive response to Nvidia’s Blackwell Ultra and AMD’s MI400 series, which target the same segment.

What does this mean for the market?

Splitting the TPU line into inference and training chips is not a new industry practice; both Nvidia and AWS already segment their accelerators similarly (AWS with Inferentia for inference and Trainium for training, for example). But Google's formal announcement of two chips on the same day signals that the company expects agentic inference to be the dominant growth segment over the next two years, while training remains important but becomes a smaller share of the total AI compute market.

For Google Cloud users, this means more precise hardware selection by workload, as sketched below: TPU 8i for production agentic applications, TPU 8t for research teams training their own large models. Concrete pricing, availability, and comparisons with previous TPU generations are expected in upcoming technical announcements.
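
In code terms, that guidance reduces to a mapping from workload to chip. The toy function below is an editorial illustration of the split; the chip names come from the announcement, while the workload labels and the mapping itself are made up here, not an official Google Cloud selection guide:

```python
# A toy sketch of the workload-based choice described above. The chip names
# come from the announcement; the workload labels and mapping are editorial
# illustration, not an official Google Cloud selection guide.

def pick_tpu(workload: str) -> str:
    if workload == "agent_inference":   # production agentic applications
        return "TPU 8i"
    if workload == "model_training":    # training large custom models
        return "TPU 8t"
    raise ValueError(f"unknown workload: {workload!r}")

print(pick_tpu("agent_inference"))  # TPU 8i
print(pick_tpu("model_training"))   # TPU 8t
```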

🤖 This article was generated using artificial intelligence from primary sources.