NVIDIA: Blackwell Generates Tokens 35x Cheaper Than Hopper — Cost per Token Is the Only Metric
Why it matters
NVIDIA has published an analysis arguing that cost per token is the only relevant metric for AI infrastructure. A comparison of the Blackwell and Hopper generations shows that Blackwell costs twice as much per GPU hour but generates 65x more tokens per second, resulting in a 35x lower cost per million tokens — $0.12 versus $4.20 for Hopper.
NVIDIA has published a detailed total cost of ownership (TCO) analysis for AI infrastructure, arguing that the industry should stop comparing GPU prices and instead focus exclusively on one metric: cost per generated token.
How Can Blackwell Be 2x More Expensive Yet 35x Cheaper?
The paradox is resolved by throughput. A Blackwell GPU costs roughly twice as much per GPU-hour as the previous Hopper generation, but it generates 65x more tokens per second. Calculated as cost per million generated tokens, Blackwell comes in at $0.12 versus $4.20 for Hopper, or 35x cheaper.
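The arithmetic behind the comparison can be sketched with a small helper. The $3.60/hour and 1,000 tokens/s inputs below are hypothetical round numbers chosen for readability, not figures from NVIDIA's analysis; only the $0.12 and $4.20 per-million-token prices come from the article.

```python
def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Dollars spent to generate one million tokens at a given throughput."""
    hours_per_million = 1_000_000 / tokens_per_second / 3600
    return hourly_cost_usd * hours_per_million

# Hypothetical example: a $3.60/hour GPU sustaining 1,000 tokens/s
# costs $1.00 per million tokens.
print(cost_per_million_tokens(3.60, 1_000))

# The article's quoted per-token prices yield the 35x headline:
print(4.20 / 0.12)

# Note that the round 2x-cost / 65x-throughput figures alone would imply
# roughly 65 / 2 = 32.5x; the 35x comes from the quoted dollar prices,
# so the 2x and 65x multipliers are presumably rounded.
```

The key point of the formula is that hourly price appears in the numerator and throughput in the denominator, so a price increase is irrelevant whenever throughput grows faster.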
NVIDIA uses an analogy from the trucking industry: a truck that burns twice as much fuel but carries 65 times more cargo is dramatically more efficient per kilogram of freight hauled. The same logic applies to AI inference — the absolute price of a GPU is irrelevant without the context of productivity.
What Is the “Extreme Codesign” Approach?
NVIDIA promotes the concept of “extreme codesign” — simultaneously optimizing hardware, software, and network infrastructure as a unified system. Rather than optimizing the GPU in isolation and then adapting software afterward, the Blackwell platform is designed as an integrated whole where each layer amplifies the efficiency of the others.
For organizations building or leasing AI infrastructure, the message is clear: comparing individual component specifications gives a distorted picture. The only metric that affects business outcomes is how much it costs to generate a response for an end user — and that cost is falling at an exponential rate with each new hardware generation.
This article was generated using artificial intelligence from primary sources.
Related news
Google at Cloud Next '26 unveils TPU 8i and TPU 8t: specialized chips for agentic AI computing
Gemma 4 runs as a Vision Language Agent locally on Jetson Orin Nano Super
NVIDIA and Google Cloud announce collaboration for agentic AI and physical AI on shared infrastructure