Models

Reasoning Model

An LLM trained to produce a long, deliberate chain of thought before its final answer, trading inference time for accuracy on complex problems.

A reasoning model is a large language model trained — usually via reinforcement learning on verifiable problems — to spend extended compute thinking before answering. Internally, the model produces a long chain of intermediate steps (sometimes called “thinking tokens”), often hidden from the user, then emits a concise final answer.
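The split between hidden thinking tokens and the final answer can be sketched in a few lines. This assumes the DeepSeek-R1-style convention of wrapping the chain of thought in `<think>...</think>` delimiters; other providers expose (or withhold) reasoning through different, often server-side, mechanisms.

```python
import re

def split_reasoning(raw: str) -> tuple[str, str]:
    """Separate a <think>-delimited chain of thought from the final answer.

    Assumes R1-style <think>...</think> markers; formats vary by provider.
    """
    match = re.search(r"<think>(.*?)</think>", raw, flags=re.DOTALL)
    if match is None:
        return "", raw.strip()          # no visible reasoning block
    thinking = match.group(1).strip()   # intermediate steps, often hidden
    answer = raw[match.end():].strip()  # concise final answer after the block
    return thinking, answer

raw = "<think>2 + 2: combine the units... 4.</think>The answer is 4."
thinking, answer = split_reasoning(raw)
```

In practice the thinking span is what the trade-off bullets below are counting: it is usually many times longer than the answer span.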

The paradigm went mainstream with OpenAI o1 (September 2024), followed by o3, DeepSeek R1, Anthropic Claude with extended thinking, Google Gemini Thinking, and Qwen QwQ. Reasoning models excel at math, competitive programming, scientific reasoning, and multi-step planning — domains where verification is straightforward, so the model can be rewarded on the correctness of its final answer without grading the chain itself.

This is sometimes framed as test-time compute scaling: instead of (only) making the model bigger, you let it think longer at inference. Empirically, doubling thinking tokens often improves accuracy on hard problems, opening a new scaling axis beyond pre-training compute.
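One simple form of test-time compute scaling is self-consistency: draw several independent samples and keep the most common answer, so accuracy improves as you spend more inference compute. The sketch below uses a mock sampler (a stand-in for an LLM API call, with an assumed 60% per-sample accuracy) purely to illustrate the mechanism.

```python
import random
from collections import Counter

def sample_answer(rng: random.Random) -> int:
    """Stand-in for one stochastic model sample (hypothetical: a real
    system would make an LLM API request here). The correct answer is
    42; each individual sample is right only 60% of the time."""
    return 42 if rng.random() < 0.6 else rng.randrange(100)

def majority_vote(n_samples: int, seed: int = 0) -> int:
    """Spend more inference compute by drawing n samples and returning
    the plurality answer (self-consistency)."""
    rng = random.Random(seed)
    votes = Counter(sample_answer(rng) for _ in range(n_samples))
    return votes.most_common(1)[0][0]
```

With enough samples the plurality answer is almost always correct even though each sample alone is unreliable; the diminishing-returns bullet below applies here too, since extra samples help less and less.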

Trade-offs:

  • Cost: 5–30× more output tokens than a standard answer
  • Latency: seconds to minutes per response
  • Diminishing returns: thinking longer eventually plateaus
  • Domain selectivity: strong gains on logic/math/code, smaller gains on open-ended writing
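The cost multiplier in the first bullet falls straight out of token counts, since output pricing is per token. The numbers below are illustrative only, not real pricing: a 500-token direct answer versus the same query with roughly 10× as many thinking tokens on top.

```python
def response_cost(output_tokens: int, usd_per_million: float) -> float:
    """Output-token cost of one response (price is illustrative)."""
    return output_tokens * usd_per_million / 1e6

# Hypothetical numbers: same $10/M output rate, but the reasoning
# model emits 5,000 thinking tokens before its 500-token answer.
fast = response_cost(500, usd_per_million=10.0)
reasoning = response_cost(500 + 5_000, usd_per_million=10.0)
ratio = reasoning / fast  # the cost multiplier simply tracks token count
```

At these assumed numbers the reasoning response costs 11× as much, squarely inside the 5–30× range above.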

By 2026, every major lab ships both a “fast” model and a “reasoning” model. Routing — picking the right model per query — has become its own optimization problem.
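A router can be sketched as a classifier in front of the two models. The version below is a toy keyword heuristic with placeholder model names; production routers typically use a learned classifier over the query (and sometimes a cheap first-pass model) rather than keyword matching.

```python
# Keywords that hint a query needs deliberate multi-step reasoning
# (hypothetical list for illustration).
REASONING_HINTS = ("prove", "derive", "optimize", "debug", "calculate")

def route(query: str) -> str:
    """Toy router: send math/logic-flavored or very long queries to the
    reasoning model, everything else to the fast model. Model names
    are placeholders, not real endpoints."""
    q = query.lower()
    if any(hint in q for hint in REASONING_HINTS) or len(q) > 400:
        return "reasoning-model"
    return "fast-model"
```

For example, `route("Prove that sqrt(2) is irrational")` selects the reasoning model, while a casual writing request goes to the fast one — trading the latency and cost bullets above for accuracy only where it pays.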

Sources

See also