Models

Reasoning Model

An LLM trained to produce a long, deliberate chain of thought before its final answer, trading inference time for accuracy on complex problems.

A reasoning model is a large language model trained — usually via reinforcement learning on verifiable problems — to spend extended compute thinking before answering. Internally, the model produces a long chain of intermediate steps (sometimes called “thinking tokens”), often hidden from the user, then emits a concise final answer.
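The split between hidden thinking tokens and the final answer can be sketched in a few lines. This assumes the DeepSeek-R1-style convention of wrapping the chain of thought in `<think>...</think>` delimiters; other providers expose (or withhold) reasoning through different, often server-side, mechanisms.

```python
import re

def split_reasoning(raw: str) -> tuple[str, str]:
    """Separate a <think>-delimited chain of thought from the final answer.

    Assumes R1-style <think>...</think> markers; formats vary by provider.
    """
    match = re.search(r"<think>(.*?)</think>", raw, flags=re.DOTALL)
    if match is None:
        return "", raw.strip()          # no visible reasoning block
    thinking = match.group(1).strip()   # intermediate steps, often hidden
    answer = raw[match.end():].strip()  # concise final answer after the block
    return thinking, answer

raw = "<think>2 + 2: combine the units... 4.</think>The answer is 4."
thinking, answer = split_reasoning(raw)
```

In practice the thinking span is what the trade-off bullets below are counting: it is usually many times longer than the answer span.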

The paradigm went mainstream with OpenAI o1 (September 2024), followed by o3, DeepSeek R1, Anthropic Claude with extended thinking, Google Gemini Thinking, and Qwen QwQ. Reasoning models excel at math, competitive programming, scientific reasoning, and multi-step planning — domains where verification is straightforward, so the model can be rewarded on the correctness of its final answer without grading the chain itself.

This is sometimes framed as test-time compute scaling: instead of (only) making the model bigger, you let it think longer at inference. Empirically, doubling thinking tokens often improves accuracy on hard problems, opening a new scaling axis beyond pre-training compute.
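One simple form of test-time compute scaling is self-consistency: draw several independent samples and keep the most common answer, so accuracy improves as you spend more inference compute. The sketch below uses a mock sampler (a stand-in for an LLM API call, with an assumed 60% per-sample accuracy) purely to illustrate the mechanism.

```python
import random
from collections import Counter

def sample_answer(rng: random.Random) -> int:
    """Stand-in for one stochastic model sample (hypothetical: a real
    system would make an LLM API request here). The correct answer is
    42; each individual sample is right only 60% of the time."""
    return 42 if rng.random() < 0.6 else rng.randrange(100)

def majority_vote(n_samples: int, seed: int = 0) -> int:
    """Spend more inference compute by drawing n samples and returning
    the plurality answer (self-consistency)."""
    rng = random.Random(seed)
    votes = Counter(sample_answer(rng) for _ in range(n_samples))
    return votes.most_common(1)[0][0]
```

With enough samples the plurality answer is almost always correct even though each sample alone is unreliable; the diminishing-returns bullet below applies here too, since extra samples help less and less.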

Trade-offs:

  • Cost: 5–30× more output tokens than a standard answer
  • Latency: seconds to minutes per response
  • Diminishing returns: thinking longer eventually plateaus
  • Domain selectivity: strong gains on logic/math/code, smaller gains on open-ended writing
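The cost multiplier in the first bullet falls straight out of token counts, since output pricing is per token. The numbers below are illustrative only, not real pricing: a 500-token direct answer versus the same query with roughly 10× as many thinking tokens on top.

```python
def response_cost(output_tokens: int, usd_per_million: float) -> float:
    """Output-token cost of one response (price is illustrative)."""
    return output_tokens * usd_per_million / 1e6

# Hypothetical numbers: same $10/M output rate, but the reasoning
# model emits 5,000 thinking tokens before its 500-token answer.
fast = response_cost(500, usd_per_million=10.0)
reasoning = response_cost(500 + 5_000, usd_per_million=10.0)
ratio = reasoning / fast  # the cost multiplier simply tracks token count
```

At these assumed numbers the reasoning response costs 11× as much, squarely inside the 5–30× range above.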

By 2026, every major lab ships both a “fast” model and a “reasoning” model. Routing — picking the right model per query — has become its own optimization problem.
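A router can be sketched as a classifier in front of the two models. The version below is a toy keyword heuristic with placeholder model names; production routers typically use a learned classifier over the query (and sometimes a cheap first-pass model) rather than keyword matching.

```python
# Keywords that hint a query needs deliberate multi-step reasoning
# (hypothetical list for illustration).
REASONING_HINTS = ("prove", "derive", "optimize", "debug", "calculate")

def route(query: str) -> str:
    """Toy router: send math/logic-flavored or very long queries to the
    reasoning model, everything else to the fast model. Model names
    are placeholders, not real endpoints."""
    q = query.lower()
    if any(hint in q for hint in REASONING_HINTS) or len(q) > 400:
        return "reasoning-model"
    return "fast-model"
```

For example, `route("Prove that sqrt(2) is irrational")` selects the reasoning model, while a casual writing request goes to the fast one — trading the latency and cost bullets above for accuracy only where it pays.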

Sources

See also