What are 'reasoning skills' in this paper?

The authors define reasoning skills as reusable reasoning patterns distilled from extensive exploration over harder tasks. Instead of the model building a chain of thought from scratch each time, it retrieves a relevant skill and uses it as a starting structure.

Why does this matter for deployment costs?

Reasoning models typically consume a large number of tokens generating chain-of-thought traces. By reducing the number of tokens per query while simultaneously improving accuracy, this method directly lowers operational costs for production systems that use reasoning models.

Thinking with Reasoning Skills: fewer tokens, more accuracy

On April 24, 2026, the paper “Thinking with Reasoning Skills: Fewer Tokens, More Accuracy” was posted to arXiv. It has been accepted to the Industry Track of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026). The authors are Guangxiang Zhao, Qilong Shi, Xusen Xiao, Xiangzheng Zhang, Tong Yang, and Lin Sun.

The paper addresses one of the best-known problems with current reasoning models: the high token consumption of chain-of-thought generation, which directly affects inference latency and cost in production.

What do the authors propose?

Instead of the traditional paradigm in which a reasoning model generates a chain of thought from scratch each time, the authors propose that the model retrieves reusable reasoning patterns — “reasoning skills” — from a pre-built knowledge base.

These skills are distilled through extensive exploration over harder tasks: the model generates many reasoning traces, from which structured patterns are abstracted that function as “reasoning templates.” At inference time on a new problem, the system identifies a relevant skill and uses it as a starting point.
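The retrieve-then-reason flow described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `Skill` structure, the word-overlap (Jaccard) retrieval, the prompt wording, and the two example skills are all assumptions made for this sketch, since the article does not say how the paper stores or matches skills.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Skill:
    """Hypothetical skill record: a problem-class description plus an abstract plan."""
    name: str
    description: str
    template: str


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and keep alphabetic words only."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve_skill(problem: str, library: List[Skill]) -> Optional[Skill]:
    """Return the skill whose description has the highest Jaccard overlap
    with the problem text, or None if nothing overlaps at all."""
    words = tokenize(problem)
    best, best_score = None, 0.0
    for skill in library:
        desc = tokenize(skill.description)
        union = words | desc
        score = len(words & desc) / len(union) if union else 0.0
        if score > best_score:
            best, best_score = skill, score
    return best


def build_prompt(problem: str, library: List[Skill]) -> str:
    """Prepend the retrieved skill as a starting structure; fall back to
    plain from-scratch reasoning when no skill matches."""
    skill = retrieve_skill(problem, library)
    if skill is None:
        return f"Problem: {problem}\nReason step by step."
    return (
        f"Reasoning skill ({skill.name}):\n{skill.template}\n\n"
        f"Problem: {problem}\nFollow the skill above."
    )


# Illustrative skill library (invented for this sketch).
LIBRARY = [
    Skill(
        "two-pointer scan",
        "find pair in sorted array with target sum",
        "1. Place pointers at both ends.\n"
        "2. Move the pointer that brings the sum closer to the target.\n"
        "3. Stop when the sum matches or the pointers cross.",
    ),
    Skill(
        "modular arithmetic",
        "remainder of large power modulo small number",
        "1. Reduce the base modulo m.\n"
        "2. Find the cycle length of the powers.\n"
        "3. Reduce the exponent by the cycle length.",
    ),
]
```

Calling `build_prompt("What is the remainder of 7 to the power 2024 modulo 5?", LIBRARY)` selects the "modular arithmetic" skill and places its template ahead of the problem. A production system would more likely use embedding-based retrieval; word overlap is used here only to keep the sketch dependency-free.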

The result is a dual advantage — reduced token consumption (because the model does not have to build the full logical structure from the beginning) and increased accuracy (because patterns that have already proven successful are applied).

How does this differ from RAG or in-context learning?

At first glance, the approach resembles retrieval-augmented generation (RAG), but the difference is fundamental: RAG retrieves facts or documents, whereas here what is retrieved is an abstract structured reasoning pattern.

It also differs from in-context learning with few-shot examples. Few-shot prompting gives the model concrete examples of solved tasks, while reasoning skills represent generalized meta-strategies — the way a certain class of problem is approached, without concrete numbers or input values.

The authors argue this is closer to how a human expert solves familiar types of problems: rather than re-deriving everything from scratch, they recognize the pattern and apply a proven solution structure.

On which tasks was the method evaluated?

The paper focuses on coding and mathematical reasoning, two domains in which reasoning models are most commonly used in production today. The authors show that retrieving skills outperforms conventional from-scratch reasoning on both counts: the number of tokens consumed and the accuracy of the final answer.

Concrete numerical results are available in the full paper text, but the key claim is qualitative: the method shifts the Pareto frontier of accuracy versus token cost, enabling models to be simultaneously cheaper and more accurate.
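To make the Pareto claim concrete, here is a small sketch of what it means for one configuration to dominate another on the two axes discussed (tokens and accuracy). The three data points are invented purely for illustration and are not results from the paper.

```python
def dominates(a, b):
    """True if configuration a is at least as good as b on both axes and
    strictly better on at least one. Each configuration is a tuple
    (avg_tokens, accuracy): fewer tokens and higher accuracy are better."""
    tokens_a, acc_a = a
    tokens_b, acc_b = b
    no_worse = tokens_a <= tokens_b and acc_a >= acc_b
    strictly_better = tokens_a < tokens_b or acc_a > acc_b
    return no_worse and strictly_better


def pareto_front(points):
    """Keep only the configurations that no other configuration dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Invented numbers, NOT taken from the paper.
from_scratch = (4000, 0.70)   # full chain of thought
truncated = (2000, 0.62)      # shorter reasoning budget, lower accuracy
with_skills = (2500, 0.74)    # skill retrieval: fewer tokens AND higher accuracy

front = pareto_front([from_scratch, truncated, with_skills])
```

With these made-up numbers, `with_skills` dominates `from_scratch`, so the from-scratch point drops off the frontier, while `truncated` stays on it because it still uses the fewest tokens. A typical optimization resembles the truncated case (cheaper but less accurate); the paper's claim is that skill retrieval behaves like the dominating case.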

Why does this matter for AI development teams?

Reasoning models such as OpenAI GPT-5.5, Anthropic Opus 4.7, and DeepSeek V4, released the same day, typically consume 3 to 10 times more tokens than non-reasoning models. This directly affects operational costs for chatbots, copilot tools, and agentic systems.
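A back-of-the-envelope calculation shows how the 3x to 10x multiplier translates into spend. The query volume, baseline token count, and price per million tokens below are placeholder assumptions chosen for round numbers, not figures from the paper or from any vendor's price list.

```python
def monthly_cost(queries: int, tokens_per_query: int, usd_per_million_tokens: float) -> float:
    """Output-token cost in USD for a month of traffic."""
    return queries * tokens_per_query * usd_per_million_tokens / 1_000_000


# Placeholder assumptions for illustration only.
QUERIES_PER_MONTH = 1_000_000
BASE_TOKENS = 500            # tokens per answer for a non-reasoning model
PRICE_USD_PER_MILLION = 10.0

non_reasoning = monthly_cost(QUERIES_PER_MONTH, BASE_TOKENS, PRICE_USD_PER_MILLION)
reasoning_low = monthly_cost(QUERIES_PER_MONTH, BASE_TOKENS * 3, PRICE_USD_PER_MILLION)
reasoning_high = monthly_cost(QUERIES_PER_MONTH, BASE_TOKENS * 10, PRICE_USD_PER_MILLION)

print(non_reasoning, reasoning_low, reasoning_high)  # 5000.0 15000.0 50000.0
```

Under these assumptions the same traffic costs between $15,000 and $50,000 per month with a reasoning model, compared with $5,000 without one, which is why any reduction in tokens per query feeds directly into the operating budget.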

An approach that simultaneously reduces token count and increases accuracy is rare in the literature — most optimizations trade one off against the other. If results are reproduced in independent experiments, integration into the next generation of production reasoning models is expected, likely through layered agentic frameworks.

For teams building AI copilot tools for business users, where every reasoning model call is costly, techniques like these are potentially transformative. The Industry Track placement at ACL suggests the paper is aimed at direct industrial applicability, not merely academic value.
