ArXiv: standard transformers with Chain-of-Thought cannot reason beyond TC^0 complexity — signpost tokens enable length-generalizable Turing simulation
A new arXiv preprint by Kraus, Sarrof, Yaa, Koller, and Hahn shows that standard transformers with Chain-of-Thought (CoT) reasoning cannot solve problems beyond TC^0 when the solution must be learnable in a length-generalizable way. The empirical success of CoT therefore does not mean that its theoretical Turing completeness carries over to practice. The proposed remedy, dynamic vocabulary expansion combined with signpost tokens, enables length-generalizable simulation of Turing machines with only linear CoT overhead.
Kraus, Sarrof, Yaa, Koller, and Hahn published the preprint Barriers to Universal Reasoning With Transformers (And How to Overcome Them) on April 28, 2026. It is a theoretical paper with direct implications for scaling Chain-of-Thought reasoning in current-generation LLMs.
What was proven?
The main thesis: the literature establishes that CoT raises transformer expressiveness to Turing completeness in theory, but that result does not survive the stricter requirement of length-generalizable learning (the ability to handle problems whose CoT traces are longer than those seen during training).
Quote from the abstract:
“Under standard positional encodings and a finite alphabet — transformers with CoT cannot solve problems beyond TC^0, i.e., the expressiveness benefits do not hold under the stricter requirement of length-generalizable learnability.”
Practical implication: many reasoning problems that appear to be solved at training lengths break down when sequence length increases. This is consistent with the observation that LLMs often “lose count” or lose accuracy on long arithmetic or logical chains.
Proposed solutions
The authors propose two complementary mechanisms:
1. Dynamic vocabulary scaling
The vocabulary grows with problem size. This avoids the “finite alphabet” constraint from the theorem.
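A minimal sketch of what a vocabulary that grows with problem size could look like. The base symbols, the `<p{i}>` token format, and the function name `build_vocab` are all assumptions made for illustration; the paper's actual construction may differ.

```python
# Hypothetical fixed symbols shared by every problem instance.
BASE_VOCAB = ["0", "1", "_", "L", "R", "HALT"]


def build_vocab(problem_size: int) -> list[str]:
    """Return the base symbols plus one fresh token per position.

    The vocabulary is no longer a finite alphabet fixed in advance:
    a larger instance gets more tokens.
    """
    position_tokens = [f"<p{i}>" for i in range(problem_size)]
    return BASE_VOCAB + position_tokens


small = build_vocab(4)
large = build_vocab(16)
print(len(small), len(large))
```

The point of the sketch is only that the token set is a function of the instance, not a constant.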
2. Signpost tokens + value-change encoding
- Signpost tokens — unique identifiers assigned to each position on the simulation machine’s “tape”
- Value-change encoding — logging only state changes rather than complete states, enabling reconstruction through counting
The combination achieves the main result:
“Length-generalizable simulation of Turing machines where CoT trace length is linear in the simulated runtime with a constant factor.”
In other words: this approach breaks through the TC^0 barrier with minimal token overhead.
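The two mechanisms can be illustrated with a toy simulation. This is a sketch under stated assumptions, not the authors' construction. The machine (binary increment, least significant bit in cell 0), the `<p{i}>` signpost format, and every name below are hypothetical. A cell's current value is recovered here by looking up its most recent logged change, whereas the paper describes reconstruction through counting. The machine state and head position are kept in Python variables, which a real CoT trace would also have to encode.

```python
from typing import NamedTuple


class Change(NamedTuple):
    signpost: str  # unique identifier of the tape cell
    value: str     # symbol written to that cell


# (state, symbol read) -> (symbol to write, head move, next state)
RULES = {
    ("carry", "1"): ("0", +1, "carry"),
    ("carry", "0"): ("1", 0, "halt"),
    ("carry", "_"): ("1", 0, "halt"),
}


def read_cell(tape_input: str, trace: list[Change], pos: int) -> str:
    """Current value of a cell: its latest logged change, else the input."""
    signpost = f"<p{pos}>"
    for change in reversed(trace):
        if change.signpost == signpost:
            return change.value
    return tape_input[pos] if pos < len(tape_input) else "_"


def run(tape_input: str) -> list[Change]:
    """Simulate the machine, logging only cells whose value changes."""
    trace: list[Change] = []
    state, pos = "carry", 0
    while state != "halt":
        symbol = read_cell(tape_input, trace, pos)
        write, move, state = RULES[(state, symbol)]
        if write != symbol:
            trace.append(Change(f"<p{pos}>", write))
        pos += move
    return trace


def final_tape(tape_input: str, trace: list[Change]) -> str:
    """Rebuild the whole tape from the input plus the change log."""
    last_logged = max((int(c.signpost[2:-1]) for c in trace), default=-1)
    length = max(len(tape_input), last_logged + 1)
    return "".join(read_cell(tape_input, trace, i) for i in range(length))


trace = run("110")  # 3 in binary, least significant bit first
print(trace)
print(final_tape("110", trace))
```

Each simulated step appends at most one entry to the trace, so the trace length stays linear in the number of steps, which mirrors the linear-overhead claim in spirit.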
Empirical validation
Beyond the theoretical proof, the authors include empirical validation: signpost tokens and value-change encodings show “practical improvements in length generalization performance on complex problems.” The abstract does not name specific benchmarks; those details would have to come from the full paper.
Why does this matter?
This work helps explain why simply adding more CoT tokens does not by itself yield reasoning that generalizes to longer inputs: the barrier is theoretical, not just a training data deficit. Possible implications for the next generation of LLMs:
- The architectures of Anthropic Claude, OpenAI GPT, and Gemini may require structural additions for length generalization (signpost tokens or an equivalent)
- Multi-agent and tool-chaining approaches built on CoT (such as Mistral Vibe or Anthropic Claude Code sub-agents) may already implicitly incorporate something similar to the signpost mechanism
This paper is worth tracking alongside industry announcements: if a next-generation flagship model release mentions “new positional encoding” or “dynamic vocabulary,” that may be a response to this class of theoretical problem.
Frequently Asked Questions
- What is TC^0 complexity?
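- The class of problems solvable by constant-depth, polynomial-size circuits of threshold gates. Parity, integer multiplication, and iterated addition are inside TC^0. Problems that are NC^1-complete, such as evaluating Boolean formulas or composing a long sequence of permutations of five elements, lie outside TC^0 unless TC^0 = NC^1, which is widely believed to be false. Standard transformers without CoT have expressiveness bounded by TC^0 under a fixed alphabet and positional encoding.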
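To make "constant-depth threshold circuit" concrete, here is a small sketch of a threshold gate and two circuits built from it. The function names are illustrative and not taken from the paper. The depth of each circuit is fixed regardless of how many input bits there are, which is the defining feature of the class.

```python
def threshold_gate(inputs, weights, threshold):
    """Output 1 when the weighted sum of the inputs reaches the threshold."""
    return int(sum(w * x for w, x in zip(weights, inputs)) >= threshold)


def majority(bits):
    """Depth 1: a single gate fires when more than half of the bits are 1."""
    n = len(bits)
    return threshold_gate(bits, [1] * n, n // 2 + 1)


def exactly_k(bits, k):
    """Depth 2: fires when exactly k of the bits are 1."""
    n = len(bits)
    # Layer 1: two gates that count in parallel over all inputs.
    at_least_k = threshold_gate(bits, [1] * n, k)
    at_least_k_plus_1 = threshold_gate(bits, [1] * n, k + 1)
    # Layer 2: fires only when the first gate is on and the second is off.
    return threshold_gate([at_least_k, at_least_k_plus_1], [1, -1], 1)


print(majority([1, 1, 0]), exactly_k([1, 0, 1, 1], 3))
```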
- Why does Chain-of-Thought alone not solve the problem?
- Although CoT theoretically raises transformer expressiveness to Turing completeness, the authors prove that under the *length-generalizable* condition (handling problems whose CoT traces are longer than the training examples), transformers cannot solve problems beyond TC^0. On this account, practical LLMs fail on longer sequences because what they can learn in a length-generalizable way stays within TC^0.
- How do signpost tokens solve the problem?
- Signpost tokens assign unique identifiers to each position of the simulation machine's 'tape'. Combined with value-change encoding (logging only changes rather than complete states), they enable length-generalizable simulation of Turing machines where CoT trace length is linear in simulation runtime with a constant factor.
This article was generated using artificial intelligence from primary sources.