Models · Thursday, April 30, 2026 · 3 min read

arXiv: standard transformers with Chain-of-Thought cannot length-generalize beyond TC^0 complexity; signpost tokens enable length-generalizable Turing machine simulation

Editorial illustration: transformer architecture with a break in the Chain-of-Thought chain and signpost symbols

A new arXiv preprint by Kraus, Sarrof, Yaa, Koller, and Hahn shows that, under the stricter requirement of length-generalizable learning, standard transformers with Chain-of-Thought (CoT) reasoning cannot solve problems beyond TC^0 complexity. The theoretical Turing completeness of CoT therefore does not carry over to what these models can learn to do on inputs longer than those seen in training. The proposed remedy, dynamic vocabulary expansion combined with signpost tokens, enables length-generalizable simulation of Turing machines with only linear CoT overhead.

On April 28, 2026, Kraus, Sarrof, Yaa, Koller, and Hahn published the preprint "Barriers to Universal Reasoning With Transformers (And How to Overcome Them)", a theoretical paper with direct implications for scaling Chain-of-Thought reasoning in current-generation LLMs.

What was proven?

The main thesis: it is known from the literature that CoT can, in theory, raise transformer expressiveness to Turing completeness, but that claim does not hold under the stricter requirement of length-generalizable learning (the ability to produce correct CoT traces longer than those seen during training).
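Length generalization is measured by splitting data by length rather than at random: the model trains on short instances and is evaluated only on longer ones. A minimal sketch of that protocol, assuming each example is tagged with the length of the trace it requires; `length_split` and the toy data are hypothetical illustrations, not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical example format: (trace_length, payload). The length stands in for
# how long a CoT trace the model must produce to solve the example.
Example = Tuple[int, str]


def length_split(examples: List[Example], max_train_len: int) -> Tuple[List[Example], List[Example]]:
    """Train on short examples only; test only on strictly longer ones."""
    train = [ex for ex in examples if ex[0] <= max_train_len]
    test = [ex for ex in examples if ex[0] > max_train_len]
    return train, test


examples = [(n, "x" * n) for n in range(1, 21)]
train, test = length_split(examples, max_train_len=10)
print(len(train), len(test))  # 10 10
# Every test example is longer than anything seen in training.
print(max(n for n, _ in train) < min(n for n, _ in test))  # True
```

A random split would mix lengths in both sets and could hide exactly the failure mode the paper describes.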

Quote from the abstract:

“Under standard positional encodings and a finite alphabet — transformers with CoT cannot solve problems beyond TC^0, i.e., the expressiveness benefits do not hold under the stricter requirement of length-generalizable learnability.”

Practical implication: a reasoning problem that appears to be solved at training lengths can break down as sequence length increases. This may help explain why LLMs often "lose count" or lose accuracy on long arithmetic or logical chains.

Proposed solutions

The authors propose two complementary mechanisms:

1. Dynamic vocabulary scaling

The vocabulary grows with problem size. This avoids the “finite alphabet” constraint from the theorem.
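One way to picture a vocabulary that grows with problem size, sketched in Python under the assumption that the growth comes from one signpost token per tape cell (the paper's exact construction may differ). The tape alphabet, control tokens, and `build_vocab` are hypothetical names chosen for illustration.

```python
from typing import List

BASE_SYMBOLS = ["0", "1", "_"]          # hypothetical fixed tape alphabet
CONTROL_TOKENS = ["<step>", "<halt>"]   # hypothetical fixed control tokens


def build_vocab(num_cells: int) -> List[str]:
    """Fixed tokens plus one signpost token per tape cell: size grows with the problem."""
    signposts = [f"<P{i}>" for i in range(num_cells)]
    return BASE_SYMBOLS + CONTROL_TOKENS + signposts


for n in (4, 16, 64):
    print(n, len(build_vocab(n)))
# 4 9
# 16 21
# 64 69
```

The fixed part stays at five tokens while the signpost part scales linearly with the number of cells, which is precisely what a finite-alphabet model cannot do.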

2. Signpost tokens + value-change encoding

  • Signpost tokens: unique identifiers assigned to each position on the simulated machine's tape
  • Value-change encoding: logging only state changes rather than complete states, so that the current state can be reconstructed through counting
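A minimal sketch of value-change encoding: the trace records only (signpost, new value) pairs, and a cell's current content is recovered from the log. The reconstruction here simply takes the most recent change logged for that signpost; this is an assumed, simplified stand-in, not necessarily the counting-based construction used in the paper. `read_cell` and `full_tape` are hypothetical helpers.

```python
from typing import List, Tuple

BLANK = "_"
Change = Tuple[int, str]  # (tape position identified by its signpost, value written there)


def read_cell(changes: List[Change], pos: int) -> str:
    """Recover a cell's current value: the most recent change logged for its signpost."""
    for p, v in reversed(changes):
        if p == pos:
            return v
    return BLANK  # never written, so still blank


def full_tape(changes: List[Change], num_cells: int) -> List[str]:
    """Rebuild the whole tape from the change log alone."""
    return [read_cell(changes, i) for i in range(num_cells)]


changes = [(0, "1"), (1, "0"), (0, "0"), (2, "1")]
print(full_tape(changes, 4))  # ['0', '0', '1', '_']
```

Logging one change per step keeps the trace proportional to the number of steps, whereas writing out the full tape at every step would multiply the trace length by the tape size.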

The combination achieves the main result:

“Length-generalizable simulation of Turing machines where CoT trace length is linear in the simulated runtime with a constant factor.”

In other words, this approach breaks through the TC^0 barrier while keeping the CoT trace only a constant factor longer than the simulated computation.
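To illustrate the "linear with a constant factor" property, here is a sketch of a toy Turing machine whose simulation emits a CoT-style trace of a fixed number of tokens per step: the signpost of the changed cell, the value written, and the new state. The machine (flip every bit, halt at the first blank) and the token format are hypothetical; this shows only the length bound, not the paper's full construction.

```python
from typing import Dict, List, Tuple

BLANK = "_"
# Hypothetical toy machine: walk right flipping every bit, halt at the first blank.
# Transition table: (state, read_symbol) -> (next_state, write_symbol, head_move)
DELTA: Dict[Tuple[str, str], Tuple[str, str, int]] = {
    ("flip", "0"): ("flip", "1", +1),
    ("flip", "1"): ("flip", "0", +1),
    ("flip", BLANK): ("halt", BLANK, 0),
}


def simulate(tape_input: str) -> Tuple[List[str], int]:
    """Run the machine and emit a value-change trace keyed by signpost tokens."""
    tape = {i: s for i, s in enumerate(tape_input)}
    state, head, steps = "flip", 0, 0
    trace: List[str] = []
    while state != "halt":
        symbol = tape.get(head, BLANK)
        state, write, move = DELTA[(state, symbol)]
        tape[head] = write
        # Exactly three tokens per step: which cell changed, its new value, the new state.
        trace += [f"<P{head}>", write, state]
        head += move
        steps += 1
    return trace, steps


for word in ("01", "0110", "01101001"):
    trace, steps = simulate(word)
    print(steps, len(trace))
# 3 9
# 5 15
# 9 27
```

The trace length is always 3 times the runtime, however long the input. The signpost tokens `<P0>`, `<P1>`, … also require a vocabulary that grows with the tape, which is where dynamic vocabulary scaling comes in.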

Empirical validation

Beyond the theoretical proofs, the authors report empirical results: signpost tokens and value-change encodings show "practical improvements in length generalization performance on complex problems." The abstract does not name specific benchmarks; those details should be in the full paper.

Why does this matter?

This work suggests why scaling reasoning simply by adding more CoT tokens may not be enough: under the paper's assumptions, the obstacle is a theoretical barrier, not just a shortage of training data. Possible implications for the next generation of LLMs:

  • Architectures such as Anthropic's Claude, OpenAI's GPT, and Google's Gemini may require structural additions for length generalization (signpost tokens or an equivalent mechanism)
  • Approaches that chain multiple CoT runs and tool calls (such as Mistral Vibe or sub-agents in Anthropic's Claude Code) may already implicitly incorporate something similar to the signpost mechanism

This paper is worth tracking alongside industry announcements: if a next-generation flagship model release mentions a "new positional encoding" or "dynamic vocabulary," that may be a response to this class of theoretical problem.

Frequently Asked Questions

What is TC^0 complexity?
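The class of problems solvable by families of constant-depth, polynomial-size circuits built from unbounded fan-in AND, OR, NOT, and threshold (majority) gates. TC^0 already contains parity and the addition or multiplication of long numbers, but problems such as composing a long sequence of permutations of five elements, evaluating Boolean formulas, or graph connectivity are widely believed to lie outside it. Without CoT, a standard transformer's expressiveness is known to be bounded by TC^0 under common precision assumptions; the paper shows that, with a finite alphabet and standard positional encodings, length-generalizable CoT does not lift this bound.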
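To make "constant-depth threshold circuit" concrete, a sketch in Python: each gate may take any number of inputs, but the number of gate layers does not grow with the input length. The helper names are hypothetical, and a real TC^0 circuit family also has to satisfy a uniformity condition that this toy does not model.

```python
from typing import List


def threshold(bits: List[int], k: int) -> int:
    """Threshold gate with unbounded fan-in: 1 iff at least k inputs are 1."""
    return int(sum(bits) >= k)


def majority(bits: List[int]) -> int:
    """A single threshold gate: 1 iff more than half of the inputs are 1."""
    return threshold(bits, len(bits) // 2 + 1)


def exactly_k(bits: List[int], k: int) -> int:
    """Constant depth: two threshold gates, a negation, and an AND (itself a threshold
    gate). The depth is the same for 5 inputs or 5 million; only fan-in grows."""
    at_least_k = threshold(bits, k)
    at_least_k_plus_1 = threshold(bits, k + 1)
    return threshold([at_least_k, 1 - at_least_k_plus_1], 2)


print(majority([1, 0, 1, 1, 0]))      # 1
print(exactly_k([1, 0, 1, 1, 0], 3))  # 1
print(exactly_k([1, 0, 1, 1, 0], 2))  # 0
```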
Why does Chain-of-Thought alone not solve the problem?
CoT can in theory raise transformer expressiveness to Turing completeness, but the authors prove that under the length-generalizable condition (the ability to solve CoT traces longer than the training examples), with standard positional encodings and a finite alphabet, transformers cannot solve problems beyond TC^0. In practice, this means a model can appear to master a task at training lengths and still fail on longer inputs.
How do signpost tokens solve the problem?
Signpost tokens assign unique identifiers to each position of the simulated machine's tape. Combined with value-change encoding (logging only changes rather than complete states), they enable length-generalizable simulation of Turing machines where CoT trace length is linear in the simulated runtime with a constant factor.

This article was generated using artificial intelligence from primary sources.