arXiv:2606.25524: Cliff Tokens — single tokens that trigger failure in mathematical reasoning
Cliff tokens are individual tokens in an LLM output where the probability of successful mathematical reasoning drops sharply. Researchers developed a detection method and showed that removing the first cliff token and resampling restores pass@64 to 1.0, while Cliff-DPO training yields a gain of +6.6 percentage points on GSM8K.
This article was generated using artificial intelligence from primary sources.
What are cliff tokens?
A cliff token is a single token in a chain-of-thought output (the sequence of intermediate steps a model uses to solve a problem) where the probability of successfully reaching the correct answer drops sharply. Researchers Jaeyong Ko, Pilsung Kang, and Yukyung Lee identified these critical points through statistical analysis: a two-proportion z-test that compares the success rate of responses before and after each individual token in the sequence.
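To make the statistical test concrete, here is a minimal sketch of a pooled two-proportion z-test applied to success counts before and after each token. The function names, the `prefix_stats` layout, and the significance threshold `alpha=0.01` are illustrative assumptions; the article does not specify how the authors sample continuations or which threshold they use.

```python
import math


def two_proportion_z(success_a, n_a, success_b, n_b):
    """Pooled two-proportion z-test.

    Returns (z, p_value), where p_value is the one-sided probability
    P(Z >= z) for the hypothesis that rate A exceeds rate B.
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Both samples are all-success or all-failure: no detectable drop.
        return 0.0, 0.5
    z = (p_a - p_b) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


def first_cliff_index(prefix_stats, alpha=0.01):
    """Return the first position whose success rate drops significantly.

    prefix_stats[i] is a hypothetical (successes, trials) pair for
    continuations sampled from the first i tokens of a response.
    """
    for i in range(1, len(prefix_stats)):
        s_before, n_before = prefix_stats[i - 1]
        s_after, n_after = prefix_stats[i]
        _, p_value = two_proportion_z(s_before, n_before, s_after, n_after)
        if p_value < alpha:
            return i
    return None


# Made-up counts: the success rate collapses once the second token is added.
stats = [(60, 64), (58, 64), (20, 64), (18, 64)]
cliff = first_cliff_index(stats)
```

With these invented counts, the small dip from 60 to 58 successes is not significant, while the fall from 58 to 20 is, so the scan stops at index 2.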
Why does a single token matter so much?
The study covered 7 models and 3 mathematical benchmarks: GSM1K, MATH500, and AIME 2025. The results are striking: removing just the first cliff token and resampling restores pass@64 (the probability that at least one of 64 sampled attempts is correct) to 1.0, compared to original values of 0.71–1.00 depending on the model. The difference is not trivial: it marks a shift from unreliable reasoning to reasoning that reaches a correct answer within the sampling budget.
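For readers unfamiliar with the metric, pass@k is commonly computed with the unbiased estimator 1 - C(n - c, k) / C(n, k), where n is the number of samples drawn and c is the number that are correct. The sketch below implements that widely used estimator; whether the authors compute pass@64 in exactly this way is an assumption, since the article does not say.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate from n samples, c of which are correct.

    Equals the probability that a random size-k subset of the n samples
    contains at least one correct answer.
    """
    if k > n:
        raise ValueError("k must not exceed the number of samples n")
    if n - c < k:
        # Fewer than k incorrect samples: every subset has a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# With k equal to n, a single correct sample is enough to reach 1.0.
single_hit = pass_at_k(64, 1, 64)
no_hit = pass_at_k(64, 0, 64)
```

This also shows why pass@64 is a lenient measure: one correct answer among 64 attempts is enough to count the problem as solved.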
Taxonomy and application
The authors distinguish three types of cliff tokens: deterministic (failure is inevitable), uncertain (the model hesitates), and randomly-missed (sampled-off). The key finding: optimizing on uncertain and randomly-missed cliff tokens improves reasoning, while deterministic ones do not respond to training. Building on this, the authors developed Cliff-DPO — a preference training method that achieves +6.6 percentage points of accuracy on the GSM8K benchmark, a concrete improvement without any architectural changes to the model.
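As background for the preference training step, the sketch below computes the standard DPO loss for a single preference pair from sequence log-probabilities. It illustrates generic DPO only. How Cliff-DPO builds its pairs is not described here, so treating the "chosen" response as one that avoids the cliff token and the "rejected" response as one that contains it is an assumption, as are the function name and the value `beta=0.1`.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response under
    the trained policy or the frozen reference model.
    """
    # How much more the policy prefers the chosen response than the
    # reference model does, relative to the rejected response.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    x = beta * margin
    # -log(sigmoid(x)) written in a numerically stable form.
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


# Hypothetical log-probabilities: the policy already favors the chosen
# (assumed cliff-free) response slightly more than the reference does.
loss = dpo_loss(-12.0, -15.0, -13.0, -14.0)
```

When policy and reference agree exactly, the margin is zero and the loss equals log 2; the loss falls below that as the policy shifts probability toward the chosen response.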
Frequently Asked Questions
- What is a cliff token and why does it matter?
- A cliff token is a single token in a model's chain-of-thought output where the probability of a correct completion drops sharply — like the edge of a cliff. Identifying these points reveals precise failure mechanisms in mathematical reasoning.
- How does Cliff-DPO improve model accuracy?
- Cliff-DPO is a preference optimization method that trains the model on examples with and without cliff tokens; the result is an accuracy improvement of up to +6.6 percentage points on the GSM8K benchmark.