Cliff Tokens: why one token breaks an LLM

Cliff tokens are individual tokens in an LLM's output at which the probability of completing a mathematical solution correctly drops sharply. Researchers developed a statistical method to detect them and showed that removing the first cliff token and resampling restores pass@64 to 1.0. Cliff-DPO, a preference training method built on this analysis, improves GSM8K accuracy by 6.6 percentage points.

What are cliff tokens?

A cliff token is a single token in a chain-of-thought output (the sequence of intermediate steps a model uses to solve a problem) at which the probability of eventually reaching the correct answer drops sharply. Researchers Jaeyong Ko, Pilsung Kang, and Yukyung Lee locate these points with a two-proportion z-test. For each token in the sequence, the test compares the success rate of completions sampled from just before that token with the success rate of completions sampled from just after it.
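The per-token test can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes (hypothetically) that for each position you have already resampled `n_samples` completions from the prefix ending just before that token and counted how many reached the correct answer. It flags a token when the drop in success rate is significant under a one-sided pooled two-proportion z-test. The function names, the data layout, and the `alpha=0.01` threshold are our own choices. A real analysis would also need to correct for testing every token in a long sequence.

```python
import math


def two_proportion_z(success_before: int, n_before: int,
                     success_after: int, n_after: int) -> float:
    """Pooled two-proportion z statistic; positive when the success rate drops."""
    p_before = success_before / n_before
    p_after = success_after / n_after
    pooled = (success_before + success_after) / (n_before + n_after)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_before + 1 / n_after))
    if se == 0:  # both rates are 0 or both are 1: no measurable change
        return 0.0
    return (p_before - p_after) / se


def one_sided_p_value(z: float) -> float:
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def find_cliff_tokens(success_counts: list[int], n_samples: int,
                      alpha: float = 0.01) -> list[int]:
    """Return indices of tokens whose inclusion significantly lowers success.

    Hypothetical layout: success_counts[i] is the number of correct completions
    (out of n_samples) resampled from the prefix ending just before token i;
    the final entry is for the prefix that includes the last token.
    """
    cliffs = []
    for i in range(len(success_counts) - 1):
        z = two_proportion_z(success_counts[i], n_samples,
                             success_counts[i + 1], n_samples)
        if one_sided_p_value(z) < alpha:
            cliffs.append(i)
    return cliffs


# Toy counts for a 4-token response, 64 resamples per prefix.
counts = [60, 61, 59, 20, 18]
print(find_cliff_tokens(counts, 64))  # → [2]
```

In the toy counts, only the step from 59/64 to 20/64 is a significant drop, so token 2 is flagged. The smaller wobbles elsewhere are not.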

Why does a single token matter so much?

The study covered 7 models and 3 mathematical benchmarks: GSM1K, MATH500, and AIME 2025. The results are striking. Removing just the first cliff token and resampling restores pass@64 (the probability that at least one of 64 sampled answers is correct) to 1.0, up from original values of 0.71–1.00 depending on the model. The gap is not trivial. A single intervention at one token position turns a model that sometimes fails into one that, within 64 attempts, always reaches the correct answer.
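For reference, pass@k is commonly computed with the unbiased estimator from the HumanEval paper (Chen et al., 2021). With n samples of which c are correct, pass@k = 1 - C(n - c, k) / C(n, k), averaged over problems. The sketch below assumes the study uses this standard definition; we have not confirmed its exact evaluation protocol. With exactly n = k = 64 samples, the estimate reduces to whether any of the 64 answers is correct.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: n samples, c of them correct."""
    if n - c < k:  # every size-k subset contains at least one correct sample
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average per-problem pass@k over hypothetical (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical counts: one problem solved 5 times out of 64, one never solved.
print(benchmark_pass_at_k([(64, 5), (64, 0)], k=64))  # → 0.5
```

One unsolved problem out of two pulls the benchmark score down to 0.5. A pass@64 of 1.0 therefore requires at least one correct sample on every problem.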

Taxonomy and application

The authors distinguish three types of cliff tokens: deterministic (failure is inevitable), uncertain (the model hesitates), and randomly-missed (sampled-off). The key finding is that optimizing on uncertain and randomly-missed cliff tokens improves reasoning, while deterministic ones do not respond to training. Building on this, the authors developed Cliff-DPO, a preference training method that improves accuracy on the GSM8K benchmark by 6.6 percentage points without any architectural changes to the model.
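Cliff-DPO builds on Direct Preference Optimization (DPO). The sketch below shows only the standard DPO loss for a single preference pair. How Cliff-DPO actually constructs its pairs is not described here, so the pairing in the docstring is a hypothetical illustration. In that pairing, the prefix up to the cliff token is the shared prompt, a continuation that avoids the cliff token and reaches the correct answer is the chosen response, and a continuation that starts with the cliff token is the rejected one. The function names and the `beta=0.1` default are our own assumptions.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair.

    Each *_logp is the summed log-probability of a continuation given the
    shared prefix. In a hypothetical cliff-targeted pair, the prefix ends just
    before the cliff token, "chosen" avoids the cliff token and reaches the
    correct answer, and "rejected" starts with the cliff token.
    """
    chosen_log_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_log_ratio = policy_rejected_logp - ref_rejected_logp
    return -log_sigmoid(beta * (chosen_log_ratio - rejected_log_ratio))


print(dpo_loss(-5.0, -5.0, -5.0, -5.0))  # policy == reference: equals log(2)
print(dpo_loss(-4.0, -6.0, -5.0, -5.0, beta=1.0))  # policy favors "chosen"
```

When the policy matches the reference model the loss is log 2. It falls as the policy assigns relatively more probability than the reference does to the chosen continuation and less to the rejected one.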

Frequently Asked Questions

What is a cliff token and why does it matter?

A cliff token is a single token in a model's chain-of-thought output where the probability of a correct completion drops sharply — like the edge of a cliff. Identifying these points reveals precise failure mechanisms in mathematical reasoning.

How does Cliff-DPO improve model accuracy?

Cliff-DPO is a preference optimization method that trains the model on examples with and without cliff tokens; the result is an accuracy improvement of up to +6.6 percentage points on the GSM8K benchmark.

arXiv:2606.25524: Cliff Tokens — single tokens that trigger failure in mathematical reasoning
