arXiv: AutoTTS automatically discovers TTS strategies for $39.9

AutoTTS is a framework that automatically discovers test-time scaling strategies instead of relying on manually designed heuristics. LLM controllers analyze reasoning trajectories and choose one of five actions: branch, continue, probe, prune, or stop. Discovering strategies on mathematical benchmarks cost only $39.9 and took 160 minutes.

On May 11, 2026, a team of thirteen authors, including Tong Zheng, Haolin Liu and Chengsong Huang, presented AutoTTS, a framework that discovers test-time scaling strategies automatically rather than by manual design. The paper is available as arXiv:2605.08083, and the authors state that code and data will be open-sourced on GitHub.

What is AutoTTS and how does it work?

AutoTTS formulates test-time scaling as “controller synthesis over pre-collected reasoning trajectories and probe signals.” At each step, an LLM controller analyzes the reasoning trajectory and chooses one of five actions: branch, continue, probe, prune, or stop. The framework uses beta parameterization to keep the search space tractable, and fine-grained execution-trace feedback to diagnose why individual strategies fail.
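The per-step decision described above can be pictured as a small policy that maps a trajectory's latest probe signal to one of the five actions. The Python sketch below is hypothetical and is not the authors' controller: only the five action names come from the paper, while `TrajectoryState`, `ThresholdController` and its hand-set thresholds are illustrative assumptions standing in for the beta-parameterized controllers that AutoTTS actually searches over.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    # The five actions named in the AutoTTS paper.
    BRANCH = "branch"
    CONTINUE = "continue"
    PROBE = "probe"
    PRUNE = "prune"
    STOP = "stop"


@dataclass
class TrajectoryState:
    # Hypothetical view of one reasoning trajectory (not the paper's schema).
    probe_score: Optional[float]  # confidence from the latest probe; None if no fresh probe
    n_active: int                 # number of trajectories currently alive


@dataclass
class ThresholdController:
    # Hypothetical hand-set thresholds; AutoTTS searches over its own
    # (beta-parameterized) controller space instead of fixing values like these.
    stop_above: float = 0.9
    prune_below: float = 0.2
    branch_low: float = 0.4
    branch_high: float = 0.6
    max_branches: int = 8

    def decide(self, s: TrajectoryState) -> Action:
        if s.probe_score is None:
            return Action.PROBE  # no fresh signal yet: spend compute to get one
        if s.probe_score >= self.stop_above:
            return Action.STOP  # confident enough to commit to an answer
        if s.probe_score < self.prune_below:
            return Action.PRUNE  # unpromising: free its budget for other paths
        uncertain = self.branch_low <= s.probe_score <= self.branch_high
        if uncertain and s.n_active < self.max_branches:
            return Action.BRANCH  # uncertain and budget left: explore an alternative
        return Action.CONTINUE  # otherwise keep extending this trajectory


if __name__ == "__main__":
    ctrl = ThresholdController()
    for score in (None, 0.95, 0.1, 0.5, 0.75):
        print(score, ctrl.decide(TrajectoryState(probe_score=score, n_active=2)).value)
```

Here the thresholds are fixed by hand; in AutoTTS, the corresponding decision rules are what the discovery process searches for.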

What results does it achieve on mathematical tasks?

On mathematical reasoning benchmarks, the discovered strategies achieved an “improved overall accuracy-cost tradeoff” compared with strong manually designed baselines. Crucially, the learned strategies generalize to held-out benchmarks and to different model sizes, so discovery does not have to be repeated for each new configuration.
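One common way to make an “improved accuracy-cost tradeoff” precise is Pareto dominance: strategy A beats strategy B if it is at least as accurate and at least as cheap, and strictly better on one of the two. The helper below is an illustrative sketch, not the paper's evaluation code; `Point`, `dominates` and `pareto_front` are hypothetical names, and the strategy names and numbers in its example are invented rather than taken from the reported results.

```python
from typing import NamedTuple


class Point(NamedTuple):
    name: str
    accuracy: float  # higher is better
    cost: float      # e.g. tokens or dollars per problem; lower is better


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b on both axes and strictly better on one."""
    no_worse = a.accuracy >= b.accuracy and a.cost <= b.cost
    strictly_better = a.accuracy > b.accuracy or a.cost < b.cost
    return no_worse and strictly_better


def pareto_front(points: list[Point]) -> list[Point]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the paper.
    candidates = [
        Point("baseline_a", 0.70, 16.0),
        Point("baseline_b", 0.68, 20.0),
        Point("candidate", 0.72, 10.0),
    ]
    print([p.name for p in pareto_front(candidates)])
```

A strategy that lands on the front for its cost budget is not beaten on both axes at once by any alternative in the set.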

How much does the entire discovery process cost?

The authors report that the discovery process required only $39.9 and 160 minutes of compute. That is modest next to the human effort typically spent hand-designing heuristics such as best-of-N sampling or Monte Carlo tree search, which can occupy research teams for months of experimentation.
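Assuming the reported $39.9 covers the entire 160-minute run, the implied average spend works out to:

```latex
\[
\frac{\$39.9}{160\ \text{min}} = \$0.249375\ \text{per minute}
\approx \$14.96\ \text{per hour}
\]
```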

What does this mean for future AI agents?

AutoTTS shifts the paradigm from “engineer designs reasoning strategy” to “agents discover their own strategies.” For multi-step agentic systems, this suggests that each task class could obtain its own optimized TTS strategy through an automated pipeline. The authors plan an open-source release so that the broader community can replicate and build on the results.

Frequently Asked Questions

What is test-time scaling (TTS)?

Test-time scaling refers to techniques that improve LLM response quality by spending more compute at inference time, for example by generating multiple candidate trajectories, verifying intermediate steps, or adaptively deciding when to deepen reasoning. Until now, manually designed heuristics such as best-of-N sampling and Monte Carlo tree search (MCTS) have dominated.
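Best-of-N, one of the manual heuristics mentioned above, can be sketched as drawing N candidate answers and keeping the most frequent one. This is the majority-vote variant (often called self-consistency); other variants keep the candidate with the highest verifier score instead. In the sketch, `best_of_n` and `noisy_solver` are hypothetical names, and `noisy_solver` is a toy stand-in for the LLM call a real system would make.

```python
import random
from collections import Counter
from typing import Callable


def best_of_n(sample_answer: Callable[[random.Random], str], n: int,
              seed: int = 0) -> tuple[str, float]:
    """Draw n candidate answers; return the majority answer and its vote share."""
    if n < 1:
        raise ValueError("n must be at least 1")
    rng = random.Random(seed)
    votes = Counter(sample_answer(rng) for _ in range(n))
    answer, count = votes.most_common(1)[0]
    return answer, count / n


def noisy_solver(rng: random.Random) -> str:
    # Toy stand-in for an LLM: answers "42" with probability 0.6,
    # otherwise returns a random integer between 0 and 99 as a string.
    return "42" if rng.random() < 0.6 else str(rng.randint(0, 99))


if __name__ == "__main__":
    for n in (1, 8, 32):
        print(n, best_of_n(noisy_solver, n))
```

The compute cost grows linearly with `n` no matter how easy the problem is, which is the kind of fixed rule that adaptive controllers aim to improve on.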

How does AutoTTS differ from manually designed TTS strategies?

Instead of an engineer manually deciding when to branch or stop reasoning, AutoTTS formulates the problem as “controller synthesis” over pre-collected reasoning trajectories. Controllers choose among five actions (branch, continue, probe, prune, stop); beta parameterization keeps the search space tractable, and fine-grained execution-trace feedback explains why individual candidate strategies fail.
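One reason this formulation can be inexpensive is that candidate controllers are scored by replaying logged trajectories rather than by making fresh model calls. The sketch below illustrates only that replay idea, under simplifying assumptions: each problem has a single pre-collected trajectory with logged probe results (`Checkpoint`, `Problem` and `replay` are hypothetical names), the controller is one stop-when-confident threshold `tau`, and a plain grid over `tau` stands in for the paper's LLM-driven synthesis and beta parameterization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    # Hypothetical logged probe: tokens spent so far, the answer a probe
    # extracted at that point, and the probe's confidence.
    tokens: int
    answer: str
    confidence: float


@dataclass(frozen=True)
class Problem:
    gold: str
    trajectory: tuple[Checkpoint, ...]  # pre-collected, non-empty, ordered by tokens


def replay(problems: list[Problem], tau: float) -> tuple[float, float]:
    """Score a stop-when-confident controller on logs: (accuracy, mean tokens).

    No model is called; every number comes from the pre-collected trajectories.
    """
    if not problems:
        raise ValueError("need at least one problem")
    correct, tokens = 0, 0
    for p in problems:
        # Stop at the first checkpoint whose confidence reaches tau,
        # otherwise fall back to the final logged checkpoint.
        stop = next((c for c in p.trajectory if c.confidence >= tau), p.trajectory[-1])
        correct += int(stop.answer == p.gold)
        tokens += stop.tokens
    return correct / len(problems), tokens / len(problems)


if __name__ == "__main__":
    # Toy logs for illustration only.
    logs = [
        Problem("12", (Checkpoint(100, "10", 0.3), Checkpoint(200, "12", 0.8),
                       Checkpoint(400, "12", 0.95))),
        Problem("5", (Checkpoint(100, "4", 0.6), Checkpoint(300, "5", 0.9))),
    ]
    for tau in (0.5, 0.75, 0.9):
        acc, cost = replay(logs, tau)
        print(f"tau={tau}: accuracy={acc:.2f}, mean tokens={cost:.0f}")
```

On the toy logs, a low `tau` saves tokens but lets an early wrong answer through, while a high `tau` is accurate but spends more; a discovered controller has to navigate this kind of tradeoff.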

Is the $39.9 cost realistic for other applications?

The figure applies to the discovery process on the mathematical benchmarks described in the paper. Because the discovered strategies generalized to held-out benchmarks and different model sizes without another round of discovery, the $39.9 appears to be a one-time investment per task class.
