arXiv:2605.16238: LLM-guided tree search matches or beats the CDC ensemble in epidemic forecasting

Source: arXiv:2605.16238

arXiv:2605.16238 presents an autonomous system combining LLMs and tree search algorithms for predicting seasonal epidemics. In real time, throughout the 2025-26 season, the system independently built models for influenza, COVID-19 and RSV that consistently matched or surpassed the CDC's gold-standard human-curated ensemble.

This article was generated using artificial intelligence from primary sources.

A machine that predicts epidemics — without experts in the loop

Researchers from the University of Massachusetts published a paper describing an autonomous system for predicting respiratory epidemics. Instead of manually tuning statistical models, the system uses LLM-guided tree search: a large language model iteratively generates, tests and optimizes executable forecasting code, much as a chess engine searches a tree of possible moves.
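
The search loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: `propose_children` is a hypothetical stand-in for the LLM call that would rewrite forecasting code, each candidate is reduced to a single smoothing parameter, and the weekly case counts are made up.

```python
import heapq
from dataclasses import dataclass, field

# Toy weekly case counts (made-up numbers, for illustration only).
SERIES = [10, 12, 15, 19, 24, 30, 37, 45, 52, 58]

@dataclass(order=True)
class Node:
    score: float                                  # lower is better (mean absolute error)
    alpha: float = field(compare=False)           # the "candidate program", reduced to one parameter
    depth: int = field(compare=False, default=0)

def evaluate(alpha: float) -> float:
    """One-step-ahead MAE of simple exponential smoothing with weight alpha."""
    level, errors = SERIES[0], []
    for y in SERIES[1:]:
        errors.append(abs(y - level))
        level = alpha * y + (1 - alpha) * level
    return sum(errors) / len(errors)

def propose_children(node: Node) -> list[float]:
    """Hypothetical stand-in for the LLM: perturb the parent's candidate."""
    step = 0.2 / (node.depth + 1)
    return [a for a in (node.alpha - step, node.alpha + step) if 0.0 < a <= 1.0]

def tree_search(root_alpha: float = 0.5, budget: int = 20, beam: int = 3) -> Node:
    root = Node(evaluate(root_alpha), root_alpha)
    frontier, best = [root], root
    for _ in range(budget):
        if not frontier:
            break
        parent = heapq.heappop(frontier)            # expand the most promising node
        for alpha in propose_children(parent):
            child = Node(evaluate(alpha), alpha, parent.depth + 1)
            heapq.heappush(frontier, child)
            if child.score < best.score:
                best = child
        frontier = heapq.nsmallest(beam, frontier)  # prune poor branches
        heapq.heapify(frontier)
    return best

best = tree_search()
print(f"best alpha={best.alpha:.3f}, MAE={best.score:.3f}")
```

In a real system of this kind, each node would hold a full forecasting script and the evaluation step would run that script against held-out data; the branching, scoring and pruning skeleton stays the same.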

Tree search systematically explores the space of possible solutions by branching out from promising candidates and pruning poor branches. Ensemble forecasting combines multiple models into one forecast that is typically more accurate and more stable than most of its individual members, which is also how the CDC's gold-standard ensemble, manually curated by experts, works.
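
To make the ensemble idea concrete, here is a small sketch of an equal-weight ensemble of point forecasts. The model names, forecasts and observed values are all hypothetical, and equal-weight averaging is an assumption chosen for simplicity; the article does not say how the CDC hub or the paper's system weights its members.

```python
import statistics

# Hypothetical point forecasts of weekly hospital admissions from three models.
forecasts = {
    "model_a": [120.0, 150.0, 185.0],
    "model_b": [100.0, 140.0, 210.0],
    "model_c": [135.0, 170.0, 190.0],
}
observed = [118.0, 155.0, 198.0]  # made-up truth values

def mae(pred, truth):
    """Mean absolute error between a forecast and the observations."""
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(truth)

# Equal-weight ensemble: average the members week by week.
ensemble = [statistics.fmean(week) for week in zip(*forecasts.values())]

member_mae = {name: mae(pred, observed) for name, pred in forecasts.items()}
ensemble_mae = mae(ensemble, observed)
average_member_mae = statistics.fmean(member_mae.values())

for name, err in member_mae.items():
    print(f"{name}: MAE={err:.2f}")
print(f"ensemble: MAE={ensemble_mae:.2f} (average member: {average_member_mae:.2f})")
```

By the triangle inequality, the absolute error of an equal-weight mean can never exceed the average absolute error of its members. It can still lose to the single best member, which is why the gain from ensembling is usually described as reliability rather than a guaranteed win.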

Real-time results: influenza, COVID-19, RSV

What sets this paper apart from laboratory benchmark studies is prospective evaluation: the system operated in real time throughout the entire 2025-26 respiratory season in the United States. It autonomously built models for three pathogens: influenza, COVID-19 and RSV (Respiratory Syncytial Virus). For all three it consistently matched or surpassed the CDC hub ensemble.

Particularly significant is the success on RSV, where available data is sparse because systematic monitoring of that disease is relatively recent. Retrospective ablation analyses showed that optimizing log-scale metrics prevents reward hacking, a situation where the search exploits loopholes in the optimization signal instead of genuinely learning to forecast.
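
One plausible way a raw-scale objective can be gamed is sketched below: errors in high-count locations dominate the score, so a candidate that fits only the largest location can look best. This is an illustrative assumption, not the paper's analysis; the article does not name the exact metric, so mean absolute error on `log(1 + x)` is used here as a stand-in, and all numbers are made up.

```python
import math

# Made-up observed weekly admissions for a large and a small location.
observed = {"large_state": 2000.0, "small_state": 20.0}

# Two hypothetical candidates:
#  - "big_only" is nearly exact for the large location but off by 3x for the small one.
#  - "balanced" is off by 10% in both locations.
candidates = {
    "big_only": {"large_state": 1990.0, "small_state": 60.0},
    "balanced": {"large_state": 2200.0, "small_state": 22.0},
}

def raw_mae(pred):
    """Mean absolute error on the original count scale."""
    return sum(abs(pred[k] - observed[k]) for k in observed) / len(observed)

def log_mae(pred):
    """Mean absolute error after a log(1 + x) transform of counts."""
    return sum(
        abs(math.log1p(pred[k]) - math.log1p(observed[k])) for k in observed
    ) / len(observed)

for name, pred in candidates.items():
    print(f"{name}: raw MAE={raw_mae(pred):.1f}, log MAE={log_mae(pred):.3f}")

raw_winner = min(candidates, key=lambda n: raw_mae(candidates[n]))
log_winner = min(candidates, key=lambda n: log_mae(candidates[n]))
print("raw-scale winner:", raw_winner)
print("log-scale winner:", log_winner)
```

On the raw scale the candidate that ignores the small location wins, while on the log scale relative errors count equally everywhere and the balanced candidate wins. An optimizer rewarded on the raw scale would therefore be steered toward the first kind of model.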

What does this mean for public health?

Manual construction of forecasting models is a bottleneck that slows the response to new pathogens. This paper demonstrates that LLM agents can automate that work at the level of an expert team, and do so faster and at larger scale. If the approach is confirmed across multiple seasons, it could change the way healthcare systems prepare for epidemics.

Frequently Asked Questions

How does the system predict an epidemic?
An LLM iteratively generates, evaluates and optimizes executable forecasting code, guided by a tree search algorithm, similar to how a computer searches a tree of possible moves in chess. An automated judge discards poor solutions, and log-scale metrics prevent reward hacking.
How large is the improvement over the CDC?
The machine-generated ensemble consistently matched or outperformed the CDC hub ensemble — the gold standard manually curated by experts — in a prospective, real-time evaluation throughout the entire 2025-26 respiratory season for influenza, COVID-19 and RSV.
What is RSV and why is it particularly challenging to forecast?
RSV (Respiratory Syncytial Virus) is a respiratory pathogen that particularly affects infants and the elderly. It is harder to forecast than influenza because less historical data is available. The system nevertheless handled this data-scarce scenario successfully.