Tencent: LLM Agents Learn to Plan Ahead

The preprint "Internalizing the Future", submitted to arXiv on June 25, 2026 by Xuan Zhang and eight co-authors from Tencent, proposes a three-phase training approach (WM-AMT, FE-SFT, FC-RL) that gives LLM agents a world model: the ability to project future states and evaluate whether a plan will succeed, instead of merely reacting to the current state.

What Is a World Model and Forward-Looking Planning

LLM agents, meaning language models that execute tasks autonomously, typically react to the current state without internally evaluating the consequences of their actions. Without forward-looking planning, an agent cannot judge how good a plan is before executing it. A world model supplies exactly that internal mechanism: it lets the agent simulate possible future states of the environment, much like a mental "what would happen if I did this?" experiment.
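To make the difference between reacting and looking ahead concrete, here is a minimal Python sketch of generic model-based lookahead on a hypothetical five-position toy environment. It is not the paper's method: the environment, the reward values, and the names world_model, reactive_policy, and lookahead_policy are all invented for illustration, and the "world model" here is an exact hand-written function rather than something an LLM has learned.

```python
from itertools import product

# Hypothetical toy environment (not from the paper): positions 0..4 on a line.
ACTIONS = ("left", "right")
REWARD = {0: -10, 1: 1, 2: 0, 3: -1, 4: 10}


def world_model(state: int, action: str) -> int:
    """Predict the next state. Exact here; a learned model would only approximate it."""
    if action == "left":
        return max(state - 1, 0)
    return min(state + 1, 4)


def reactive_policy(state: int) -> str:
    """Pick the action with the best immediate reward, with no lookahead."""
    return max(ACTIONS, key=lambda a: REWARD[world_model(state, a)])


def rollout_return(state: int, plan: tuple[str, ...]) -> int:
    """Simulate a whole plan inside the world model and sum the predicted rewards."""
    total = 0
    for action in plan:
        state = world_model(state, action)
        total += REWARD[state]
    return total


def lookahead_policy(state: int, horizon: int = 2) -> str:
    """Evaluate every plan of length `horizon` before acting; return the best plan's first step."""
    plans = product(ACTIONS, repeat=horizon)
    best_plan = max(plans, key=lambda plan: rollout_return(state, plan))
    return best_plan[0]


print(reactive_policy(2))   # left
print(lookahead_policy(2))  # right
```

From position 2, the reactive agent chases the immediate reward of 1, while the agent that simulates two steps ahead sees that moving right leads to the reward of 10.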

The preprint identifies the key problem: under standard fine-tuning, LLMs develop only superficial prediction, without genuine grounding in the logic of task execution.

Three-Phase Training: WM-AMT, FE-SFT, and FC-RL

A team of nine Tencent authors — Xuan Zhang, Zhijian Zhou, Lingfeng Qiao, Yulei Qin, Ke Li, Xing Sun, Xiaoyu Tan, Chao Qu, and Yuan Qi — proposes training in three phases:

WM-AMT (World Model Agentic Mid-Training): during mid-training, the model learns to generate "prospective state rollouts" (projections of future states) along with evaluations of whether a plan will succeed.
FE-SFT (Format-Eliciting Supervised Fine-Tuning): through supervised fine-tuning, the model learns to structure these predictions into a consistent, usable format.
FC-RL (Foresight-Conditioned Reinforcement Learning): through reinforcement learning (RL), the predictions are calibrated and made genuinely useful for the agent's decisions.
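The abstract does not say how FC-RL rewards calibration. Purely as an illustration of what a "calibrated" success prediction can mean, the sketch below assumes a shaped reward that subtracts a Brier-score penalty (the squared error between a predicted success probability and the actual 0/1 outcome) from the task reward. The names foresight_reward and calibration_weight are hypothetical, and this is not the paper's reward function.

```python
def brier_score(predicted_success: float, succeeded: bool) -> float:
    """Squared error between a predicted success probability and the 0/1 outcome (lower is better)."""
    outcome = 1.0 if succeeded else 0.0
    return (predicted_success - outcome) ** 2


def foresight_reward(
    predicted_success: float, succeeded: bool, calibration_weight: float = 0.5
) -> float:
    """Hypothetical shaped reward: task outcome minus a weighted calibration penalty."""
    if not 0.0 <= predicted_success <= 1.0:
        raise ValueError("predicted_success must be a probability in [0, 1]")
    task_reward = 1.0 if succeeded else 0.0
    return task_reward - calibration_weight * brier_score(predicted_success, succeeded)


print(round(foresight_reward(0.9, False), 3))  # -0.405: confident but wrong
print(round(foresight_reward(0.1, False), 3))  # -0.005: cautious and right
```

Under this assumed reward, a confident forecast that turns out wrong costs far more than a cautious one that turns out right, which pushes the agent toward success estimates it can actually back up.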

Unlike reactive LLM agents, which generate responses without internally simulating the future, the approach internalizes that simulation in a single autoregressive model that both predicts future states and evaluates whether the plan will succeed.
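When one model emits both the forecast and its own success estimate, the agent needs a predictable output format to act on them, which is what FE-SFT is described as teaching. The sketch below assumes a hypothetical tagged format (<forecast>, <success>, <action>) that is not taken from the paper: it parses one generation and executes the proposed action only when the model's success estimate clears an arbitrary threshold, otherwise asking for a new plan.

```python
import re
from dataclasses import dataclass

# Hypothetical output format; the paper's actual format is not given in the abstract.
PATTERN = re.compile(
    r"<forecast>(?P<forecast>.*?)</forecast>\s*"
    r"<success>(?P<success>0(?:\.\d+)?|1(?:\.0+)?)</success>\s*"
    r"<action>(?P<action>.*?)</action>",
    re.DOTALL,
)


@dataclass
class ForesightStep:
    forecast: str  # predicted future state, in natural language
    success: float  # the model's own estimate that the plan will succeed
    action: str  # the action the agent proposes to take next


def parse_step(generation: str) -> ForesightStep:
    """Extract forecast, success estimate, and action from one model generation."""
    match = PATTERN.search(generation)
    if match is None:
        raise ValueError("generation does not follow the expected foresight format")
    return ForesightStep(
        forecast=match["forecast"].strip(),
        success=float(match["success"]),
        action=match["action"].strip(),
    )


def decide(step: ForesightStep, threshold: float = 0.6) -> str:
    """Execute the proposed action only if the model's own forecast looks promising."""
    return step.action if step.success >= threshold else "REPLAN"


example = (
    "<forecast>The search returns the company's annual report.</forecast>\n"
    "<success>0.82</success>\n"
    '<action>search("annual report")</action>'
)
step = parse_step(example)
print(step.success)  # 0.82
print(decide(step))  # search("annual report")
```

The threshold of 0.6 is arbitrary; in a real agent it would be tuned, and a calibrated success estimate is what makes such a threshold meaningful in the first place.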

Do the Results Confirm This Approach?

According to the authors, evaluations on reasoning and search tasks show the approach consistently outperforming comparable baselines. The publicly available arXiv abstract does not report specific numbers, which is common for newly submitted papers that have not yet undergone independent peer review. Internalizing the Future therefore remains an academic preprint from Tencent whose broader validation is still to come.

Frequently Asked Questions

What is a 'world model' in the context of LLM agents?

A world model is an internal mechanism by which an agent simulates possible future states of the environment before taking action — analogous to mentally running through 'what if' scenarios rather than simply reacting to the current state.

What are the three training phases proposed by Internalizing the Future?

WM-AMT (injecting predictive capabilities in the mid-training phase), FE-SFT (structuring predictions through supervised fine-tuning), and FC-RL (refining with reinforcement learning for calibrated, useful predictions).

arXiv:2606.27483: Internalizing the Future — A Unified Training Paradigm for World Model Planning in LLM Agents
