Agent-World: scalable environment synthesis for AI agent evolution from Renmin University
Why it matters
Agent-World is a new research framework from China's Renmin University that automatically generates thousands of diverse environments for training AI agents. It replaces manually crafted benchmarks with dynamic scenarios and enables evolutionary learning through co-evolution of agent and environment.
Researchers from China’s Renmin University have published Agent-World, a framework that automatically generates diverse environments for training AI agents. The paper, with 20 authors, offers an alternative to the slow, manual creation of benchmarks and proposes co-evolution of agent and environment as the path toward more robust agent intelligence.
Why existing benchmarks are insufficient
Most current agent benchmarks are created manually: researchers define the tasks, tools, and evaluation criteria themselves. This approach is expensive, slow, and covers only a fraction of the scenarios that agents (autonomous programs driven by language models) encounter in the real world. Agents trained on a narrow set of tasks struggle to generalize to new domains, and coverage of edge cases remains limited. Agent-World tackles these problems by automating the entire pipeline, from source discovery to the generation of verifiable tasks. That opens the door to scaling training across thousands of topics simultaneously, which was previously infeasible without large human teams.
Two components of the framework
Agent-World rests on two main components. The first, “Environment-Task Discovery,” is a module that autonomously explores databases and tool ecosystems spanning thousands of real-world topics and synthesizes verifiable tasks with adjustable difficulty from what it finds. The second, “Self-Evolving Agent Training,” combines reinforcement learning (where the agent improves from reward signals on task outcomes) across multiple environments with dynamic task synthesis: it automatically identifies gaps in the agent’s skills and generates new tasks precisely where additional practice is needed. The result is co-evolution: agent and environment grow together, and tasks become progressively harder as the agent advances.
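To make the co-evolution idea concrete, here is a minimal toy sketch of such a loop. Everything in it is an illustrative assumption, not the paper's actual API: `synthesize_task` stands in for environment-task discovery, `evaluate` for a task verifier, and the skill counter for a real RL update. It shows only the control flow the article describes: probe for gaps, practice where the agent fails, raise difficulty where it succeeds.

```python
def synthesize_task(topic: str, difficulty: int) -> dict:
    """Hypothetical stand-in for environment-task discovery:
    emit a verifiable task for a topic at a given difficulty."""
    return {"topic": topic, "difficulty": difficulty}

def evaluate(agent_skill: dict, task: dict) -> bool:
    """Toy verifier: the agent solves a task if its skill on the
    topic meets the task's difficulty."""
    return agent_skill.get(task["topic"], 0) >= task["difficulty"]

def self_evolve(topics, iterations=5):
    agent_skill = {t: 0 for t in topics}   # toy agent state
    difficulty = {t: 1 for t in topics}    # per-topic curriculum level
    for _ in range(iterations):
        # 1. Probe the agent across topics to locate skill gaps.
        results = {
            t: evaluate(agent_skill, synthesize_task(t, difficulty[t]))
            for t in topics
        }
        # 2. Co-evolve: train where the agent failed, harden the
        #    environment where it succeeded.
        for t, solved in results.items():
            if solved:
                difficulty[t] += 1    # tasks get harder as skill grows
            else:
                agent_skill[t] += 1   # stand-in for an RL training step
    return agent_skill, difficulty

skill, diff = self_evolve(["sql", "web", "files"], iterations=4)
```

In this sketch the two update rules alternate: each topic's difficulty rises only after the agent masters the current level, which is the curriculum behavior the framework's co-evolution loop aims for.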
Results and implications
Agent-World models with 8 and 14 billion parameters outperform closed commercial baselines on 23 challenging agent benchmarks. The analysis shows clear scaling trends: the more diverse the environments and the more self-evolution iterations, the better the results. For the European AI community this means that high-quality agents can be trained without access to closed US models, provided the infrastructure for generating environments exists. Agent-World suggests that the focus of agent research over the next year will shift from model design to environment design and verifiable task construction, a quiet but important paradigm shift.
This article was generated using artificial intelligence from primary sources.