Agent-World: scalable environment synthesis for AI agent evolution from Renmin University
Why it matters
Agent-World is a new research framework from China's Renmin University that automatically generates thousands of diverse environments for training AI agents. It replaces manually crafted benchmarks with dynamic scenarios and enables evolutionary learning through co-evolution of agent and environment.
Researchers from China’s Renmin University have published Agent-World, a framework that automatically generates diverse environments for training AI agents. The paper, with 20 authors, offers an alternative to the slow, manual creation of benchmarks and proposes co-evolution of agent and environment as the path toward more robust agent intelligence.
Why existing benchmarks are insufficient
Most current agent benchmarks are created manually: researchers define the tasks, tools, and evaluation criteria themselves. This approach is expensive, slow, and covers only a fraction of the scenarios that agents (autonomous programs driven by language models) encounter in the real world. Agents trained on a narrow set of tasks struggle to generalize to new domains, and coverage of edge cases remains limited. Agent-World tackles these problems by automating the entire pipeline, from source discovery to the generation of verifiable tasks. That opens the door to scaling training across thousands of topics simultaneously, which was previously infeasible without large human teams.
Two components of the framework
Agent-World rests on two main components. The first, “Environment-Task Discovery,” is a module that autonomously explores databases and tool ecosystems spanning thousands of real-world topics and synthesizes verifiable tasks with adjustable difficulty from what it finds. The second, “Self-Evolving Agent Training,” combines reinforcement learning (where the agent improves from reward signals on task outcomes) across multiple environments with dynamic task synthesis: it automatically identifies gaps in the agent’s skills and generates new tasks precisely where additional practice is needed. The result is co-evolution: agent and environment grow together, and tasks become progressively harder as the agent advances.
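To make the co-evolution idea concrete, here is a minimal toy sketch of such a loop. Everything in it is an illustrative assumption, not the paper's actual API: `synthesize_task` stands in for environment-task discovery, `evaluate` for a task verifier, and the skill counter for a real RL update. It shows only the control flow the article describes: probe for gaps, practice where the agent fails, raise difficulty where it succeeds.

```python
def synthesize_task(topic: str, difficulty: int) -> dict:
    """Hypothetical stand-in for environment-task discovery:
    emit a verifiable task for a topic at a given difficulty."""
    return {"topic": topic, "difficulty": difficulty}

def evaluate(agent_skill: dict, task: dict) -> bool:
    """Toy verifier: the agent solves a task if its skill on the
    topic meets the task's difficulty."""
    return agent_skill.get(task["topic"], 0) >= task["difficulty"]

def self_evolve(topics, iterations=5):
    agent_skill = {t: 0 for t in topics}   # toy agent state
    difficulty = {t: 1 for t in topics}    # per-topic curriculum level
    for _ in range(iterations):
        # 1. Probe the agent across topics to locate skill gaps.
        results = {
            t: evaluate(agent_skill, synthesize_task(t, difficulty[t]))
            for t in topics
        }
        # 2. Co-evolve: train where the agent failed, harden the
        #    environment where it succeeded.
        for t, solved in results.items():
            if solved:
                difficulty[t] += 1    # tasks get harder as skill grows
            else:
                agent_skill[t] += 1   # stand-in for an RL training step
    return agent_skill, difficulty

skill, diff = self_evolve(["sql", "web", "files"], iterations=4)
```

In this sketch the two update rules alternate: each topic's difficulty rises only after the agent masters the current level, which is the curriculum behavior the framework's co-evolution loop aims for.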
Results and implications
Agent-World models with 8 and 14 billion parameters outperform closed commercial baselines on 23 challenging agent benchmarks. The analysis shows clear scaling trends: the more diverse the environments and the more self-evolution iterations, the better the results. For the European AI community this means that high-quality agents can be trained without access to closed US models, provided the infrastructure for generating environments exists. Agent-World suggests that the focus of agent research over the next year will shift from model design to environment design and verifiable task construction, a quiet but important paradigm shift.
This article was generated using artificial intelligence from primary sources.