arXiv:2605.18703: EnvFactory – RL training of tool-use agents with 5× fewer environments
EnvFactory is a new framework for automatically synthesizing executable training environments for tool-use AI agents. Using only 85 verified environments across 7 domains, roughly 5× fewer than comparable approaches, it achieves +15% on BFCLv3 and +8.6% on MCP-Atlas.
This article was generated using artificial intelligence from primary sources.
What is EnvFactory and why does it matter?
EnvFactory is a new framework for automatically synthesizing executable environments designed to train tool-use agents — AI systems that learn to use external tools and APIs. Researchers from multiple Asian universities addressed two chronic problems in this field: a lack of scalable execution environments and a shortage of realistic training data reflecting implicit human reasoning.
The system autonomously creates and verifies stateful tools from authentic sources, then generates multi-step training trajectories that reflect natural human query patterns — including both SFT (supervised fine-tuning) and RL (reinforcement learning) trajectories.
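To make "stateful tools" and "verification" concrete, here is a minimal Python sketch. It is not EnvFactory's implementation: the `TodoEnvironment` class, its methods, and `verify_environment` are hypothetical names chosen for illustration. The only idea it shows is that tool calls share and mutate state, and that an environment can be checked by replaying a scripted sequence of calls and comparing the final state with what is expected.

```python
from dataclasses import dataclass, field


@dataclass
class TodoEnvironment:
    """Hypothetical stateful environment: tool calls read and mutate shared state."""

    tasks: dict[int, str] = field(default_factory=dict)
    next_id: int = 1

    def add_task(self, title: str) -> int:
        """Tool: store a task and return its new id."""
        task_id = self.next_id
        self.tasks[task_id] = title
        self.next_id += 1
        return task_id

    def delete_task(self, task_id: int) -> bool:
        """Tool: remove a task; returns False if the id does not exist."""
        return self.tasks.pop(task_id, None) is not None

    def list_tasks(self) -> list[str]:
        """Tool: return task titles ordered by id."""
        return [self.tasks[k] for k in sorted(self.tasks)]


def verify_environment() -> bool:
    """Hypothetical check: replay scripted calls and compare the final state."""
    env = TodoEnvironment()
    first = env.add_task("book flight")
    env.add_task("reserve hotel")
    deleted = env.delete_task(first)
    return deleted and env.list_tasks() == ["reserve hotel"]
```

Because the result of `delete_task` depends on an earlier `add_task`, a multi-step trajectory over such an environment can only succeed if the agent tracks state across calls.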
How is success measured?
Two key benchmarks are used for evaluation: BFCLv3 (Berkeley Function Calling Leaderboard v3), which measures how accurately a model calls functions and tools across various scenarios, and MCP-Atlas, a test suite focused on Model Context Protocol — the standardized protocol for connecting AI agents to external services.
On the Qwen3 model series, EnvFactory achieves +15% on BFCLv3 and +8.6% on MCP-Atlas. Conversational benchmarks τ²-Bench and VitaBench show an additional +6%.
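As a rough illustration of what "how accurately a model calls functions" can mean, the sketch below scores predicted tool calls by exact match on the tool name and arguments. This is a deliberately simplified assumption and not the actual scoring procedure of BFCLv3 or MCP-Atlas; the function names and the sample calls are hypothetical.

```python
def call_matches(predicted: dict, expected: dict) -> bool:
    """True when the tool name and all arguments match exactly."""
    return (
        predicted.get("name") == expected.get("name")
        and predicted.get("arguments") == expected.get("arguments")
    )


def call_accuracy(predictions: list[dict], references: list[dict]) -> float:
    """Fraction of predicted calls that exactly match their reference call."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(call_matches(p, r) for p, r in zip(predictions, references))
    return hits / len(references)


# Hypothetical sample data: the second call uses the wrong recipient.
references = [
    {"name": "get_weather", "arguments": {"city": "Paris"}},
    {"name": "send_email", "arguments": {"to": "a@example.com"}},
]
predictions = [
    {"name": "get_weather", "arguments": {"city": "Paris"}},
    {"name": "send_email", "arguments": {"to": "b@example.com"}},
]
score = call_accuracy(predictions, references)
print(score)  # 0.5
```

Real benchmarks are typically more forgiving than exact matching, for example by accepting several valid argument values, so this should be read only as a toy model of the metric.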
Fivefold efficiency: is this a real advance?
What sets EnvFactory apart from the competition is not only accuracy but also efficiency. It uses just 85 verified environments across 7 domains, approximately 5× fewer than comparable approaches, and generates 2,575 training trajectories from them. Topology-aware sampling and calibrated trajectory refinement make training more robust without exponential data growth. For industry, this means better agents at a lower cost of building training infrastructure.
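The figures quoted above can be put in proportion with some simple arithmetic. The short Python sketch below uses only the numbers in this article. The last value is an inference from the "approximately 5×" claim, not a figure reported by the article, and the variable names are illustrative.

```python
environments = 85
domains = 7
trajectories = 2575
reduction_factor = 5  # "approximately 5x fewer", as stated in the article

# Average number of training trajectories generated per environment.
trajectories_per_environment = trajectories / environments

# Average number of verified environments per domain.
environments_per_domain = environments / domains

# Inferred, not reported: environment count implied for comparable approaches.
implied_baseline_environments = environments * reduction_factor

print(round(trajectories_per_environment, 1))  # 30.3
print(round(environments_per_domain, 1))       # 12.1
print(implied_baseline_environments)           # 425
```

In other words, each environment yields about 30 trajectories on average, and a comparable approach would need on the order of 400 environments if the 5× figure is taken at face value.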
Frequently Asked Questions
- What is EnvFactory and what problem does it solve?
  - EnvFactory is a framework for automatically creating and verifying stateful executable environments used to train tool-use AI agents. It solves two chronic problems in the field: a lack of scalable execution environments and a shortage of realistic training data that reflects implicit human reasoning patterns.
- What benchmarks does EnvFactory improve?
  - On the Qwen3 model series, EnvFactory achieves +15% on BFCLv3 (Berkeley Function Calling Leaderboard v3) and +8.6% on MCP-Atlas. Conversational benchmarks τ²-Bench and VitaBench show an additional +6%.
- Why is 5× fewer environments significant?
  - EnvFactory uses only 85 verified environments across 7 domains, roughly 5× fewer than comparable approaches, while generating 2,575 training trajectories. Needing fewer environments lowers the cost of building training infrastructure while still delivering the benchmark gains listed above.