EnvFactory: 5× leaner RL training for tool-use agents

EnvFactory is a new framework for automatically synthesizing executable training environments for tool-use AI agents. Using only 85 verified environments across 7 domains, it achieves +15% on BFCLv3 and +8.6% on MCP-Atlas, with roughly 5× fewer environments than comparable approaches.

What is EnvFactory and why does it matter?

EnvFactory automatically synthesizes executable environments for training tool-use agents, that is, AI systems that learn to use external tools and APIs. Researchers from multiple Asian universities built it to address two chronic problems in this field: a lack of scalable execution environments and a shortage of realistic training data reflecting implicit human reasoning.

The system autonomously creates and verifies stateful tools from authentic sources, then generates multi-step training trajectories that reflect natural human query patterns, for both supervised fine-tuning (SFT) and reinforcement learning (RL).
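To make "stateful tools" and "multi-step trajectories" concrete, here is a minimal sketch in Python. It is an illustration only, not EnvFactory's implementation: the `TodoEnv` class, its tools, and the `run_trajectory` helper are hypothetical names invented for this example. The point is that each tool call changes shared state, so a later step depends on an earlier one, and the final state can be checked automatically.

```python
from dataclasses import dataclass, field


@dataclass
class TodoEnv:
    """Hypothetical stateful environment: tool calls mutate shared state."""
    tasks: dict = field(default_factory=dict)
    next_id: int = 1

    def add_task(self, title: str) -> int:
        """Tool 1: create a task and return its id."""
        task_id = self.next_id
        self.tasks[task_id] = {"title": title, "done": False}
        self.next_id += 1
        return task_id

    def complete_task(self, task_id: int) -> bool:
        """Tool 2: mark a task done; fails if the task was never created."""
        if task_id not in self.tasks:
            return False
        self.tasks[task_id]["done"] = True
        return True


def run_trajectory(env, steps):
    """Execute (tool_name, kwargs) steps in order and record each result."""
    results = []
    for tool_name, kwargs in steps:
        tool = getattr(env, tool_name)
        results.append(tool(**kwargs))
    return results


env = TodoEnv()
trajectory = [
    ("add_task", {"title": "book flight"}),
    ("complete_task", {"task_id": 1}),  # only succeeds because step 1 ran
]
results = run_trajectory(env, trajectory)

# Verification by inspecting the final environment state.
final_state_ok = env.tasks[1]["done"] is True
```

Running the second step alone against a fresh environment would return `False`, which is what distinguishes a stateful environment from a set of independent function calls.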

How is success measured?

Two key benchmarks are used for evaluation: BFCLv3 (Berkeley Function Calling Leaderboard v3), which measures how accurately a model calls functions and tools across various scenarios, and MCP-Atlas, a test suite focused on the Model Context Protocol (MCP), a standardized protocol for connecting AI agents to external services.
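As a rough intuition for what "how accurately a model calls functions" means, the sketch below scores predicted tool calls by exact match on function name and arguments. This is a deliberately simplified assumption and not the actual scoring procedure of BFCLv3 or MCP-Atlas; the `call_matches` and `accuracy` helpers and the sample calls are hypothetical.

```python
def call_matches(predicted: dict, expected: dict) -> bool:
    """A call counts as correct only if name and arguments both match."""
    return (
        predicted.get("name") == expected.get("name")
        and predicted.get("arguments") == expected.get("arguments")
    )


def accuracy(pairs) -> float:
    """Fraction of (predicted, expected) pairs that match exactly."""
    if not pairs:
        return 0.0
    return sum(call_matches(p, e) for p, e in pairs) / len(pairs)


# Hypothetical sample: one correct call, one with a wrong argument value.
pairs = [
    (
        {"name": "get_weather", "arguments": {"city": "Seoul"}},
        {"name": "get_weather", "arguments": {"city": "Seoul"}},
    ),
    (
        {"name": "get_weather", "arguments": {"city": "Tokyo"}},
        {"name": "get_weather", "arguments": {"city": "Osaka"}},
    ),
]
score = accuracy(pairs)
```

A reported gain such as "+15%" is a change in a score of this general kind, measured before and after training.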

On the Qwen3 model series, EnvFactory achieves +15% on BFCLv3 and +8.6% on MCP-Atlas. Conversational benchmarks τ²-Bench and VitaBench show an additional +6%.

Five-times efficiency — is this a real advance?

What sets EnvFactory apart from the competition is not only accuracy but efficiency: it uses just 85 verified environments across 7 domains, approximately 5× fewer than comparable approaches, and generates 2,575 training trajectories from them. Topology-aware sampling and calibrated trajectory refinement make training more robust without exponential data growth. For industry, this means better agents at a lower cost of building training infrastructure.
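The figures above can be put in proportion with some quick arithmetic. The sketch below uses only the numbers quoted in this article. The baseline environment count is not stated in the article and is merely what "approximately 5× fewer" would imply, so treat it as an estimate.

```python
environments = 85        # verified environments (from the article)
domains = 7              # domains covered (from the article)
trajectories = 2575      # training trajectories generated (from the article)
efficiency_factor = 5    # "approximately 5x fewer" (from the article)

trajectories_per_env = trajectories / environments
envs_per_domain = environments / domains

# Assumption: a comparable approach would need about 5x as many environments.
implied_baseline_envs = environments * efficiency_factor

print(f"{trajectories_per_env:.1f} trajectories per environment")  # 30.3
print(f"{envs_per_domain:.1f} environments per domain")            # 12.1
print(f"~{implied_baseline_envs} environments implied for a baseline")  # ~425
```

Each environment therefore yields about 30 trajectories on average, which is where the efficiency claim comes from: more training signal is extracted from each verified environment, not more environments.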

Frequently Asked Questions

What is EnvFactory and what problem does it solve?

EnvFactory is a framework for automatically creating and verifying stateful executable environments used to train tool-use AI agents. It addresses two chronic problems in the field: a lack of scalable execution environments and a shortage of realistic training data that reflects implicit human reasoning patterns.

What benchmarks does EnvFactory improve?

On the Qwen3 model series, EnvFactory achieves +15% on BFCLv3 (Berkeley Function Calling Leaderboard v3) and +8.6% on MCP-Atlas. Conversational benchmarks τ²-Bench and VitaBench show an additional +6%.

Why is 5× fewer environments significant?

EnvFactory uses only 85 verified environments across 7 domains, roughly 5× fewer than comparable approaches, while generating 2,575 training trajectories. Needing fewer environments lowers the cost of building training infrastructure while still improving agent performance.

Sources

arXiv:2605.18703: EnvFactory – RL training of tool-use agents with 5× fewer environments