Agents · Saturday, May 2, 2026 · 2 min read

Microsoft Research Synthetic Computers: 1,000 synthetic computers as a substrate for long-horizon training of productive AI agents

Editorial illustration: a network of synthetic workstations in digital space

Microsoft Research presents a methodology for generating 1,000 realistic synthetic computing environments, each with an authentic folder hierarchy and documents. Two agents collaborate within each environment: one creates productivity goals specific to the user profile, and the other executes them in runs that average more than 2,000 steps and more than 8 hours of agent work. The authors claim the approach can in principle scale to millions or billions of synthetic worlds and position it as a foundational substrate for agent self-improvement.

Researchers at Microsoft Research (Tao Ge, Baolin Peng, Hao Cheng, Jianfeng Gao) have presented a methodology for generating realistic synthetic computing environments that simulate productive workflows beyond the reach of existing agent benchmarks. The goal of the paper is infrastructural — to provide a substrate on which long-horizon agents can be trained and evaluated at a scale that exceeds manually curated datasets.

What exactly are “synthetic computers”?

Each synthetic computer contains an authentic folder hierarchy with documents and artifacts that mimics a real working environment — not a sterile sandbox, but a file system with contextually connected content. Two agents collaborate within that space:

  • The Generator creates productivity goals specific to the environment’s user profile: tasks that “require multiple professional deliverables and approximately one month of human work.”
  • The Executor takes on the role of the user and works through the computer, navigating files, creating new artifacts, and carrying each task through over a long horizon.
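
The division of labor between the two agents can be illustrated with a toy loop. The sketch below is not the authors' implementation: the class `SyntheticComputer`, the functions `generate_goal` and `execute_goal`, the file paths, and the two-steps-per-deliverable rule are all hypothetical, and simple string manipulation stands in for the language-model calls a real system would make.

```python
from dataclasses import dataclass, field


@dataclass
class SyntheticComputer:
    """Hypothetical stand-in for one environment: a user profile plus a file tree."""
    profile: str
    files: dict[str, str] = field(default_factory=dict)  # path -> content


def generate_goal(computer: SyntheticComputer) -> list[str]:
    """Toy Generator: turn the user profile into a list of deliverable paths."""
    slug = computer.profile.lower().replace(" ", "_")
    return [
        f"/home/user/deliverables/{slug}_{name}"
        for name in ("plan.md", "report.md", "summary.md")
    ]


def execute_goal(computer: SyntheticComputer, deliverables: list[str], max_steps: int) -> int:
    """Toy Executor: spend two steps per deliverable (inspect, then write)."""
    steps = 0
    for path in deliverables:
        if steps + 2 > max_steps:
            break  # step budget exhausted before this deliverable
        existing = sorted(computer.files)  # step 1: look at the file system
        steps += 1
        computer.files[path] = f"draft based on {len(existing)} existing files"  # step 2: write
        steps += 1
    return steps


# Hypothetical example environment.
computer = SyntheticComputer(
    profile="Marketing Manager",
    files={
        "/home/user/notes/q3_campaign.txt": "campaign notes",
        "/home/user/budget.csv": "budget figures",
    },
)
goal = generate_goal(computer)
steps_used = execute_goal(computer, goal, max_steps=10)
print(steps_used, len(computer.files))
```

The point of the sketch is the shape of the interaction: the goal is derived from the profile, and every Executor step reads or changes the same persistent file system, so later steps see the artifacts that earlier steps produced.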

What are the numbers?

Three figures define the scope of the work:

  • 1,000 synthetic computers generated in the study
  • More than 2,000 steps per simulation on average
  • More than 8 hours of agent work to complete one simulation
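
Taken at face value, these figures imply some rough totals. The sketch below is back-of-the-envelope arithmetic, not a result from the paper: it assumes exactly one simulation per computer and treats the reported averages as the round values 2,000 steps and 8 hours, so the outputs are approximations only.

```python
# Round values taken from the figures above; the study reports them as
# "more than", so the results here are approximate.
NUM_COMPUTERS = 1_000
STEPS_PER_SIM = 2_000
HOURS_PER_SIM = 8

# Assumption: one simulation per synthetic computer.
total_steps = NUM_COMPUTERS * STEPS_PER_SIM
total_agent_hours = NUM_COMPUTERS * HOURS_PER_SIM
seconds_per_step = HOURS_PER_SIM * 3600 / STEPS_PER_SIM

print(f"total steps:       {total_steps:,}")
print(f"total agent-hours: {total_agent_hours:,}")
print(f"seconds per step:  {seconds_per_step:.1f}")
```

Under these assumptions the study amounts to about two million agent steps and roughly 8,000 agent-hours, with each step taking around 14 seconds on average.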

This is a significant departure from prior agent benchmarks like SWE-bench, GAIA, or AgentBench, where individual tasks rarely exceed a few dozen steps. The long-horizon dimension is key: the authors are targeting a class of agents that can work autonomously for hours without human intervention.

How far can it scale?

The main thesis of the paper is infrastructural scalability. The authors claim the methodology “can in principle scale to millions or billions of synthetic user worlds given sufficient computational resources.” This would cover a diversity of professions and contexts — from engineering workflows to marketing plans to administrative tasks — which is a prerequisite for training agents that generalize beyond the narrow set of tasks from a developer sandbox.
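
The qualifier “given sufficient computational resources” carries real weight, as a rough estimate shows. The sketch below is illustrative only: the function `wall_clock_days`, the figure of 100,000 concurrent agents, and the assumptions of perfect parallelism and a flat 8 agent-hours per world are all hypothetical and are not taken from the paper.

```python
def wall_clock_days(num_worlds: int, hours_per_world: float, concurrent_agents: int) -> float:
    """Idealized wall-clock time, assuming perfect parallelism across worlds.

    A single simulation is not split across agents, so at most one agent
    per world is ever useful.
    """
    useful_agents = min(concurrent_agents, num_worlds)
    total_agent_hours = num_worlds * hours_per_world
    return total_agent_hours / useful_agents / 24


HOURS_PER_WORLD = 8          # round value based on the reported average
CONCURRENT_AGENTS = 100_000  # hypothetical fleet size

for worlds in (1_000, 1_000_000, 1_000_000_000):
    days = wall_clock_days(worlds, HOURS_PER_WORLD, CONCURRENT_AGENTS)
    print(f"{worlds:>13,} worlds -> {days:,.1f} days")
```

Under these idealized assumptions a million worlds would take a few days, while a billion worlds would take roughly nine years of wall-clock time, which is why the scaling claim depends so heavily on available compute.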

The authors position the work as a “foundational substrate for agent self-improvement”: agents that can generate their own training data within simulated worlds and learn from their own mistakes. If the approach proves reproducible, it could become a key component of the next generation of productivity-focused AI systems, supplying not just better models but a scalable source of synthetic experience on which to train them.

Frequently Asked Questions

What are synthetic computers in this paper?
Realistic simulated computing environments with authentic folder hierarchies, documents, and artifacts. The goal is for AI agents to work within them as real users would: navigating the file system, creating artifacts, and executing productivity tasks.

How long do the simulations last?
More than 2,000 steps and more than 8 hours of agent work per simulation on average. This is significantly longer than most previous agent benchmarks, which target tasks of a few dozen steps.

What are the scaling plans?
The authors claim that the methodology can in principle scale to millions or billions of synthetic user worlds given sufficient computational resources, covering diverse professions and contexts.

This article was generated using artificial intelligence from primary sources.