ArXiv OpenMobile: open-source mobile agents with trajectory synthesis and policy-switching
Why it matters
OpenMobile is a new open-source framework for developing mobile agents based on vision-language models. Fine-tuned on OpenMobile's synthesized data, Qwen2.5-VL reaches a 51.7% success rate and Qwen3-VL reaches 64.7% on the AndroidWorld benchmark. Both results are well above existing open-data approaches and close to closed systems, which reach nearly 70%. The authors release all data and code publicly.
A team of 14 researchers led by Kanzhi Cheng published the paper “OpenMobile: Building Open Mobile Agents with Task and Trajectory Synthesis” on April 16, 2026. The work focuses on building a fully open mobile agent system capable of competing with the closed solutions of major technology companies.
What is new in the approach?
Mobile agents based on vision-language models are an active research area, but the leading systems keep their training data private. These include prototypes from Apple, Google, and OpenAI. This makes results hard to reproduce and limits innovation outside large labs.
OpenMobile proposes two key technical innovations:
Task Synthesis Pipeline: the system explores applications and builds an environment memory. It then uses this memory to generate diverse task instructions grounded in what it actually observed in the apps. Researchers do not write the tasks by hand; the agent synthesizes them through real interaction with the apps.
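The explore-then-synthesize idea can be illustrated with a minimal sketch. Everything in it is a simplified stand-in, and all names are hypothetical:

- The "app" is a toy dictionary of screens (`TOY_APP`).
- Breadth-first exploration records every reachable screen and its elements in an `EnvironmentMemory`.
- A fixed template turns remembered elements into instructions.

A real pipeline would interact with actual Android apps and would likely use a model to phrase instructions.

```python
import random
from dataclasses import dataclass, field

# Hypothetical toy app: each screen lists (element_label, next_screen) pairs.
TOY_APP = {
    "home": [("Settings", "settings"), ("Contacts", "contacts")],
    "settings": [("Wi-Fi toggle", "settings"), ("Display", "display")],
    "display": [("Dark mode toggle", "display")],
    "contacts": [("Add contact", "new_contact")],
    "new_contact": [("Save", "contacts")],
}

@dataclass
class EnvironmentMemory:
    # Elements observed on each visited screen.
    elements: dict = field(default_factory=dict)
    # Sequence of element taps that first reached each screen.
    paths: dict = field(default_factory=dict)

def explore(app, start="home"):
    """Breadth-first exploration that records every reachable screen."""
    memory = EnvironmentMemory()
    memory.paths[start] = []
    frontier = [start]
    while frontier:
        screen = frontier.pop(0)
        memory.elements[screen] = [label for label, _ in app[screen]]
        for label, nxt in app[screen]:
            if nxt not in memory.paths:
                memory.paths[nxt] = memory.paths[screen] + [label]
                frontier.append(nxt)
    return memory

def synthesize_tasks(memory, k, seed=0):
    """Turn remembered (screen, element) pairs into grounded instructions.

    A fixed template stands in for the model a real pipeline would use.
    """
    candidates = [
        (screen, label)
        for screen, labels in memory.elements.items()
        for label in labels
    ]
    rng = random.Random(seed)
    picked = rng.sample(candidates, k=min(k, len(candidates)))
    tasks = []
    for screen, label in picked:
        route = " > ".join(memory.paths[screen]) or "the home screen"
        tasks.append(f"Navigate to {route} and tap '{label}'.")
    return tasks

memory = explore(TOY_APP)
for task in synthesize_tasks(memory, k=3):
    print(task)
```

Every instruction is built from a screen and an element that exploration actually observed. That is why the memory is built before any task is written: no task can refer to something the environment does not contain.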
Policy-Switching Strategy: during trajectory rollout, the system alternates between the student model and the expert model. This captures error-recovery scenarios that standard imitation learning misses. In pure expert demonstrations the expert rarely makes mistakes, so the learner never sees how to recover from a wrong action. When the student acts, it makes its own errors, and the expert's subsequent actions show how to recover from them.
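The switching idea can be sketched on a toy environment. Everything below is an assumption made for illustration:

- The state is a `(progress, off_track)` pair.
- The hypothetical student sometimes makes a wrong tap and cannot recover on its own.
- The hypothetical expert always knows the right move.
- The switching rule is a coin flip at each step with probability `student_prob`. This is a simple stand-in, not the paper's actual schedule.

```python
import random

def expert_policy(state):
    """Hypothetical expert: always picks the right move, including recovery."""
    _, off_track = state
    return "back" if off_track else "advance"

def student_policy(state, rng, error_rate=0.3):
    """Hypothetical student: sometimes taps the wrong element and cannot recover alone."""
    _, off_track = state
    if off_track or rng.random() < error_rate:
        return "wrong_tap"
    return "advance"

def step(state, action):
    """Toy transition: 'advance' makes progress only while on track,
    'back' returns to the track, anything else leaves the agent off track."""
    progress, off_track = state
    if action == "back":
        return (progress, False)
    if action == "advance" and not off_track:
        return (progress + 1, False)
    return (progress, True)

def policy_switching_rollout(goal=5, student_prob=0.5, max_steps=30, seed=0):
    """Choose the acting policy at each step (assumed rule: coin flip) and log who acted."""
    rng = random.Random(seed)
    state = (0, False)
    trajectory = []
    for _ in range(max_steps):
        if state[0] >= goal:
            break
        actor = "student" if rng.random() < student_prob else "expert"
        if actor == "student":
            action = student_policy(state, rng)
        else:
            action = expert_policy(state)
        trajectory.append({"state": state, "actor": actor, "action": action})
        state = step(state, action)
    return trajectory, state

def recovery_steps(trajectory):
    """Expert actions taken from an off-track state: the error-recovery examples."""
    return [r for r in trajectory if r["actor"] == "expert" and r["state"][1]]

trajectory, final_state = policy_switching_rollout(seed=3)
print(f"{len(recovery_steps(trajectory))} recovery steps, final state {final_state}")
```

With `student_prob=0.0` the rollout is pure expert imitation and `recovery_steps` returns an empty list. Mixing in student actions is what creates off-track states for the expert to recover from.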
Results on AndroidWorld
The paper demonstrates competitive results on the AndroidWorld benchmark, the standard test for mobile agents:
- Fine-tuned Qwen2.5-VL: 51.7% success rate
- Fine-tuned Qwen3-VL: 64.7% success rate
These results are well above existing open-data approaches and within roughly five percentage points of closed systems, which reach around 70%. The gap between open and closed systems is dramatically smaller than it was a year ago. This suggests that high-quality data synthesis can offset much of the proprietary advantage.
Transparency and open access
The authors ran contamination analyses that check for overlap between the synthetic task instructions and the benchmark test sets. They conclude that performance comes from broad functional coverage, not from overfitting to the benchmark. This is an important detail, because silent overlap between training tasks and the evaluation set can inflate the numbers reported in mobile agent publications.
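The exact contamination procedure is not described here. As a minimal sketch, one common way to check for overlap is to measure word n-gram similarity between each synthetic instruction and each benchmark task. The following are all hypothetical and not taken from the paper or from AndroidWorld:

- the functions;
- the trigram size;
- the 0.5 Jaccard threshold;
- the example instructions.

```python
import re

def ngrams(text, n=3):
    """Set of lower-cased word n-grams in an instruction."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def flag_overlaps(synthetic, benchmark, n=3, threshold=0.5):
    """Return (synthetic, benchmark, score) for every pair at or above the threshold."""
    bench = [(b, ngrams(b, n)) for b in benchmark]
    flagged = []
    for s in synthetic:
        s_grams = ngrams(s, n)
        for b, b_grams in bench:
            score = jaccard(s_grams, b_grams)
            if score >= threshold:
                flagged.append((s, b, round(score, 2)))
    return flagged

# Invented example instructions, not real OpenMobile or AndroidWorld tasks.
synthetic = [
    "Turn on dark mode in the Display settings",
    "Add a contact named Alice with phone number 555 0100",
    "Create a new note titled groceries",
]
benchmark = [
    "Add a contact named Alice with phone number 555 0100",
    "Delete all alarms set for weekdays",
]

for s, b, score in flag_overlaps(synthetic, benchmark):
    print(f"{score:.2f}  {s!r} ~ {b!r}")
```

Exact n-gram matching only catches near-verbatim copies. Paraphrased overlap would need a looser measure, such as embedding similarity. A threshold like this is therefore a first filter, not proof of a clean split.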
Data and code are publicly released, so other researchers can reproduce and build on the results. For industry, this is a useful reference point. It shows what open agents can achieve with openly available Qwen models, and how data synthesis strategies can narrow the gap to proprietary systems.
This article was generated using artificial intelligence from primary sources.