arXiv: KnowU-Bench, a new benchmark for interactive and proactive mobile AI agents
Researchers have introduced KnowU-Bench, a comprehensive benchmark for evaluating a new generation of mobile AI agents. It focuses on interactivity, proactivity, and personalization over long-term use.
This article was generated using artificial intelligence from primary sources.
A gap in the evaluation of mobile agents
Current benchmarks for mobile AI agents mostly measure static capabilities: whether the agent can complete a given task, how well it understands the screen, and how accurate its OCR is. Real mobile assistants, however, also need to be interactive, proactive, and personalized, and these qualities have not been well evaluated until now.
KnowU-Bench aims to fill that gap. Its authors present it as the first comprehensive benchmark to measure the capabilities that matter in real-world use.
Three key dimensions
- Interactivity: how naturally the agent communicates with the user, asks the right questions, and keeps track of context
- Proactivity: the ability to recognize opportunities to help without an explicit request
- Personalization: adaptation to the user's preferences and habits over time
Why is this important for mobile devices?
Mobile agents face challenges that desktop agents do not:
- Smaller screen: less information is visible at once, so the agent must filter more carefully
- Touch interaction: taps, swipes, and long presses are harder to handle than mouse and keyboard input
- Context switching: the user constantly moves between applications
- Battery and latency: everything must run efficiently
- Privacy: a phone holds more personal data than a desktop computer
All the major technology companies are working on mobile agents:
- Apple is working on Apple Intelligence integration
- Google is developing Gemini agents for Android
- Microsoft has Copilot mobile
- Specialized projects such as Imbue Bouncer are building local mobile agents
Connection with PASK
Interestingly, KnowU-Bench was published on the same day as PASK (Proactive Agent System with Knowledge). The timing suggests that the research community is converging on proactive mobile agents, and KnowU-Bench could become a standard tool for evaluating models like PASK.
Implications
For developers of mobile AI products, KnowU-Bench provides:
- Standardized metrics for comparing models
- Realistic test scenarios that reflect real-world use
- A starting point for their own capability assessments
For researchers, it opens new research directions in which progress can be measured quantitatively.