ArXiv KnowU-Bench: new benchmark for interactive and proactive mobile AI agents
Why it matters
Researchers have introduced KnowU-Bench, a comprehensive benchmark for evaluating a new generation of mobile AI agents, focusing on interactivity, proactivity, and personalization through long-term use.
A gap in the evaluation of mobile agents
Current benchmarks for mobile AI agents mostly measure static capabilities: whether the agent can execute a given task, how well it understands the screen, and how accurate its OCR is. Real mobile assistants, however, also need to be interactive, proactive, and personalized, and until now these qualities have not been well evaluated.
KnowU-Bench fills that gap as the first comprehensive benchmark that measures capabilities relevant to real-world use.
Three key dimensions
- Interactivity: how naturally the agent communicates with the user, asks the right questions, and follows context
- Proactivity: the ability to recognize opportunities to help WITHOUT an explicit request
- Personalization: adaptation to user preferences and habits over time
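As a rough illustration of how a team might track these three dimensions in its own evaluations, the sketch below averages per-episode scores for each dimension separately. Everything in it is an assumption made for illustration: the `EpisodeScore` fields, the 0-to-1 scale, and the `summarize` helper are hypothetical and do not reflect KnowU-Bench's actual metrics or data format.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class EpisodeScore:
    """Hypothetical per-episode scores in [0, 1]; not KnowU-Bench's schema."""
    interactivity: float
    proactivity: float
    personalization: float


def summarize(episodes):
    """Average each dimension on its own, so a weak dimension
    is not hidden behind a strong one in a single blended number."""
    if not episodes:
        raise ValueError("need at least one episode")
    return {
        "interactivity": mean(e.interactivity for e in episodes),
        "proactivity": mean(e.proactivity for e in episodes),
        "personalization": mean(e.personalization for e in episodes),
    }


# Example with made-up scores for two episodes.
report = summarize([
    EpisodeScore(interactivity=1.0, proactivity=0.0, personalization=0.5),
    EpisodeScore(interactivity=0.0, proactivity=1.0, personalization=0.5),
])
```

Reporting the dimensions side by side, rather than as one aggregate, is a design choice of this sketch and not something the benchmark is known to prescribe.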
Why is this important for mobile devices?
Mobile agents face unique challenges compared to desktop:
- Smaller screen: less information is visible at once, so the agent must filter better
- Touch interaction: more complex than mouse and keyboard input
- Context switching: the user constantly switches between applications
- Battery and latency: everything must be efficient
- Privacy: the phone knows more about you than a desktop does
All the major players are working on mobile agents:
- Apple is working on Apple Intelligence integration
- Google is developing Gemini agents for Android
- Microsoft has Copilot mobile
- Specialized projects such as Imbue Bouncer are building local mobile agents
Connection with PASK
Interestingly, KnowU-Bench was published on the same day as PASK (Proactive Agent System with Knowledge), which suggests that the research community is converging on proactive mobile agents. KnowU-Bench could become a standard tool for evaluating models like PASK.
Implications
For developers of mobile AI products, KnowU-Bench provides:
- Standardized metrics for comparing models
- Realistic test scenarios that reflect real-world use
- A starting point for their own capability assessments
For researchers, it opens new research areas where progress can be clearly quantified.