arXiv: KnowU-Bench, a new benchmark for interactive and proactive mobile AI agents

A gap in the evaluation of mobile agents

Current benchmarks for mobile AI agents mostly measure static capabilities: whether the agent can execute a given task, how well it understands the screen, and how accurate its OCR is. Real mobile assistants, however, also need to be interactive, proactive, and personalized, and until now these qualities have not been well evaluated.

KnowU-Bench aims to fill that gap. It is presented as the first comprehensive benchmark to measure interactivity, proactivity, and personalization, the capabilities that matter in real-world use.

Three key dimensions

Interactivity: how naturally the agent communicates with the user, asks the right questions, and follows context
Proactivity: the ability to recognize opportunities to help without an explicit request
Personalization: adaptation to the user's preferences and habits over time
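
To make the three dimensions concrete, here is a minimal sketch of how per-dimension results could be recorded and compared across models. This is not KnowU-Bench's actual schema or scoring method: the class `DimensionScores`, the function `rank_models`, the 0 to 1 score range, the unweighted average, and all numbers are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class DimensionScores:
    """Hypothetical per-model scores in [0, 1]; not the benchmark's real format."""
    interactivity: float
    proactivity: float
    personalization: float

    def overall(self) -> float:
        # Unweighted mean is an assumption; a real benchmark may weight differently.
        return mean([self.interactivity, self.proactivity, self.personalization])

    def weakest(self) -> str:
        # Name of the lowest-scoring dimension for this model.
        scores = {
            "interactivity": self.interactivity,
            "proactivity": self.proactivity,
            "personalization": self.personalization,
        }
        return min(scores, key=scores.get)


def rank_models(results: dict[str, DimensionScores]) -> list[str]:
    """Return model names sorted by overall score, best first."""
    return sorted(results, key=lambda name: results[name].overall(), reverse=True)


# Made-up numbers, purely for illustration.
results = {
    "model_a": DimensionScores(0.8, 0.4, 0.6),
    "model_b": DimensionScores(0.6, 0.7, 0.8),
}

if __name__ == "__main__":
    for name in rank_models(results):
        r = results[name]
        print(f"{name}: overall={r.overall():.2f}, weakest={r.weakest()}")
```

Keeping the dimensions as separate fields rather than a single number makes it visible when a model is strong at following instructions but weak at, say, proactivity, which a single aggregate score would hide.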

Why is this important for mobile devices?

Mobile agents face unique challenges compared to desktop:

Smaller screen: less information is visible at once, so the agent must filter better
Touch interaction: gestures are harder for an agent to reproduce than mouse and keyboard input
Context switching: the user constantly switches between applications
Battery and latency: the agent must run efficiently
Privacy: the phone knows more about you than the desktop

All the major players are working on mobile agents:

Apple is working on Apple Intelligence integration
Google is developing Gemini agents for Android
Microsoft has Copilot mobile
Specialized projects such as Imbue Bouncer are building local mobile agents

Connection with PASK

Interestingly, KnowU-Bench was published on the same day as PASK (Proactive Agent System with Knowledge), which suggests that the research community is converging on proactive mobile agents. KnowU-Bench may well become a standard tool for evaluating systems like PASK.

Implications

For developers of mobile AI products, KnowU-Bench provides:

Standardized metrics for comparing models
Realistic test scenarios that reflect real-world use
A starting point for their own capability assessments

For researchers, it opens new research areas where progress can be clearly quantified.