ArXiv ACIArena: The First Benchmark for Prompt Injection Attacks Across AI Agent Chains
Why it matters
A team led by An has published 1,356 test cases covering 6 multi-agent implementations, measuring robustness against 'cascading injection' attacks — where a malicious prompt is propagated through inter-agent communication channels.
A new type of prompt injection attack
Multi-agent systems (LangGraph, AutoGen, CrewAI, OpenAI Swarm) are growing in popularity for tasks that require coordinating multiple AI agents. But every agent communicating with another agent represents a new attack surface — and according to a new paper published on April 10, this surface is dangerously underexplored.
The team led by An introduces ACIArena — the first systematic benchmark for agent cascading injection (ACI). This is a family of attacks in which:
- An attacker injects a malicious prompt into one component of the system (e.g., a document the first agent reads)
- The first agent processes the input and forwards the “processed” result to the next agent
- The malicious content “presents” itself as legitimate intra-system communication
- Subsequent agents treat the compromised data as trustworthy
- The chain continues until someone executes a dangerous action
What the benchmark contains
ACIArena covers 1,356 test cases across 6 multi-agent implementations. The test cases cover:
- Different input vectors (documents, web pages, API responses)
- Different agent topologies (sequential, parallel, hierarchical)
- Different types of final actions (reading files, writing code, sending emails, executing shell commands)
Why this matters
Most current security studies focus on single-agent scenarios — where the user talks directly to one model. But real-world production deployments increasingly rely on agent chains in which one agent trusts the results of another. ACIArena formally measures how weak this “trust between agents” is.
For development teams already using LangGraph and AutoGen, this benchmark should become a mandatory part of security evaluation prior to production deployment. The absence of such a benchmark so far has meant that attacks were only discovered after incidents.
Related news
ArXiv: Algorithmic monoculture — LLMs cannot diverge when they should
ArXiv OpenKedge: Cryptographic protocol requiring permission before every AI agent action
UK AISI: Claude Mythos Preview achieves 73% on expert cyber tasks — first model to complete a full network attack