🟢 🛡️ Security Published: · 3 min read ·

arXiv:2606.04329: Memory poisoning of AI agents — 9 vulnerabilities and MPBench

arXiv:2606.04329 ↗

Editorial illustration: Memory poisoning of AI agents — 9 vulnerabilities and MPBench

A systematic study of poisoning the persistent memory of AI agents identifies four channels for writing to memory, nine structural vulnerabilities and a taxonomy of six attack classes, and introduces the MPBench benchmark. The key finding: agents designed to write and retrieve memory more aggressively are easier to exploit, and existing defenses against prompt injection do not cover memory poisoning.

🤖

This article was generated using artificial intelligence from primary sources.

What does the paper on agent memory poisoning investigate?

Memory Poisoning Attacks on LLM Agents is a security study published on 3 June 2026 at 01:04 UTC on arXiv under the identifier arXiv:2606.04329 (version v1) that systematically analyzes the poisoning of the persistent memory of AI agents. Memory poisoning is an attack in which malicious content is injected into an agent’s persistent memory, which the agent later retrieves and uses when making decisions. The paper is the first comprehensive taxonomy of this problem and offers a framework for measuring and defending against it.

What are the channels for writing to memory?

The study identifies four channels through which an attacker can write content into an agent’s memory. These are the paths by which information reaches persistent storage, for example through conversation with the user, through external documents, or through the results of tools the agent uses. Understanding these channels is crucial because each represents a separate entry point that the defense must cover. If even one channel is unprotected, an attacker can permanently distort the agent’s behavior.

How many vulnerabilities and attack classes does the paper describe?

The paper enumerates nine structural vulnerabilities in the way agents store and retrieve memory, and organizes them into a taxonomy of six attack classes. The structural vulnerabilities relate to weaknesses in the memory system’s architecture itself, independent of any individual model. The taxonomy of six attack classes gives researchers and builders a common vocabulary for describing and comparing threats, which facilitates the development of targeted defenses.

What is MPBench and what is it for?

To measure agent resilience, the study introduces a benchmark named MPBench. It enables standardized testing of attacks and defenses against the identified write channels and vulnerabilities. Without a common measure, it is difficult to compare how resilient individual agents or defense mechanisms are to memory poisoning. MPBench thereby becomes a reference point for future research, much like prompt-injection benchmarks serve to measure resilience to attacks within a single query.

What is the study’s key finding?

The most important result is that agents designed to write and retrieve memory more aggressively are more exploitable. In other words, the more an agent bases its behavior on persistent memory, the more vulnerable it is to its poisoning. This finding creates a direct tension between usefulness, since rich memory makes an agent more capable, and security, because that same memory becomes an attack surface. Builders must carefully balance how much memory is used and how it is protected.

Why are existing defenses insufficient?

The study warns that existing defenses against prompt injection do not cover memory poisoning. Prompt injection acts within a single query and its influence disappears when the conversation ends, whereas memory poisoning affects persistent memory that lasts between sessions. The harmful entry therefore remains active over the long term, even after the original attack is over. The finding means that security teams must develop separate mechanisms for protecting memory, rather than relying on tools designed for attacks within a single query.

Frequently Asked Questions

What is memory poisoning of AI agents?
Memory poisoning is an attack in which malicious or incorrect content is injected into an AI agent's persistent memory. Because the agent later retrieves and uses that memory when making decisions, a poisoned entry can distort its future behavior even after the original attack has ended.
How does memory poisoning differ from prompt injection?
Prompt injection acts within a single query and its influence disappears when the conversation ends. Memory poisoning affects persistent memory that lasts between sessions, so the harmful entry remains active over the long term. The paper shows that existing defenses against prompt injection do not cover this channel.
What is MPBench?
MPBench is a benchmark introduced by this study for measuring the resilience of AI agents to memory poisoning. It enables standardized testing of various attacks and defenses against the four write channels and nine structural vulnerabilities that the paper identifies.