arXiv:2605.15338 Sleeper Memory Poisoning: 99.8% attack success rate on GPT-5.5 via persistent memory of LLM agents
Hidden in Memory is a new arXiv paper published on May 14, 2026 by Sidharth Pulipaka, Stanislau Hlebik, Leonidas Raghav, Sahar Abdelnabi, Vyas Raina, Ivaxi Sheth, and Mario Fritz that presents a delayed-execution attack on stateful LLM agents. Adversarial content in external context (documents, webpages) corrupts the agent's persistent memory — 99.8% success on GPT-5.5 and 95% on Kimi-K2.6, with 60–89% success converting poisoned memory into attacker-intended actions.
This article was generated using artificial intelligence from primary sources.
Sidharth Pulipaka, Stanislau Hlebik, Leonidas Raghav, Sahar Abdelnabi, Vyas Raina, Ivaxi Sheth, and Mario Fritz published on arXiv on May 14, 2026 a paper presenting Sleeper Memory Poisoning — a new attack vector that exploits persistent memory of LLM agents for delayed-execution attacks with dramatic success rates: 99.8% on GPT-5.5 and 95% on Kimi-K2.6.
What does sleeper memory poisoning specifically mean?
Classic LLM security threats — prompt injection, jailbreaking, context manipulation — share one fundamental limitation: the attack lasts only as long as adversarial content is in the context. Once the user leaves the session or clears the context, the attack disappears.
Sleeper memory poisoning changes that profile. Current stateful LLM assistants (ChatGPT with Memory, Claude Projects, Gemini Personalization) persist user-specific information across multiple sessions. The paper demonstrates that this persistent memory can be corrupted through fabricated facts that:
- Are written to storage automatically through normal user interaction
- Remain dormant until a retrieval trigger arrives
- Activate in later sessions when the agent uses the memory item for another task
- Manipulate subsequent conversations in the attacker-intended direction
The difference between sleeper memory poisoning and classic prompt injection is dramatic: persistence. The attack can remain dormant for days or weeks before triggering.
What does the attack pipeline specifically look like?
The paper fully evaluates the complete attack pipeline:
- Fabrication writing — adversarial content in an external document, webpage, or repository that the agent processes
- Memory write — the agent processes the content and writes fabricated “facts” to persistent memory as user preferences, facts, or context
- Dormancy period — everything between writing and retrieval
- Memory retrieval — the agent in a later session uses the memory item for another task
- Action triggering — poisoned memory influences agent reasoning and triggers the attacker-intended action
The approach exploits the trust boundary between the user and external sources. The agent treats anything the user feeds it as trustworthy, even if an external document the user uploads contains malicious instructions.
What are the specific success rate figures?
The paper cites precise metrics on two frontier models:
| Model | Memory Poisoning Success | Attacker-Intended Action |
|---|---|---|
| GPT-5.5 | 99.8% | 60–89% of successful retrievals |
| Kimi-K2.6 | 95% | 60–89% of successful retrievals |
The GPT-5.5 figure is particularly dramatic — 99.8% means virtually guaranteed memory corruption if the attacker knows the agent’s structure. Frontier models with state-of-the-art alignment training are almost completely defenseless against this attack vector.
The second metric — 60–89% action triggering rate — shows that successful memory corruption converts into actionable attack in most cases. This is not a theoretical threat — it is a production-grade attack vector with real-world impact.
Why is memory poisoning difficult to detect?
The difficulty of defense stems from several factors:
- Memory writes are normal operation — the agent writes memory items continuously through user interactions
- No anomaly signal — an adversarial memory item looks like any other user fact
- Cross-session evaluation required — single-session monitoring doesn’t detect the attack because the trigger comes later
- Difficult attribution — when the attack triggers, tracing it back to the original adversarial source is a nontrivial retrospective forensics task
The approach requires end-to-end memory pipeline auditing, not a single-point security control.
What does this mean for production LLM deployments?
The findings have critical implications for organizations deploying LLM agents with memory features:
- ChatGPT Enterprise with Memory — potential exposure if employees upload documents from unverified sources
- Claude Projects — compromised projects can corrupt cross-project memory
- Custom agent deployments with vector stores as long-term memory — massive attack surface
- Multi-user systems with shared memory — one compromised user can affect everyone
Defensive priorities implied by the paper:
- Memory source provenance — track every memory item back to the originating source
- Adversarial content scanning before memory writes
- Retrieval anomaly detection — flagging unusual memory access patterns
- Memory expiration policies — automatic cleanup of old memory items
Position in the 2026 agentic security landscape
The paper fits into the explosive wave of agentic safety/security research through May 2026:
- arXiv FATE (May 12) — 33.5% attack reduction through formal techniques
- arXiv History Anchors (May 13) — 91–98% unsafe shift through history manipulation
- arXiv Sycophantic Consensus (May 15) — alignment failure modes
- Microsoft AI Delegation (May 15) — 19–34% reliability degradation
- arXiv Compositional Jailbreaking (May 15) — mutator chain synergies
The trend is crystal clear: 2026 is the year agentic systems transition from “experimental capability” to “production attack surface.” The safety provided by mainstream RLHF + safety training for chatbot use cases is insufficient for stateful agents with persistent memory.
Sleeper Memory Poisoning is likely the most significant security paper of May 2026 due to two numbers: 99.8% and persistence across multiple sessions. The industry must seriously revisit the architecture of LLM memory systems before attackers reproduce these results in real-world deployments.
Frequently Asked Questions
- What does sleeper memory poisoning specifically mean?
- Classic prompt injection attacks last only as long as adversarial content is in the context — sleeper memory poisoning corrupts the agent's persistent memory through fabricated facts stored in long-term memory; the attack remains dormant across multiple sessions and activates when the agent later accesses that memory item for another task, which is dramatically different from prompt injection, which has no persistence.
- What are the specific success rate figures?
- GPT-5.5: 99.8% successful poisoning rate, Kimi-K2.6: 95% success rate; among successfully retrieved poisoned memories, attacker-intended actions were triggered in 60–89% of cases; the attack pipeline was fully evaluated — from fabrication writing into storage, through later retrieval, to manipulation of subsequent conversations.
Related news
arXiv:2605.18414: Prompts do not protect — MCP proxy with ABAC achieves 0% unauthorized tool calls
CNCF: Prempti Brings Policy Enforcement and Visibility to AI Coding Agents
IBM: Project Glasswing brings the most advanced AI-powered security portfolio for enterprise