Microsoft Research: Memora — AI Agent Memory With Up to 98% Fewer Tokens and SOTA on Long Conversations

Memora is a scalable memory framework from Microsoft Research for AI agents with long horizons. It introduces a harmonic architecture that separates what is stored from how it is retrieved, with cue anchors and a policy-driven retriever. It achieves SOTA on LoCoMo and LongMemEval benchmarks while reducing token consumption by up to 98% compared to full-context approaches.

This article was generated using artificial intelligence from primary sources.

What Is Memora and What Problem Does It Solve

Agentic memory, a system’s ability to retain and use prior context over the long term, is becoming a key component of production AI solutions. AI agents that conduct long conversations or work on long-running projects face a fundamental limitation: every time they need old information, it must be passed in again or retrieved externally. When the full history is resent on every turn, token consumption keeps growing with each turn, and response quality degrades the longer the conversation runs. Microsoft Research has presented Memora, a scalable memory framework for long-horizon agents that addresses this problem at the architectural level. The paper was accepted at ICML 2026 and the source code is publicly available on GitHub.

Harmonic Architecture: Storage and Retrieval as Two Separate Concerns

Memora’s central innovation is to separate what is stored (rich, detailed memory content) from how it is retrieved (lightweight abstractions and contextual anchors). Every memory entry has two components. The primary abstraction, a phrase of 6 to 8 words, is the only part that enters the vector database for similarity search. The memory value retains the full content and is accessible only to the retrieval policy, not to direct search.
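The two-part entry described above can be sketched as follows. This is a minimal illustration, not Memora's actual code: `MemoryEntry`, `AbstractionIndex`, and the bag-of-words `embed` function are hypothetical names, and the toy cosine search stands in for a real embedding model and vector database.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryEntry:
    abstraction: str  # short phrase; the only part that is indexed for search
    value: str        # full content; never embedded or searched directly


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: bag-of-words counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class AbstractionIndex:
    """Hypothetical index that embeds abstractions only (assumption)."""

    def __init__(self) -> None:
        self._entries: list[MemoryEntry] = []
        self._vectors: list[Counter] = []

    def add(self, entry: MemoryEntry) -> None:
        self._entries.append(entry)
        self._vectors.append(embed(entry.abstraction))  # the value is not indexed

    def search(self, query: str, k: int = 1) -> list[MemoryEntry]:
        q = embed(query)
        ranked = sorted(
            range(len(self._entries)),
            key=lambda i: cosine(q, self._vectors[i]),
            reverse=True,
        )
        return [self._entries[i] for i in ranked[:k]]


index = AbstractionIndex()
index.add(MemoryEntry(
    abstraction="project deadline agreed for march release",
    value="On the call, the team and the client agreed to ship in March; budget unchanged.",
))
index.add(MemoryEntry(
    abstraction="user prefers vegetarian restaurants near office",
    value="The user said they avoid meat and like places within walking distance.",
))
hits = index.search("when is the project deadline")
```

Here `hits[0].value` carries the full memory content, but the similarity search only ever compared the query against the short abstraction.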

Cue anchors function as metadata tags that open alternative pathways to the same memory without predefined ontologies. A sentence about a project agreement isn’t fragmented into multiple separate entries — it is stored once, with multiple anchors, each accessing the same memory from a different context.
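One way to picture cue anchors is as a many-to-one mapping from free-form tags to a single stored memory. The sketch below assumes that reading; `AnchoredMemory` and its methods are hypothetical names, and the source does not say how Memora implements anchors internally.

```python
from collections import defaultdict


class AnchoredMemory:
    """Hypothetical store: one memory value, many anchors pointing at it."""

    def __init__(self) -> None:
        self._values: dict[int, str] = {}
        self._by_anchor: dict[str, set[int]] = defaultdict(set)

    def store(self, value: str, anchors: list[str]) -> int:
        memory_id = len(self._values)
        self._values[memory_id] = value  # the content is stored exactly once
        for anchor in anchors:
            # Anchors are free-form strings; no predefined ontology is required.
            self._by_anchor[anchor.lower()].add(memory_id)
        return memory_id

    def lookup(self, anchor: str) -> list[str]:
        ids = self._by_anchor.get(anchor.lower(), set())
        return [self._values[i] for i in sorted(ids)]


store = AnchoredMemory()
agreement = "The client agreed to extend the project contract by six months."
store.store(agreement, anchors=["project", "contract", "client"])

via_contract = store.lookup("contract")
via_client = store.lookup("Client")
```

Both lookups return the same single entry, so the sentence is reachable from several contexts without being duplicated.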

Why Isn’t Classic RAG Enough for Long-Horizon Agents?

Classic RAG (Retrieval-Augmented Generation) retrieves documents through simple vector similarity search, without reasoning about what is currently relevant in the conversation context. Memora introduces a policy-driven retriever that treats memory retrieval as active reasoning: it iteratively refines queries, explores related memories through cue anchors, and autonomously determines when to stop searching. This retriever can operate through LLM reasoning or be distilled into a smaller model through reinforcement learning — scaling to production scenarios without depending on expensive LLM calls for every retrieval.
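The retrieval loop can be sketched roughly as below. This is an assumption-laden toy, not the paper's retriever: a keyword-overlap match stands in for vector search, a fixed rule ("stop when a round adds nothing new, or the step budget runs out") stands in for the LLM or distilled policy, and `Memory`, `overlap`, and `retrieve` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    abstraction: str
    value: str
    anchors: set[str] = field(default_factory=set)


def overlap(query: str, text: str) -> int:
    # Toy relevance score: number of shared lowercase words.
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve(query: str, memories: list[Memory], max_steps: int = 3) -> list[Memory]:
    found: list[Memory] = []
    known_anchors: set[str] = set()
    for _ in range(max_steps):
        # A memory qualifies if it matches the query directly, or if it
        # shares a cue anchor with something retrieved in an earlier round.
        new = [
            m for m in memories
            if m not in found
            and (overlap(query, m.abstraction) > 0 or m.anchors & known_anchors)
        ]
        if not new:
            break  # stopping rule: another round would add nothing
        found.extend(new)
        for m in new:
            known_anchors |= m.anchors
    return found


memories = [
    Memory("launch party planned by Ana", "Ana is organising the launch party.", {"ana", "launch"}),
    Memory("Ana booked rooftop venue downtown", "Ana reserved the rooftop venue downtown.", {"ana", "venue"}),
    Memory("user likes jazz music", "The user mentioned enjoying jazz.", {"music"}),
]
result = retrieve("where is the launch party", memories)
```

The second memory shares no words with the query and is reached only through the `ana` anchor, which is the kind of multi-hop step a single similarity search would miss. The unrelated third memory is never pulled in.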

Results: SOTA and 98% Fewer Tokens

Memora achieves state-of-the-art results on two reference benchmarks for long conversations. On LoCoMo (600-turn dialogues) it records 86.3% accuracy as scored by an LLM judge, and on LongMemEval (115,000-token context) it records 87.4% accuracy. In both cases it outperforms all compared systems: RAG, Mem0, Nemori, Zep, LangMem, and full-context inference, which consumes the entire context without filtering.

Efficiency is the most dramatic result: Memora consumes up to 98% fewer tokens than the full-context approach, directly reducing API call costs in production agents. It also stores roughly half as many memory entries as Mem0 (344 vs 651) while achieving better accuracy. The gains are particularly pronounced on multi-hop reasoning tasks, where the agent must combine information from distant parts of a long conversation. Results are consistent across both benchmarks, which supports the scalability of the approach.
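To get a feel for these numbers, the short calculation below applies the 98% upper bound to the 115,000-token LongMemEval context quoted above. Pairing those two figures is an assumption made purely for illustration, since the article does not say on which benchmark the 98% reduction was measured.

```python
# Illustrative arithmetic only; pairing 98% with 115,000 tokens is an assumption.
full_context_tokens = 115_000   # LongMemEval context size quoted in the article
reduction_percent = 98          # "up to 98%" upper bound

remaining_tokens = full_context_tokens * (100 - reduction_percent) // 100

memora_entries = 344
mem0_entries = 651
entry_ratio = round(memora_entries / mem0_entries, 2)

print(remaining_tokens)  # 2300
print(entry_ratio)       # 0.53
```

Under that assumption, a query that would otherwise carry 115,000 tokens of context would carry about 2,300, and the entry count is about 53% of Mem0's.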

Frequently Asked Questions

What is Memora and what is its key innovation?
Memora is a memory framework for AI agents that separates what is stored (rich memory content) from how it is retrieved (lightweight abstractions and cue anchors), reducing token consumption by up to 98% compared to a full-context approach.

On which benchmarks did Memora achieve SOTA results?
On the LoCoMo benchmark (600-turn dialogues) it achieved 86.3% accuracy as scored by an LLM judge, and on the LongMemEval benchmark (115,000-token context) it achieved 87.4% accuracy, outperforming RAG, Mem0, LangMem, and other competitors.