arXiv:2606.09900: Engram, a Bi-Temporal Memory Engine, Gains 10.4 Points With About 8× Fewer Tokens
Engram is an open-source memory system whose results indicate that selectively retrieved 'lean' context can outperform loading the entire conversation history. On the LongMemEval_S benchmark it scored 83.6% versus 73.2% for the full-context baseline, while using about 8× fewer tokens.
This article was generated using artificial intelligence from primary sources.
arXiv:2606.09900, published on June 5, 2026, at 11:43 UTC, introduces Engram, an open-source memory system. Its central finding is that selectively retrieved “lean” context (a condensed, targeted set of information) outperforms loading the entire conversation history. The results suggest that more context does not necessarily mean better answers, and that retrieval quality is the deciding factor.
What is Engram and what problem does it solve?
Engram addresses the question of how to provide an AI agent with the right knowledge at the right moment, without unnecessary load. The usual approach is to load the entire conversation history as full-context, but this consumes a lot of tokens and can introduce noise.
In contrast, Engram retrieves only the relevant parts. It thus shows that carefully selected, condensed context can yield better results than an approach in which everything is handed to the model at once. This is a shift from quantity to relevance.
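The idea of selecting a small, relevant context instead of the full history can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Episode` type, the word-overlap score, the whitespace token count, and the greedy budget rule are hypothetical stand-ins and are not taken from Engram's code or paper.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    text: str


def approx_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def overlap_score(query: str, text: str) -> int:
    # Toy relevance score: number of distinct query words that also occur in the text.
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words)


def lean_context(query: str, history: list[Episode], budget: int) -> list[Episode]:
    # Rank episodes by relevance, then greedily keep those that fit the token budget.
    ranked = sorted(history, key=lambda e: overlap_score(query, e.text), reverse=True)
    selected, used = [], 0
    for episode in ranked:
        if overlap_score(query, episode.text) == 0:
            break  # everything after this point is irrelevant to the query
        cost = approx_tokens(episode.text)
        if used + cost <= budget:
            selected.append(episode)
            used += cost
    return selected


history = [
    Episode("I moved to Berlin in March"),
    Episode("My favourite food is ramen"),
    Episode("The weather was nice yesterday"),
    Episode("I work in Berlin as a nurse"),
]
context = lean_context("where in berlin do I work", history, budget=10)
for episode in context:
    print(episode.text)
```

With a budget of 10 toy tokens, only the most relevant episode is kept; the unrelated episodes never reach the model. A real system would replace the overlap score with a proper retriever.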
How does the dual-process architecture work?
Engram uses a dual-process architecture built on a bi-temporal data model. The first process is a fast write path that adds episodes without an LLM call, making the writing of new information cheap and fast.
The second process is an asynchronous path that builds a bi-temporal knowledge graph in the background. It extracts atomic facts and resolves contradictions among them. This division allows the system to simultaneously record new data quickly and gradually build an orderly, consistent model of knowledge.
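The split between a cheap write path and background consolidation can be sketched as follows. This is a hedged illustration, not Engram's implementation: the `MemoryStore` class, its method names, and the `split_sentences` extractor are hypothetical, and a plain list of strings stands in for the knowledge graph.

```python
import queue
import threading


class MemoryStore:
    def __init__(self, extract_facts):
        self.episodes = []   # raw episode log, written synchronously
        self.facts = []      # derived facts, filled in by the background worker
        self._pending = queue.Queue()
        self._extract_facts = extract_facts
        self._worker = threading.Thread(target=self._consolidate, daemon=True)
        self._worker.start()

    def write(self, text: str) -> None:
        # Fast path: append and enqueue only; no model call happens here.
        self.episodes.append(text)
        self._pending.put(text)

    def _consolidate(self) -> None:
        # Slow path: runs in the background and turns episodes into facts.
        while True:
            text = self._pending.get()
            self.facts.extend(self._extract_facts(text))
            self._pending.task_done()

    def wait_until_consolidated(self) -> None:
        # Block until every queued episode has been processed.
        self._pending.join()


def split_sentences(text: str) -> list[str]:
    # Placeholder extractor (assumption): one "fact" per sentence.
    return [part.strip() for part in text.split(".") if part.strip()]


store = MemoryStore(split_sentences)
store.write("Alice lives in Paris. Alice works at a bakery.")
store.wait_until_consolidated()
print(store.facts)
```

The caller of `write` returns immediately, while fact extraction happens on the worker thread. `wait_until_consolidated` exists only so the example can inspect the result deterministically.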
What does a bi-temporal data model mean?
The bi-temporal model tracks two time dimensions for each piece of information: when the event occurred and when it was recorded. This distinction allows the system to correctly interpret the temporal sequence of events and to recognize when some later piece of information conflicts with an earlier one.
This model is what allows Engram to resolve contradictions while building its knowledge graph. Instead of piling up conflicting claims, the system maintains a coherent picture of knowledge that respects time.
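A small sketch can make the two time dimensions concrete. The `Fact` fields and the resolution rule below (the fact with the latest event time wins, with recording time as a tie-breaker) are assumptions chosen for illustration; the article does not describe Engram's actual schema or conflict policy.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Fact:
    subject: str
    attribute: str
    value: str
    event_time: date      # when the thing happened in the world
    recorded_time: date   # when the system learned about it


def current_value(facts, subject, attribute, known_by: Optional[date] = None) -> Optional[str]:
    # Keep only facts about this subject/attribute that were recorded by `known_by`.
    candidates = [
        f for f in facts
        if f.subject == subject and f.attribute == attribute
        and (known_by is None or f.recorded_time <= known_by)
    ]
    if not candidates:
        return None
    # Resolve conflicts by event time first; break ties with the later recording.
    winner = max(candidates, key=lambda f: (f.event_time, f.recorded_time))
    return winner.value


facts = [
    Fact("user", "city", "Berlin", event_time=date(2024, 3, 1), recorded_time=date(2024, 3, 5)),
    Fact("user", "city", "Munich", event_time=date(2025, 1, 10), recorded_time=date(2025, 1, 12)),
    # Recorded last, but describes an older event, so it must not override Munich.
    Fact("user", "city", "Hamburg", event_time=date(2022, 6, 1), recorded_time=date(2025, 2, 1)),
]

print(current_value(facts, "user", "city"))
print(current_value(facts, "user", "city", known_by=date(2024, 12, 31)))
```

The Hamburg fact shows why a single timestamp is not enough: ordered by recording time alone, it would wrongly look like the newest residence. The `known_by` parameter additionally lets a query ask what the system believed at an earlier point.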
What are the results on the benchmark?
On the LongMemEval_S benchmark, Engram achieved 83.6%, versus 73.2% for the full-context approach. That is an improvement of 10.4 percentage points, which is statistically significant under McNemar's test (p < 10⁻⁶).
Most notable is the ratio of performance to cost. Engram used only about 9.6k retrieved tokens instead of 79k, roughly 8× fewer. The report also states that it recorded not a single error across all 500 questions; since an 83.6% score implies some incorrect answers, this presumably refers to failed runs rather than wrong answers. Taken together, the results support the paper’s main thesis: smart, condensed retrieval can be both more accurate and significantly cheaper than loading the entire history.
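The headline figures can be checked with a few lines of arithmetic. The inputs are the numbers quoted above; the implied counts of correct answers are a derived estimate that assumes each score is a plain fraction of the 500 questions, which the article does not state explicitly.

```python
engram_accuracy = 83.6        # percent, as reported
full_context_accuracy = 73.2  # percent, as reported
engram_tokens = 9_600         # "about 9.6k" retrieved tokens
full_context_tokens = 79_000  # "79k" tokens for the full history
questions = 500

# Accuracy gain in percentage points.
gain_points = round(engram_accuracy - full_context_accuracy, 1)

# How many times fewer tokens the retrieved context uses.
token_ratio = full_context_tokens / engram_tokens

# Implied number of correct answers (assumption: score = correct / questions).
engram_correct = round(engram_accuracy / 100 * questions)
full_context_correct = round(full_context_accuracy / 100 * questions)

print(f"gain: {gain_points} points")
print(f"token ratio: {token_ratio:.1f}x")
print(f"implied correct answers: {engram_correct} vs {full_context_correct}")
```

The ratio comes out slightly above 8, which matches the "about 8× fewer tokens" wording.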
Why is this approach important for AI agents?
For autonomous AI agents that carry on long conversations or perform tasks across many steps, memory management becomes a key bottleneck. A model’s context window is limited, and filling it with large amounts of past information increases both cost and the risk of errors.
Engram offers a practical answer to this problem. Since it is open source, development teams can embed it into their own agents without depending on closed solutions. The combination of a fast write path without LLM calls and background construction of a knowledge graph means the system can grow along with the conversation history without slowing down the interaction. The results on LongMemEval_S suggest this approach could become a standard in building memory layers for agents.
Frequently Asked Questions
- What is Engram?
- Engram is an open-source memory system for AI agents. Its results indicate that selectively retrieved, condensed ('lean') context outperforms loading the entire conversation history. It uses a dual-process architecture based on a bi-temporal data model. The goal is to deliver relevant information with significantly lower token consumption.
- What does a bi-temporal data model mean?
- A bi-temporal model tracks two time dimensions of data: when something happened and when it was recorded. This allows the system to build knowledge that respects temporal ordering and to resolve contradictions among facts. On that basis, Engram builds a knowledge graph of atomic facts.
- How successful was Engram in tests?
- On the LongMemEval_S benchmark, Engram achieved 83.6% versus 73.2% for the full-context approach, an improvement of 10.4 percentage points (McNemar p < 10⁻⁶). It used about 9.6k retrieved tokens instead of 79k, roughly 8× fewer. The reported absence of errors across all 500 questions presumably refers to failed runs, since an 83.6% score implies some incorrect answers.