Microsoft unveils Memora to tackle AI agents’ memory problem

05/07 18:15
With AI agents increasingly expected to remember conversations, preferences, and decisions across weeks or months rather than individual chat sessions, Microsoft Research has developed Memora, a memory system designed to provide more scalable and reliable long-term recall than existing approaches.

As an agent’s stored knowledge grows, its memory can become fragmented, leading to duplicate information and slower retrieval. Memora addresses this by decoupling what the AI remembers from how it looks up that information, reducing context token usage by up to 98% while matching or exceeding full-context accuracy, Microsoft Research claimed in a blog post.

Limitations of today’s memory architectures

As AI assistants and autonomous agents move into long-horizon deployments, the absence of a principled memory system has become a critical bottleneck. While modern LLMs are powerful reasoners, they still start every session from scratch. Long conversations require models to repeatedly re-read their entire history, while new information is either stored as raw text or compressed into summaries where important details may be lost.

Approaches that address these problems exist, but they have limitations of their own. For instance, systems like Mem0 extract atomic facts from conversations, retrieval-augmented generation (RAG) approaches index raw text fragments for later recall, and graph-based memory systems such as Zep and GraphRAG impose structure through entity relations.

Most of these fall at one of two extremes. Content-fragmentation systems, such as RAG and Mem0, embed extracted facts or text fragments directly. This preserves detail but produces brittle, isolated entries that lose narrative coherence. Coarse-abstraction systems compress experience into compact summaries but strip away the constraints, edge cases, and numeric details that make memory useful in the first place.
Graph-based systems add structure on top of content but still rely on the content itself for retrieval and typically require rigid ontologies that don’t generalize across domains.

Decoupling memory from retrieval

Memora’s architecture aims to address this by decoupling what is stored from how it is retrieved. Each memory entry has two components. The first is a primary abstraction, a short phrase (6–8 words) that captures what the memory is fundamentally about. The second is a memory value, which holds the rich content itself. Because of this separation, new information about an evolving topic is merged into the existing memory entry under the same primary abstraction rather than fragmented into a chain of partial duplicates.

Complementing primary abstractions, cue anchors are short, context-aware tags extracted from each memory’s value, providing alternative access paths to the same memory. They function as flexible, organically generated metadata, according to the post.

Memora also introduces a policy-guided retriever that, rather than returning the top-k semantically similar items in a single shot, iteratively refines its query, expands through cue anchors to surface related-but-not-similar memories, and decides when to stop.

“The deepest flaw in current agent memory is that it mistakes retrieval for memory. A vector store is superb at finding text that looks relevant. An enterprise agent needs more than resemblance. It needs to know what has changed, what still holds true, and what should never be recalled in the task at hand,” said Sanchit Vir Gogia, chief analyst at Greyhound Research.

Memora is interesting precisely because it refuses that shortcut, Gogia noted. It separates the rich detail of a memory from the handle used to find it, indexing a stable abstraction and a set of cue anchors while keeping the full content intact beneath them.
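
To make the separation between a memory's handle and its content more concrete, the following is a minimal Python sketch of the idea as described in this article. It is not Microsoft's implementation. Every name in it (MemoryEntry, ToyMemory, write, retrieve) is hypothetical, the sample memories are invented for illustration, and plain word overlap stands in for the semantic search and learned retrieval policy that a real system would use.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MemoryEntry:
    """One memory: a short handle plus the rich content behind it (hypothetical)."""
    abstraction: str                                  # short phrase naming the topic
    value: list[str] = field(default_factory=list)    # full details, kept intact
    anchors: set[str] = field(default_factory=set)    # alternative access paths


class ToyMemory:
    """Illustrative store; assumes exact-match handles and single-token anchors."""

    def __init__(self) -> None:
        self.entries: dict[str, MemoryEntry] = {}

    def write(self, abstraction: str, detail: str, anchors: set[str]) -> MemoryEntry:
        # New information on a known topic is merged into the existing entry
        # instead of being stored as a partial duplicate.
        key = abstraction.lower()
        entry = self.entries.setdefault(key, MemoryEntry(abstraction))
        entry.value.append(detail)
        entry.anchors.update(a.lower() for a in anchors)
        return entry

    def retrieve(self, query: str, max_rounds: int = 3) -> list[MemoryEntry]:
        # Assumption: word overlap replaces semantic similarity in this sketch.
        terms = set(query.lower().split())
        found: dict[str, MemoryEntry] = {}
        for _ in range(max_rounds):
            new_terms: set[str] = set()
            for key, entry in self.entries.items():
                if key in found:
                    continue
                if terms & set(key.split()) or terms & entry.anchors:
                    found[key] = entry
                    new_terms |= entry.anchors    # expand through cue anchors
            if not new_terms - terms:
                break                             # nothing new to follow, so stop
            terms |= new_terms
        return list(found.values())


memory = ToyMemory()
memory.write("Quarterly budget for the data platform",
             "Budget set at 40k", {"budget", "finance-review"})
memory.write("Quarterly budget for the data platform",
             "Raised to 55k after review", {"budget"})
memory.write("Vendor contract renewal timeline",
             "Renewal blocked until finance sign-off", {"finance-review"})

for hit in memory.retrieve("budget"):
    print(hit.abstraction, hit.value)
```

In this toy run the two budget updates land in a single entry, and the query "budget" reaches the vendor-contract memory only through the shared "finance-review" anchor, a memory that is related to the query without resembling it.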
Retrieval then becomes an act of navigation rather than a single hopeful guess, as the system re-queries, widens its search, or stops once it has enough, he added.

Benchmarking Memora

Microsoft evaluated Memora on two long-context benchmarks: LoCoMo, where dialogues average 600 turns, and LongMemEval, which uses 115,000-token contexts. According to the company, Memora achieved 86.3% LLM-judge accuracy on LoCoMo and 87.4% on LongMemEval, outperforming RAG, Mem0, Nemori, Zep, LangMem, and even full-context inference. It also stored roughly half as many memory entries per conversation as Mem0 (344 versus 651) while reducing token consumption by up to 98% compared with full-context inference.

While the benchmark results suggest significant efficiency gains, enterprises should not assume lower token consumption will automatically translate into lower infrastructure costs. Gogia cautioned against taking the token reduction number at face value. It is a benchmark context reduction, not a promise that an enterprise bill will fall by 98%, he said. “Real cost also includes memory construction, indexing, storage, and the audit logging that governance demands.”

He warned that Memora’s strongest retrieval mode is also its slowest. Its policy retriever takes roughly five to six seconds per query across several model-calling steps, against under a second for the simpler semantic mode. The saving in prompt tokens is partly repaid as retrieval latency and extra inference. So the memory crunch does not disappear; it moves. Instead of paying only for longer prompts, enterprises must now manage what is written, updated, and forgotten, and the indexing and testing that govern it.

Enterprise implications

Memora is currently an active Microsoft Research project, but the company has made the research code available on GitHub, enabling developers to experiment with the architecture and adapt it for their own AI applications.
However, portability on paper should not be confused with production readiness. While a memory layer of this design can, in principle, sit above models from any major provider, Gogia suggests that until the code is fully verifiable, maintained, and supportable under enterprise controls, the prudent posture for IT leaders is to study Memora as an architecture rather than operationalize it as software.

Beyond the technology, organizations will need governance and compliance policies to ensure AI memories are managed securely and remain auditable.

“An enterprise must decide who may write to memory, who may read it, how long it persists, and how an auditor reconstructs why a memory shaped an action. ‘The agent remembered it’ will not satisfy a regulator under the European Union’s AI Act traceability duties, nor a customer under India’s Digital Personal Data Protection Act,” Gogia said.

The article originally appeared on InfoWorld.
