A language model is brilliant, and it forgets everything the moment you close the tab.
Its context window is working memory — a fixed buffer where thinking happens. When the conversation ends, it’s gone. In human terms that’s severe anterograde amnesia: intact intelligence, no ability to form a lasting memory. Solving it means borrowing from the one system that already did — the brain. This is a guide to how memory actually works, how AI is trying to copy it, and where the ground is still open.
An engram is the physical trace a memory leaves in the brain, a term coined by Richard Semon in 1904 (Semon 1904), when it was pure theory. A century later, Susumu Tonegawa’s lab found the cells: they tagged the neurons active during an experience, then reactivated them with light and watched the memory return (Tonegawa 2012–2015). Engrams are real, locatable, and switchable. This guide is organized around the mechanisms those traces are built from, one specimen at a time.
The spine
Eleven mechanisms
Each specimen reads the same four ways: how the brain does it, the model that formalizes it, how AI implements or fails it, and the map of who has claimed it. Where the map runs thin, there’s open ground.
Not everything that happens is worth keeping. Something has to decide.
In the brain
The brain can’t keep everything, and shouldn’t. Attention and arousal gate what the hippocampus bothers to consolidate, and a dopamine signal from the midbrain tags an event as worth keeping when it’s novel, surprising, or tied to a goal. The amygdala adds a second vote for anything with emotional stakes (McGaugh 2000). Most of a day never crosses that threshold, and that’s the point.
The model
Formalized, it’s a weighting problem: each experience earns an encoding strength from several factors at once — novelty, relevance to current goals, emotional charge, likely future use. Higher-salience items start with more strength and, in richer models, decay more slowly. Salience isn’t one number; it’s a small profile.
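A minimal sketch of that weighting, assuming four factors scored in [0, 1]. The names `SalienceProfile`, `encoding_strength`, and `decay_rate`, the weights, and the linear form of the decay adjustment are all hypothetical illustrations, not taken from any system described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SalienceProfile:
    """Per-memory salience factors, each assumed to lie in [0, 1]."""
    novelty: float
    relevance: float
    emotional: float
    predictive: float


# Hypothetical hand-tuned weights (they sum to 1.0).
WEIGHTS = {"novelty": 0.3, "relevance": 0.3, "emotional": 0.2, "predictive": 0.2}


def encoding_strength(p: SalienceProfile) -> float:
    """Weighted sum of the profile: the memory's starting strength."""
    return sum(w * getattr(p, name) for name, w in WEIGHTS.items())


def decay_rate(p: SalienceProfile, base_rate: float = 0.1) -> float:
    """Richer models: higher salience also slows decay (assumed linear form)."""
    return base_rate * (1.0 - 0.5 * encoding_strength(p))


routine = SalienceProfile(novelty=0.1, relevance=0.2, emotional=0.0, predictive=0.1)
surprise = SalienceProfile(novelty=0.9, relevance=0.8, emotional=0.6, predictive=0.7)
```

Keeping the profile as separate fields, rather than collapsing it to one number at write time, is what lets a later stage reweight the factors without re-scoring the memory.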
In AI
Nearly every AI memory system scores candidate memories at write time — an importance heuristic, or an LLM simply asked “how important is this?” engram scores four dimensions (novelty, relevance, emotional, predictive) as it encodes; Stanford’s Generative Agents scored importance on write; DeWix’s granted patent weights parts of speech including an emotional axis. The hard part was never scoring — it’s scoring well, and adapting the weights to one particular person.
Salience scoring is thoroughly occupied. The live edge is learned salience — weights that adapt to what a specific user actually reinforces, rather than hand-tuned constants.
Memories aren't saved when they're made. They're rebuilt, offline, while you sleep.
⚖ Contested — How consolidation works long-term is actively debated — standard systems consolidation vs. multiple-trace theory.
In the brain
A memory isn’t fixed the moment it forms. Over hours to years it’s reorganized: the hippocampus captures an episode fast, then replays it to the neocortex, especially during slow-wave sleep and carried on sharp-wave ripples. The neocortex slowly integrates it into long-term knowledge (Diekelmann 2010).
Whether the hippocampus ever fully hands off, or is always needed to replay a detailed memory, is still debated ⚖ (Nadel 1997).
The model
The Complementary Learning Systems account: a fast hippocampal learner and a slow neocortical one. Interleaving replay of old and new lets the slow system extract structure without catastrophically overwriting what it already knew (McClelland 1995), which is the reason sleep matters for memory at all.
In AI
In practice: an offline “sleep” pass that merges redundant memories and extracts patterns. engram runs a consolidation cycle between sessions (merge duplicates, distill old episodics to gist); Microsoft filings cluster a vector store with DBSCAN/HDBSCAN and select by age and effectiveness. engram’s two-pass design — a cheap model finds merge candidates, a stronger one rewrites them — mirrors the fast/slow split directly.
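The two-pass idea can be sketched as follows. This is an illustration under stated assumptions, not engram’s or Microsoft’s implementation: memories are assumed to carry a precomputed embedding under a hypothetical `vec` key, the cheap pass is plain cosine similarity against a hypothetical threshold, and `rewrite` stands in for whatever stronger model produces the merged wording.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def find_merge_candidates(memories, threshold=0.9):
    """Cheap pass: pairs of memory ids whose embeddings are near-duplicates."""
    ids = sorted(memories)
    return [
        (a, b)
        for i, a in enumerate(ids)
        for b in ids[i + 1:]
        if cosine(memories[a]["vec"], memories[b]["vec"]) >= threshold
    ]


def consolidate(memories, rewrite, threshold=0.9):
    """Expensive pass: rewrite each candidate pair into one merged memory."""
    merged = dict(memories)
    for a, b in find_merge_candidates(memories, threshold):
        if a in merged and b in merged:  # skip pairs already absorbed
            text = rewrite(merged[a]["text"], merged[b]["text"])
            # Simplification: the survivor keeps its old embedding.
            merged[a] = {"text": text, "vec": merged[a]["vec"]}
            del merged[b]
    return merged


memories = {
    "m1": {"text": "User prefers tea", "vec": [1.0, 0.0]},
    "m2": {"text": "User likes tea", "vec": [0.99, 0.05]},
    "m3": {"text": "User lives in Oslo", "vec": [0.0, 1.0]},
}
result = consolidate(memories, rewrite=lambda a, b: f"{a}; {b}")
```

Here `m1` and `m2` merge and `m3` is left alone. A real pass would re-embed the rewritten text and would likely cluster rather than compare every pair.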
Consolidation is heavily patented for the single-session case. The open ground is long-horizon consolidation: what should distill over months and years, not one night.
Forgetting is not the failure of memory. It is one of its functions.
⚖ Contested — Whether forgetting is true decay or retrieval interference is a century-old, still-unsettled question.
19% retained after 4 days
In the brain
Memories weaken with time unless they’re renewed. Ebbinghaus measured it on himself in 1885: a steep initial drop that flattens into a long tail (Ebbinghaus 1885). Forgetting isn’t simply failure: it clears capacity and keeps one-off events from overwriting durable knowledge.
Whether the trace physically fades or is just crowded out by competitors is a century-old open question ⚖.
The model
The forgetting curve. Retention falls off as an exponential or, better-fitting, a power-law function of elapsed time (Wickelgren 1974), modulated by how strongly it was encoded and flattened each time it’s retrieved. This is the mathematics behind spaced repetition.
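To make the two shapes concrete, here is a small sketch. The power form below is a generic one, not Wickelgren’s exact equation, and the rate, exponent, and the rule that each review divides the exponent are assumptions chosen only for illustration.

```python
import math


def exponential_retention(t_days, strength=1.0, rate=0.5):
    """R(t) = strength * exp(-rate * t)."""
    return strength * math.exp(-rate * t_days)


def power_law_retention(t_days, strength=1.0, exponent=0.5):
    """R(t) = strength * (1 + t) ** -exponent: early drop, then a long tail."""
    return strength * (1.0 + t_days) ** -exponent


def retention_after_reviews(t_since_last, n_reviews, base_exponent=0.5):
    """Each retrieval flattens the curve (assumed: exponent / (1 + n_reviews))."""
    flattened = base_exponent / (1 + n_reviews)
    return power_law_retention(t_since_last, exponent=flattened)


for t in (1, 7, 30):
    print(t, round(exponential_retention(t), 4), round(power_law_retention(t), 4))
```

With these parameters the exponential curve is essentially zero after a month while the power-law curve still retains a visible tail, which is the qualitative difference the text describes.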
In AI
The most-copied idea in AI memory. MemoryBank applies Ebbinghaus directly; a granted EMC/Dell patent decays relevance by linear, exponential, or step functions; DeWix claims R = e^(−a·x). engram computes strength on read from salience, access history, and elapsed time rather than storing a stale number. Most systems still decay linearly, which doesn’t match the brain’s power-law shape.
Decay curves are anticipated many times over. What almost no one models: the difference between true forgetting (the trace is gone) and retrieval failure (it’s intact, but the path to it is lost).
New memories crowd out old ones. Old ones distort new ones. They compete.
In the brain
Memories compete. Proactive interference: older memories impair learning and recall of newer, similar ones. Retroactive interference: new learning degrades old (Underwood 1957). The brain actively inhibits competing traces during encoding and retrieval; it doesn’t just file everything side by side and hope.
The model
Interference theory frames forgetting as competition among similar traces at retrieval — scaled by how similar they are and how many compete — rather than as passive decay. The more crowded the neighborhood, the harder any one memory is to find.
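One simple way to express “scaled by how similar they are and how many compete” is a ratio rule: the target’s strength divided by the total strength the cue activates. This is an illustrative sketch, not a model drawn from the cited literature; `retrieval_probability` and its inputs are hypothetical.

```python
def retrieval_probability(target_strength, competitors):
    """Chance the cue lands on the target rather than a competitor.

    competitors: list of (similarity_to_cue, strength) pairs, similarity in [0, 1].
    target_strength is assumed to be positive.
    """
    competition = sum(sim * strength for sim, strength in competitors)
    return target_strength / (target_strength + competition)


alone = retrieval_probability(1.0, [])
one_dissimilar = retrieval_probability(1.0, [(0.1, 1.0)])
one_similar = retrieval_probability(1.0, [(0.9, 1.0)])
crowded = retrieval_probability(1.0, [(0.9, 1.0)] * 3)
```

Nothing in the target itself has decayed between `alone` and `crowded`; only the neighborhood changed.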
In AI
Suddenly urgent for agents. The Unable to Forget benchmark showed an LLM’s retrieval accuracy collapsing toward zero as conflicting values accumulate: proactive interference, measured. FadeMem applies contradiction-triggered graded attenuation (an LLM classifies the conflict); engram weakens an old memory’s salience the moment a new one supersedes it; Beijing Lingxin resolves conflicts with a BERT classifier.
The map
Proactive interference — eroded to a narrow claim (FadeMem §2.3, ~5 weeks pre-priority).
Interference is well diagnosed and barely solved. There are benchmarks and a few attenuation tricks, but reliably suppressing outdated information — without losing what’s still true — is open.
Every time you remember something, you change it — and usually make it stronger.
In the brain
Retrieval is not a passive read. Recalling something strengthens it more than re-studying it would: the testing effect (Roediger 2006). Recall is cue-driven and reconstructive: a partial cue reactivates the whole pattern, and neurons that fire together wire together, so use is itself a form of encoding.
The model
Retrieval-induced strengthening: each successful recall adds to a memory’s strength and resets its decay clock. Storage strength (how well-learned) and retrieval strength (how accessible right now) are distinct — you can know something deeply yet fail to find it, and finding it makes it easier to find again.
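The storage/retrieval distinction can be sketched with two fields and one rule. Everything here is an assumption made for illustration: the `Trace` class, the exponential form, and the constants.

```python
import math
from dataclasses import dataclass


@dataclass
class Trace:
    storage: float = 1.0      # how well-learned; only ever grows in this sketch
    last_access: float = 0.0  # time of the last successful recall, in days

    def retrieval_strength(self, now, rate=0.3):
        """Accessibility right now: decays since last access, slower when storage is high."""
        elapsed = now - self.last_access
        return math.exp(-rate * elapsed / self.storage)

    def recall(self, now, threshold=0.2, gain=0.5):
        """A successful recall adds storage strength and resets the decay clock."""
        if self.retrieval_strength(now) < threshold:
            return False  # still stored, but not findable right now
        self.storage += gain
        self.last_access = now
        return True


rehearsed = Trace()
rehearsed.recall(now=2.0)
untouched = Trace()
```

By day 4 the rehearsed trace is more accessible than the untouched one, and a trace left alone for ten days fails the threshold even though its `storage` value is unchanged.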
In AI
Retrieval is the one part every RAG system does. The subtler move, letting a read change the memory, is rarer. engram boosts strength when a memory is recalled; Generative Agents fold recency, importance, and relevance into a score computed at read time. A real gap: if the agent leans on a session briefing instead of querying, no strengthening fires. “Use it or lose it” only triggers on explicit lookups.
Retrieval-time scoring is §103-occupied. Passive strengthening — a memory gaining weight because it was used in conversation, not explicitly queried — is thin.
Recall makes a memory briefly editable. Then it's re-stored, sometimes rewritten.
⚖ Contested — The boundary conditions for reconsolidation — when a recalled trace becomes labile — are still debated.
In the brain
Retrieving a settled memory can return it to a fragile, editable state: it has to be re-stored, and can be altered before it re-stabilizes (Nader 2000). This is why memory is reconstructive, not reproductive: every recall is a chance to rewrite.
Exactly which memories become labile, and when, is still being worked out ⚖.
The model
A recalled trace briefly becomes writable; an updated version overwrites the old while keeping the same identity. It’s how “Mike has one son” becomes “Mike has two sons” without deleting and recreating the memory — the fact evolves in place.
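A minimal sketch of “the fact evolves in place”: the content changes, the identity does not, and the prior wording is kept. The `Memory` class and `reconsolidate` method are hypothetical names for illustration, not the API of any system mentioned here.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    id: str
    text: str
    history: list = field(default_factory=list)

    def reconsolidate(self, new_text):
        """Overwrite the content in place: same id, prior wording logged first."""
        self.history.append(self.text)
        self.text = new_text


m = Memory(id="mike-family", text="Mike has one son")
m.reconsolidate("Mike has two sons")
```

Anything that referenced `mike-family` before the update still resolves afterward, which is the practical difference from delete-and-recreate.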
In AI
Mem0’s UPDATE operation — edit a memory’s content in place, keep its id, log the prior version — is exactly this, and it’s the closest §102 prior art there is. engram’s reinforce-with-new-content does the same: it rewrites the text while strengthening the memory, logging the old wording first as a safety net.
The map
In-place update — §102 anticipated (Mem0 UPDATE + history log).
In-place update is anticipated. What’s unexplored is reconsolidation as deliberate distortion: letting recall reshape a memory toward the agent’s current understanding, the way human memory quietly does.
"I met her on Tuesday" slowly becomes "I know her" — the event fades, the meaning stays.
⚖ Contested — Whether gist is compressed episodic detail or an independent multi-trace abstraction is unresolved.
In the brain
Two systems, first distinguished by Tulving: episodic memory holds specific events (“I met her Tuesday”); semantic memory holds general knowledge (“I know her”) (Tulving 1972). Over time, repeated episodes shed their verbatim detail and leave gist. Fuzzy-Trace Theory holds the two as parallel traces, the verbatim one fading faster (Brainerd 2002).
Whether semantic memory is just compressed episodes or a separate abstraction is unsettled ⚖.
The model
Verbatim-to-gist degradation: detailed traces decay quickly while the extracted meaning persists. You keep “we argued about consciousness” long after the exact words are gone. Consolidation is the process that does the distilling.
In AI
engram tags each memory episodic or semantic and, during consolidation, compresses aging episodics into semantic gist with a small model; merged and generalized memories are born semantic. Microsoft’s layered working/episodic/collective memory and Alai Vault’s episodic/semantic tiers cover the split. One honest caveat: real semantic memory is gist across many episodes, not one old episode squeezed down.
The moments that mattered are the ones you keep. Feeling is the encoder's thumb on the scale.
In the brain
Emotion is the encoder’s thumb on the scale. Arousing events are remembered better and longer: the amygdala modulates hippocampal consolidation, and stress hormones flag an event as important as it’s laid down (McGaugh 2000). It’s why a flashbulb memory survives while a thousand ordinary Tuesdays vanish. Emotional memories don’t just start stronger; they decay more slowly.
The model
Two effects, not one: an affective weight on initial encoding strength, and a reduced decay rate. Emotion changes both the height and the slope of the forgetting curve — a fact most implementations quietly drop.
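The difference between “height only” and “height and slope” is easy to see side by side. This sketch uses an exponential curve purely for simplicity, and the gains and rates are assumed values, not measurements.

```python
import math


def retention(t_days, emotion, base_strength=0.5, base_rate=0.2,
              height_gain=0.5, slope_gain=0.5):
    """emotion in [0, 1] raises initial strength AND lowers the decay rate."""
    strength = base_strength * (1.0 + height_gain * emotion)
    rate = base_rate * (1.0 - slope_gain * emotion)
    return strength * math.exp(-rate * t_days)


def retention_height_only(t_days, emotion, base_strength=0.5, base_rate=0.2,
                          height_gain=0.5):
    """Emotion as starting weight only; the decay rate is untouched."""
    strength = base_strength * (1.0 + height_gain * emotion)
    return strength * math.exp(-base_rate * t_days)


for t in (0, 10, 30):
    print(t, round(retention(t, 1.0), 4), round(retention_height_only(t, 1.0), 4))
```

The two versions agree at encoding time and diverge from there: the gap between them widens with every day that passes, so a system that only boosts the starting weight loses its emotional memories on the same schedule as everything else.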
In AI
Usually collapsed into a single salience axis. DeWix weights adjectives “reflecting the emotional context”; Dynamic Affective Memory Management keeps an affective dimension in a numeric salience profile; engram scores an emotional dimension at encoding. But almost nothing lets emotion change the decay rate the way the amygdala does. It’s treated as starting weight only.
Remembering to do something later — the memory that fires itself at the right moment.
In the brain
Remembering to do something later: take the medicine at eight, pass on the message. Unlike memory for the past, a prospective memory has to fire itself at the right moment, cued by a time or an event, while you’re busy with something else (Einstein 1990). It leans on prefrontal monitoring and a spontaneous retrieval when the cue finally appears.
The model
An intention stored together with a trigger — time-based (“at 8pm”) or event-based (“when I see Katie”) — that has to be noticed and executed without an explicit prompt to remember it. The hard part is the noticing.
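In code, an intention-plus-trigger pair is small; the work is in checking it on every turn without being asked. The sketch below assumes the agent can summarize its current situation as a plain dictionary. The `Intention` class, `check_intentions`, and the `hour` and `people` keys are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Intention:
    action: str
    trigger: Callable[[dict], bool]  # inspects the current context
    done: bool = False


def check_intentions(intentions, context):
    """Run on every turn; returns actions whose trigger matches, once each."""
    fired = []
    for intention in intentions:
        if not intention.done and intention.trigger(context):
            intention.done = True
            fired.append(intention.action)
    return fired


intentions = [
    # Time-based: fires at or after 8pm.
    Intention("take the medicine", lambda c: c.get("hour", 0) >= 20),
    # Event-based: fires when Katie is present.
    Intention("pass on the message", lambda c: "Katie" in c.get("people", [])),
]
```

Calling `check_intentions(intentions, {"hour": 9, "people": ["Katie"]})` returns only the message action, and only once. Exact-match triggers like these are the easy case; noticing a cue that was never spelled out in advance is the part this sketch does not attempt.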
In AI
Thin. Assistant “reminders” (Meta’s proactive-reminder patents) and cognitive architectures like ACT-R cover narrow, scheduled forms. But a general agent that spontaneously acts on a stored future intention when a matching context appears, not when it’s asked, is barely built. engram lists it as a TODO, not a feature.
The map
Prospective memory — anticipated in narrow forms (ACT-R/Soar; Meta US11567788B1).
Recalling one thing pulls its neighbors with it — everything you learned in the same hour.
In the brain
Memories formed close together in time get linked. Recalling one tends to drag its neighbors along: the contiguity effect (Howard 2002). A slowly drifting internal “context” is bound to each memory as it forms; reinstating that context reactivates everything you encoded in the same stretch. It’s why a smell can return an entire afternoon.
The model
The Temporal Context Model: a drifting context vector is associated with each item, and retrieval reinstates that context, which in turn cues temporally-nearby items (Howard 2002). Association falls off smoothly with temporal distance.
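A heavily simplified sketch of the drifting-context idea, not the full Howard and Kahana model: items are assumed to be one-hot vectors, the drift parameter `rho` is an arbitrary choice, and retrieval is reduced to a dot product between stored contexts.

```python
import math


def encode_sequence(n_items, rho=0.7):
    """Return the context vector bound to each of n_items studied in order.

    Context drifts as c_i = rho * c_{i-1} + sqrt(1 - rho**2) * item_i,
    where item_i is a one-hot vector.
    """
    beta = math.sqrt(1.0 - rho ** 2)
    context = [0.0] * n_items
    bound = []
    for i in range(n_items):
        context = [rho * c for c in context]  # old context fades
        context[i] += beta                    # current item blends in
        bound.append(list(context))           # snapshot bound to item i
    return bound


def cue_strengths(bound, recalled):
    """Reinstate the recalled item's context and score every other item against it."""
    cue = bound[recalled]
    return {
        j: sum(a * b for a, b in zip(cue, ctx))
        for j, ctx in enumerate(bound)
        if j != recalled
    }


scores = cue_strengths(encode_sequence(5), recalled=2)
```

Recalling the middle item cues its immediate neighbors more strongly than items two steps away, in both directions, with no explicit link ever stored between them.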
In AI
engram returns “temporal siblings” — memories formed in the same session — alongside direct recall matches, and boosts the current project’s memories at session start (context-dependent retrieval by working directory). Zep/Graphiti build an explicit temporal knowledge graph; PAM retrieves through temporal co-occurrence.
The map
Temporal association — §103 obvious (Temporal Context Model; PAM).
PAM (~2026, paper, arXiv): retrieval through temporal co-occurrence.
§103-occupied through the Temporal Context Model and graph systems. Rich context reinstatement — reconstructing a whole prior context, not just same-session tagging — is under-built.
New facts don't land on blank ground. They're folded into what you already believe.
In the brain
New information never lands on blank ground. It’s assimilated into existing schemas: mental frameworks that shape what gets encoded and how it’s later reconstructed. Bartlett’s readers, retelling an unfamiliar folk tale, unconsciously reshaped it toward their own cultural expectations (Bartlett 1932). Schema-consistent facts consolidate faster; there’s already a slot for them.
The model
Assimilation versus accommodation: fit new data into an existing frame, or revise the frame to fit the data. Schemas speed the encoding of consistent information and bias the reconstruction of anything ambiguous — memory as an active model, not a passive log.
In AI
The least-built mechanism here. Knowledge-graph memories (Microsoft’s brain-modeling KG; Cerego’s related-concept generation) gesture at structure, and entity tracking would be a first step — but AI memory systems overwhelmingly store isolated memory atoms, not memories assimilated into an evolving model of the world. engram’s own review flags this as its biggest missing piece.
The map
Schema assimilation — largely open on the AI side; classic in cognitive science.
Open ground. Assimilating memories into an evolving schema — rather than accumulating independent atoms — is close to untouched on the AI side.
The map
The landscape
Every patent, paper, and open-source system found across five rounds of adversarial prior-art search. Sorted by priority date, the shape is the finding: the field is occupied, not crowded — a convergence in 2023–2026, with Microsoft, the ex-Cerego team behind Alai Vault, DeWix, Snap, Meta, Samsung, and a fast-moving research frontier all arriving at once.
Each dot is a prior-art reference. The cluster is the point — the field is occupied, not crowded.
Alai Vault LLC (Smith Lewis, Harlow). Biologically-inspired memory: power decay of salience, episodic/semantic tiers, agent memory tool. US20240289863A1. Patent, 2023. Applications pending.
Strongest single §103 obviousness reference. Specification teaches parameterized age-dependent decay of salience, power decay of episodic memories, episodic/semantic tiers, vector-DB long-term memory, a conversational memory tool — though the 20 claims cover brand-marketing agent personalization only. Inventors are ex-Cerego. Family: WO2024178435A1, CA3284082A1.
Encoding-time memory-conflict resolution: a BERT model detects conflict and resolves by elimination or integration — overlaps part of the interference mechanism (a §103 reference, not anticipation).
The most directly relevant reference — claims (not just spec) recite the machinery: exponential decay R=e^(-a·x), top-20% summarization, timestamped storage, delete-low-retention (claim 1); importance from parts of speech with adjectives weighted for emotional context (claim 2); ML-derived decay rate (claim 4); semantic-similarity clustering and pruning (claim 5). Claim analysis via machine translation.
Taranjeet, et al. Mem0: scalable long-term memory for AI agents (mem0). OSS, 2025. arXiv:2504.19413.
ADD/UPDATE/DELETE/NOOP operations with a bounded top-s dedup context. The UPDATE-with-history operation is exactly the in-place-update ('reconsolidation') mechanism — §102 anticipation for it.
The closest research reference. Does contradiction-triggered, encoding-time, graded arithmetic attenuation of memory strength in an LLM agent — most of what a proactive-interference mechanism does. But conflict is LLM-classified (GPT-4o-mini), it attenuates a scalar not a multi-dimensional vector, strength is a stored field batch-rewritten (not computed on read), and it uses one model tier. Predates engram's priority by ~5 weeks.
If the obvious mechanisms are occupied, where is the real work? Read as a map, the search points fairly clearly.
Evaluation
Almost everyone is building memory systems; almost no one agrees how to measure them. Most results lean on synthetic benchmarks that don’t capture real interaction — the gap behind the RECALL conversation-grounded benchmark.
Interference, solved not just diagnosed
LLMs demonstrably can’t suppress outdated information. FadeMem and engram both attempt mechanisms — but “the model retrieves a value it should have forgotten” is far better characterized than it is solved.
Long-horizon memory
Nearly every system operates on days or weeks. What a memory should do over months and years — what genuinely consolidates, what becomes irrecoverable versus merely hard to find — is close to unexplored.
Forgetting vs. retrieval failure
Human memory distinguishes “the trace is gone” from “the trace exists but the path is lost.” Most AI systems only delete. Graded, partially reversible forgetting has very little prior art and real conceptual room.
How this was built
Method & caveats
The landscape was compiled from five escalating rounds of adversarial prior-art search conducted during the engram project (May 2026) — broad academic and OSS sweeps, then patents across the US, Europe, Korea, and China, then focused deep dives on the closest references. The goal was deliberately hostile: not to confirm an idea was new, but to find everything that would prove it wasn’t.
This is a research record and a field reference — not legal advice, and not a freedom-to-operate or patentability opinion.
Patent numbers were verified against live Google Patents pages during the searches; the Korean patent’s claim analysis used machine translation.
Contested points in the neuroscience are marked ⚖; they reflect active scientific debate, not settled fact.
R. Semon (1904). Die Mneme (introducing the term "engram"). Wilhelm Engelmann, Leipzig.
H. Ebbinghaus (1885). Über das Gedächtnis (Memory: A Contribution to Experimental Psychology). Duncker & Humblot.
W. A. Wickelgren (1974). Single-trace fragility theory of memory dynamics. Memory & Cognition, 2(4).
S. Tonegawa, X. Liu, S. Ramirez, et al. (2012–2015). Optogenetic identification and manipulation of memory engram cells. Nature; Science.
E. Tulving (1972). Episodic and semantic memory. Organization of Memory (Academic Press).
C. J. Brainerd & V. F. Reyna (2002). Fuzzy-Trace Theory and false memory. Current Directions in Psychological Science, 11(5).
K. Nader, G. Schafe & J. LeDoux (2000). Fear memories require protein synthesis in the amygdala for reconsolidation after retrieval. Nature, 406.
J. L. McGaugh (2000). Memory — a century of consolidation (amygdala modulation of emotional memory). Science, 287.
M. W. Howard & M. J. Kahana (2002). A distributed representation of temporal context. Journal of Mathematical Psychology, 46.
J. L. McClelland, B. L. McNaughton & R. C. O'Reilly (1995). Why there are complementary learning systems in the hippocampus and neocortex. Psychological Review, 102(3).
L. Nadel & M. Moscovitch (1997). Memory consolidation, retrograde amnesia and the hippocampal complex (multiple-trace theory). Current Opinion in Neurobiology, 7(2).
S. Diekelmann & J. Born (2010). The memory function of sleep (sharp-wave ripples, replay). Nature Reviews Neuroscience, 11.
H. L. Roediger & J. D. Karpicke (2006). Test-enhanced learning: the testing effect and retrieval-induced strengthening. Psychological Science, 17(3).
F. C. Bartlett (1932). Remembering: A Study in Experimental and Social Psychology (schema theory). Cambridge University Press.
G. O. Einstein & M. A. McDaniel (1990). Normal aging and prospective memory. Journal of Experimental Psychology: LMC, 16(4).
B. J. Underwood (1957). Interference and forgetting (proactive interference). Psychological Review, 64(1).
42 prior-art references · 11 mechanisms · a defensive publication, not a patent. Take the map and go further.