Turing Post
AI 101: Conditional Memory and the Rise of Selective Intelligence

Let's explore this new memory paradigm and why it matters as an architectural principle

Ksenia Se
February 04, 2026

For most of deep learning’s history, memory was treated as something to be baked into the model. If a system needed to know more, we gave it more parameters. If it needed to remember longer, we extended the context window. If it forgot, we retrained.

That approach worked astonishingly well. It also locked us into a very specific assumption: that intelligence scales by touching everything, every time.

We are now watching that assumption change.

A recent paper from DeepSeek and Peking University, Conditional Memory via Scalable Lookup, does not introduce a better model or set a new benchmark record (though its authors are good at both). What it offers is a different organizing principle. The authors call it Engram: a conditional memory module that treats memory as something the model chooses to access.

That choice turns out to matter more than it first appears. I’ll call this shift selective intelligence.

In today’s episode, we will cover:

Why memory inside LLMs is starting to crack
Four kinds of “memory” we keep confusing
Why this is about routing, not retrieval
How Engram works: architecture and technical design
The U-Shaped allocation law: a fundamental discovery
Why Engram improves reasoning, not just memorization
Long-context performance: structural advantages
System efficiency: decoupling compute and memory
Large-scale pre-training: empirical validation
Not without limitations
Conclusion: selective intelligence as an architectural principle
Sources and further reading

Why memory inside LLMs is starting to crack

Large language models store knowledge implicitly. Facts, patterns, abstractions, and behaviors are distributed across billions of parameters. When a prompt arrives, attention mechanisms sweep across a dense internal space, activating everything that might be relevant. This design has two structural consequences:

First, recall is expensive. Every token pays the cost of dense computation, even when only a small fraction of the model's knowledge is actually needed. Processing the phrase "Alexander the Great" requires multiple early layers to recognize this as a single composite entity, not three unrelated words. The model essentially reconstructs a static lookup table at runtime, consuming sequential depth that could be allocated to higher-level reasoning.
Second, memory is passive. The model cannot decide not to think about something. It can only down-weight it through attention scores.

As models scale, both problems become structural. Longer contexts increase cost quadratically in standard attention mechanisms. Larger models increase latency and energy usage. And attempts to fix memory by brute force (more parameters, longer contexts) begin to resemble the same strategy that created the problem.
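The quadratic claim is easy to check with arithmetic. In standard self-attention, every token is scored against every other token, so a context of n tokens produces n × n query-key scores per head. The sketch below only counts those scores. The function name is hypothetical, and it ignores causal masking, multiple heads, and optimized kernels, all of which change the constant factor but not the quadratic growth.

```python
def attention_score_count(context_length: int) -> int:
    """Count query-key scores for one full (non-causal) self-attention head.

    Each of the n tokens is scored against all n tokens, giving n * n scores.
    """
    return context_length * context_length


# Compare a few context lengths; each step doubles the previous one.
for n in (1_000, 2_000, 4_000, 8_000):
    print(f"{n:>6} tokens -> {attention_score_count(n):>12,} pairwise scores")
```

Doubling the context length quadruples the number of scores, which is why extending the window is an expensive way to give a model more memory.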

This is the backdrop against which conditional memory becomes interesting.
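To make the "Alexander the Great" point concrete, here is a toy sketch of what a static lookup looks like when it is an explicit table instead of something reconstructed layer by layer. This is not Engram's design: the table contents, the entity ids, and the function names are all made up for illustration, and a plain Python dictionary stands in for whatever storage a real system would use.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical static table: multi-token spans mapped to an entity id.
# Entries and ids are invented for this example.
NGRAM_TABLE: Dict[Tuple[str, ...], str] = {
    ("alexander", "the", "great"): "ENTITY:alexander_the_great",
    ("new", "york"): "ENTITY:new_york",
}
MAX_N = max(len(key) for key in NGRAM_TABLE)


def lookup_suffix(tokens: List[str], position: int) -> Optional[str]:
    """Return the entry for the longest n-gram (n >= 2) ending at `position`."""
    for n in range(min(MAX_N, position + 1), 1, -1):
        span = tuple(tokens[position - n + 1 : position + 1])
        if span in NGRAM_TABLE:
            return NGRAM_TABLE[span]
    return None


def annotate(text: str) -> List[Tuple[str, Optional[str]]]:
    """Pair each lowercase, whitespace-split token with its lookup result."""
    tokens = text.lower().split()
    return [(tok, lookup_suffix(tokens, i)) for i, tok in enumerate(tokens)]


for token, entry in annotate("Alexander the Great conquered Persia"):
    print(token, entry)
```

Only the token that completes the span ("great") gets an entry; every other token returns None. Each check is a single hash lookup whose cost does not depend on model depth, which is the contrast with spending several early layers to assemble the same composite entity.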

Four kinds of “memory” we keep confusing

Before getting into the technique itself, it helps to separate four ideas that are often collapsed into one.
