Today’s editorial: continual learning in LLMs, why AI models may need offline consolidation, and what “sleep” means for AI memory, agents, and catastrophic forgetting.

→ Continual Learning Is Back, and It’s About to Put Models to Sleep

By coincidence, last week was all about models and their precious sleep. On May 25, a paper from Carnegie Mellon and the University of Maryland asked: Do Language Models Need Sleep? On June 2, a paper from Google-affiliated researchers answered almost directly: Language Models Need Sleep. We can read this funny timing as a signal: continual learning is back at the center of AI research, now under a different set of pressures.

Continual learning is not a new problem. In classical machine learning, it usually meant training a model on a sequence of tasks without destroying what it had already learned. A model learns task B, then suddenly becomes worse at task A. This is catastrophic forgetting, and the field spent years trying to reduce it through replay, freezing, regularization, routing, and other methods.
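To make replay concrete, here is a minimal sketch of one common variant: a small buffer of old examples, filled by reservoir sampling, that gets mixed into each new batch. The class name `ReplayBuffer`, the `replay_fraction` parameter, and the task labels are illustrative assumptions, not taken from any paper mentioned here.

```python
import random


class ReplayBuffer:
    """Fixed-size store of past examples, filled by reservoir sampling."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0  # total number of examples offered so far
        self.rng = random.Random(seed)

    def add(self, example):
        # Reservoir sampling: every example seen so far ends up in the
        # buffer with equal probability, whatever the stream length.
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(example)
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = example

    def mixed_batch(self, new_examples, replay_fraction=0.5):
        # Append a sample of old examples to the new ones, so training on
        # task B keeps revisiting task A. replay_fraction is a hypothetical knob.
        k = min(len(self.items), int(len(new_examples) * replay_fraction))
        return list(new_examples) + self.rng.sample(self.items, k)


buffer = ReplayBuffer(capacity=4)
for i in range(100):
    buffer.add(("task_A", i))

batch = buffer.mixed_batch([("task_B", i) for i in range(4)])
print(batch)
```

The trade-off is the usual one: a larger buffer or replay fraction protects old tasks better but slows down learning of the new one.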

LLMs changed the shape of the problem. Today, the question is broader: how can AI systems stay current, specialize to domains and users, learn from experience, and improve after deployment without breaking what they already know? That is brutally hard.

A 2026 survey, Continual Learning in Large Language Models, gives a good map of the current field. It divides LLM continual learning into continual pre-training, continual fine-tuning, and continual alignment. In practice, that means a model may need to absorb new general knowledge, adapt to a specific domain or task, or adjust its behavior without losing the alignment that made it useful. The survey concludes that current methods work in limited settings, but we still do not have smooth learning across tasks and time.

So what does sleep have to do with it?

Of course, models do not literally need sleep. What they need is an offline phase for consolidation. Constant live updating is risky, while doing nothing leaves models stale. There needs to be a phase between seeing something and changing from it. This is what the sleep metaphor is trying to capture: offline processing, when the model is not simply answering the next prompt, but organizing recent experience before deciding what should persist.

The CMU/Maryland paper looks at this from the inference side. Long context is expensive because the KV cache grows as the model attends to more tokens. Some hybrid architectures compress older context into fast weights, but the paper shows that compression alone is not enough. If the model has to reason about information it can no longer directly attend to, it needs more computation before that context is cleared. Their proposed sleep phase gives the model offline recurrent passes over recent context, and the biggest gains appear on tasks that require deeper reasoning. That is the important part: memory is not only storage, it is processing.
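A back-of-the-envelope calculation shows why the KV cache becomes the bottleneck. The sketch below uses the standard size estimate for a plain attention-based transformer; the configuration in the example (32 layers, 8 key/value heads, head dimension 128, 16-bit values) is a hypothetical model, not one from the paper.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim,
                   bytes_per_value=2, batch_size=1):
    """Estimate KV cache size for a standard attention transformer."""
    # The leading 2 counts one key tensor plus one value tensor per layer.
    return (2 * batch_size * n_layers * n_kv_heads
            * head_dim * seq_len * bytes_per_value)


# Hypothetical configuration, chosen only to make the arithmetic visible.
per_token = kv_cache_bytes(1, n_layers=32, n_kv_heads=8, head_dim=128)
long_ctx = kv_cache_bytes(131072, n_layers=32, n_kv_heads=8, head_dim=128)

print(per_token // 1024, "KiB per token")
print(long_ctx // 1024**3, "GiB at 131,072 tokens")
```

The cost is linear in sequence length, which is why compressing or clearing older context is attractive, and why whatever reasoning depends on that context has to happen before it is cleared.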

The Google-affiliated paper moves closer to continual learning. It starts from a simple limitation: LLMs can adapt inside a context window, but that knowledge usually disappears when the session ends. Its Sleep paradigm proposes two steps. First, “Knowledge Seeding” consolidates short-term knowledge into more stable parameters. Then “Dreaming” uses model-generated synthetic data to rehearse what was recently learned. Biological terms aside, the point is that durable learning should be separated from live interaction.

This separation may be the useful architecture for continual learning. Without it, the choices are too crude. Either the model stays mostly static and relies on retrieval, or it updates too directly and risks drift. Sleep gives researchers a third frame: the system interacts, collects experience, processes it offline, and only then decides what should remain temporary, what should become memory, and what is allowed to affect future behavior.
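The interact, collect, process offline, then decide loop can be sketched in a few lines. This is a toy illustration under stated assumptions, not the method of either paper: the `Experience` and `SleepingAgent` names, the scoring rule, and both thresholds are invented for the example, and "memory" here is a plain list rather than model parameters.

```python
from dataclasses import dataclass, field


@dataclass
class Experience:
    text: str
    times_seen: int = 1
    user_confirmed: bool = False


@dataclass
class SleepingAgent:
    buffer: list = field(default_factory=list)  # raw experience, filled while awake
    memory: list = field(default_factory=list)  # durable store, changed only in sleep()

    def observe(self, exp):
        # Awake phase: record experience, but do not change anything durable.
        self.buffer.append(exp)

    def sleep(self, persist_at=3, keep_at=2):
        # Offline phase: sort recent experience into three outcomes.
        carried_over = []
        for exp in self.buffer:
            # Hypothetical score: repetition plus a bonus for user confirmation.
            score = exp.times_seen + (2 if exp.user_confirmed else 0)
            if score >= persist_at:
                self.memory.append(exp.text)   # becomes memory
            elif score >= keep_at:
                carried_over.append(exp)       # stays temporary for another cycle
            # anything scoring below keep_at is discarded
        self.buffer = carried_over


agent = SleepingAgent()
agent.observe(Experience("user prefers metric units", user_confirmed=True))
agent.observe(Experience("retry the tool after a timeout", times_seen=2))
agent.observe(Experience("one-off typo in a file name"))
agent.sleep()
print(agent.memory)
print([e.text for e in agent.buffer])
```

The design choice worth noticing is that `observe` never touches `memory`: nothing becomes durable except through the offline step, which is the separation the sleep framing argues for.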

This is especially important for agents, because their experience is richer than a document stream. It includes tool calls, failed attempts, user corrections, environmental feedback, and repeated workflows. Recent agent-learning work points in the same direction. A roadmap on lifelong learning for LLM agents frames the problem through perception, memory, and action. Another June 2026 paper, Rethinking Continual Experience Internalization for Self-Evolving LLM Agents, shows why this is still fragile: repeated learning cycles can collapse instead of compound when experience is internalized poorly.

I also want to mention Dreaming, OpenAI’s June 4 memory update for ChatGPT. It synthesizes user memory in the background to improve freshness, continuity, and relevance across conversations. This is system-side memory, not proof that parametric continual learning is solved. Still, it shows the same pressure appearing in production: memory cannot remain a static list of notes forever.

The field needs to move beyond the idea of continuous updating. What feels new this week is the search for a controlled phase between experience and change. Sleep becomes interesting as a boundary: a moment when the system can decide what deserves to persist, what should stay temporary, and what should be discarded. We anticipate a few breakthroughs in continual learning this year.

News from the usual suspects ™

  • Axiom pushed formal verification beyond pure math into economics. It announced EconLib, a Lean-based library for economic theory, starting with a formalization of Robert Aumann’s “agreeing to disagree” theorem. AxiomProver didn’t just verify the proof; it surfaced an implicit assumption in the underlying logic, then also proved the Monderer-Samet p-belief version. The project aims to become a Mathlib-style foundation for game theory, Nash equilibria, auction theory, information economics, and prediction-market logic – read the paper, see the code

  • Sakana AI made recursive self-improvement its explicit research agenda. It launched the Sakana AI RSI Lab in Tokyo, a dedicated group focused on using AI to redesign the AI development process itself. The lab brings together Sakana’s recent line of work on AI-generated optimization algorithms, self-rewriting agents, program evolution, self-learning reinforcement agents, adversarial coevolution, and The AI Scientist.

  • OpenAI pushed Codex beyond software engineering with role-specific plugins, Sites, and annotations for analysts, marketers, designers, sales teams, investors, and bankers. It also upgraded GPT-Rosalind for life sciences workflows and began rolling out Dreaming, a more scalable memory system for ChatGPT.

  • Anthropic published a cyber-threat analysis showing how AI-enabled attackers are moving deeper into the attack chain and exposing gaps in existing security frameworks like MITRE ATT&CK →read the report

  • NVIDIA turned South Korea into the week’s AI infrastructure stage. It announced deals with SK Hynix, SK Telecom, Naver, Doosan, LG, and Hyundai around memory supply, AI factories, robotics, data centers, autonomous mobility, and AI-powered manufacturing. Separately, Naver said it will build gigawatt-scale AI factories using NVIDIA technology, while LG is working with NVIDIA on humanoid robots and future data centers.

  • Meta entered the enterprise-agent race with Meta Business Agent, expanding AI agents across WhatsApp, Messenger, and Instagram for customer support, sales, bookings, and business operations. But the week also exposed friction: its Muse Spark API was reportedly delayed, and Meta removed face-recognition code from its smart-glasses companion app after WIRED scrutiny.

  • Apple finally gave WWDC an AI answer: Siri AI, a more conversational, contextual, systemwide assistant designed to work across apps while relying on on-device processing and Private Cloud Compute where possible. Reports also point to Google’s Gemini as part of the new Siri architecture.

  • Washington moved frontier model release closer to national-security process. The White House signed an AI cybersecurity and frontier-model order asking leading AI developers to voluntarily submit covered models for government cybersecurity review before release, then followed with a national-security AI push focused on faster adoption, updated autonomous-weapons guidance, and multi-vendor AI use inside government.

Research highlight

Researchers from Harvard, MIT, 2077AI, and Kempner Institute built an LLM agent “economy” where agents bid in auctions, pay each other, gain wealth from rewards, mutate if successful, and go bankrupt if ineffective. Starting with weak agents, it improved MATH from 15.9% to 57.0%, finance from 45.0% to 60.0%, science best-run accuracy from 5.0% to 20.0%, accelerator EDP from 80.2 to 39.3, and Cloudcast cost from 930 to 657.

(also a good read from 1995, Toward a Model of Mind as a Laissez-Faire Economy of Idiots by Eric B. Baum)
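To get a feel for the agent economy described above, here is a deliberately simplified sketch of a single auction round: the highest bidder wins the task, pays the current owner, collects whatever reward it earns, and agents whose wealth reaches zero are removed. This is not the paper's mechanism. The function name `auction_round`, the payment rule, and the numbers are all assumptions, and mutation of successful agents is left out.

```python
def auction_round(wealth, bids, owner, reward_for):
    """Run one hypothetical auction round, updating wealth in place.

    wealth: dict mapping agent name to current wealth
    bids: dict mapping agent name to its bid for the task
    owner: name of the agent currently holding the task
    reward_for: function returning the reward an agent earns on the task
    """
    winner = max(bids, key=bids.get)
    price = min(bids[winner], wealth[winner])  # cannot pay more than it owns

    wealth[winner] -= price
    wealth[owner] += price                # agents pay each other
    wealth[winner] += reward_for(winner)  # wealth grows only through rewards

    # Ineffective agents go bankrupt and leave the economy.
    bankrupt = [name for name, w in wealth.items() if w <= 0]
    for name in bankrupt:
        del wealth[name]
    return winner, bankrupt


wealth = {"a": 10.0, "b": 5.0, "c": 1.0}
winner, bankrupt = auction_round(
    wealth,
    bids={"a": 2.0, "b": 4.0, "c": 1.0},
    owner="c",
    reward_for=lambda name: 6.0 if name == "b" else 0.0,
)
print(winner, bankrupt, wealth)
```

Even in this toy form, the selection pressure is visible: an agent that bids high and earns nothing drains its own wealth into its competitors.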

Open-sourced Models

  • Xiaomi MiMo + TileRT pushes a 1-trillion-parameter model past 1,000 tokens per second on commodity GPUs. If the result is real and reproducible, inference at that speed on commodity hardware changes the economics of what can be deployed without cloud dependency. Worth watching for replication.

  • Gemma 4 12B (Google DeepMind) runs on a laptop: the 12B-parameter model brings Google's Gemma family to a size that can run locally on consumer hardware. For agent workflows that need to run on-device rather than in the cloud, this is a meaningful step forward in the accessibility of capable models.

Research

Trends we see looking at every paper related to AI and ML published last week:

  • personalization instead of one-model-fits-all

  • agents instead of chatbots

  • world models instead of pure language scaling

  • evaluation becoming training

  • automated research

  • memory and self-improvement

  • reasoning efficiency

Agent reliability, memory, and self-improvement

Search, retrieval, and long-context reasoning

World models, physical AI, and embodied reasoning

Model adaptation, efficiency, and scalable personalization

RL, distillation, and reward design

Automation, research agents, and agent security

That’s all for today. Thank you for reading! Please send this newsletter to colleagues if it can help them enhance their understanding of AI and stay ahead of the curve.

FAQ

What is continual learning in LLMs?
Continual learning in LLMs means updating or adapting a model over time without destroying earlier capabilities, alignment, or useful knowledge.

Why do AI models “need sleep”?
They do not literally need sleep. The point is that learning may need an offline consolidation phase, where recent context or experience is processed before anything becomes durable memory or model behavior.

What is catastrophic forgetting?
Catastrophic forgetting happens when a model learns something new but loses performance on what it previously knew.
