AI 101: What is Continual Learning?

“If you think about the term AGI, especially in the context of pre-training, you will realize that the human being is not an AGI, because a human being lacks a huge amount of knowledge. Instead, we rely on continual learning.”

Ilya Sutskever

Do you feel this shift too? The idea of models learning endlessly is showing up everywhere. We see it, we hear it, and it’s all pushing the spotlight toward continual learning.

Continual learning is the ability to keep learning new things over time without forgetting what you already know. Humans do this naturally (as Ilya Sutskever also noted) and adapt flexibly to changing data. Unfortunately, neural networks do not. When developers change the training data, they often run into something called catastrophic forgetting: the model starts losing its previous knowledge, and developers end up retraining it from scratch.

Finding the right balance between a model’s plasticity and the stability of its previously learned knowledge and skills is becoming a serious challenge right now. Continual learning is the path to more “intelligent” systems that save time, resources, and money spent on training. It also helps mitigate biases and errors and, in the end, makes model deployment easier and more natural.

Today we’ll look at the basics of continual learning and two approaches that are worth your attention: Google’s very recent Nested Learning and Meta FAIR’s Sparse Memory Finetuning. There is a lot to explore →

In today’s episode, we will cover:

Continual Learning: the essential basics
Setups and scenarios for Continual Learning training
How to help models learn continually? General methods
What is Nested Learning?
- How does Nested Learning work?
- HOPE: Google’s architecture for continual learning
- Not without limitations
Cautious continual learning with memory layers
- Sparse Memory Finetuning
- Limitations
Conclusion / Why continual learning is important now?
Sources and further reading

Continual Learning: The essential basics

Continual learning means learning step-by-step from data that changes over time. So it is related to two main things:

Non-stationary data – the data distribution does not stay the same and keeps shifting.
Incremental learning – the model should add new knowledge without wiping out what it learned before.

The new pieces of information can be new skills, new examples, new environments, or new contexts. Because the data arrives gradually over time, continual learning is also known as lifelong learning. It typically takes place after the model has already been deployed.

Everything would be great if models didn’t face one major challenge – catastrophic forgetting. It typically looks like this: a neural network is trained on Task 1 and then on Task 2, and its weights are updated for Task 2. This often pushes them away from the optimum for Task 1, and the model suddenly performs very poorly on that task.

The problem here is not the model’s capacity – it usually comes from the sequential training procedure. As early as 1989-1990, Michael McCloskey and Neal J. Cohen, as well as R. Ratcliff, identified this problem and showed that simple networks lose previous knowledge extremely quickly when trained sequentially. They also highlighted that this forgetting is much worse than in humans.

But if you train on Tasks 1 and 2 interleaved, forgetting does not happen.

Image Credit: Illustration of catastrophic forgetting, “Continual Learning and Catastrophic Forgetting” paper
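To make the sequential-versus-interleaved contrast concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the two toy regression tasks, the two-weight linear model, the learning rate, and the helper names are all hypothetical. The tasks are chosen so that one set of weights fits both, which means any forgetting comes from the training order and not from a lack of capacity.

```python
# Toy setup (hypothetical): a linear model y = w[0]*x[0] + w[1]*x[1].
# The weights w = [2, -2] fit BOTH tasks, so capacity is not the issue.
SCALES = [-1.0, -0.5, 0.5, 1.0]
TASK_1 = [((s, 0.5 * s), s) for s in SCALES]    # inputs along (1, 0.5), target +s
TASK_2 = [((0.5 * s, s), -s) for s in SCALES]   # inputs along (0.5, 1), target -s

def predict(w, x):
    return w[0] * x[0] + w[1] * x[1]

def sgd_step(w, x, y, lr=0.1):
    # One gradient step on the squared error of a single sample.
    err = y - predict(w, x)
    return [w[0] + lr * err * x[0], w[1] + lr * err * x[1]]

def mse(w, data):
    return sum((y - predict(w, x)) ** 2 for x, y in data) / len(data)

def train(w, data, epochs=300):
    for _ in range(epochs):
        for x, y in data:
            w = sgd_step(w, x, y)
    return w

def interleave(a, b):
    # Alternate samples from the two tasks: a[0], b[0], a[1], b[1], ...
    return [sample for pair in zip(a, b) for sample in pair]

# Sequential: fit Task 1 first, then keep training on Task 2 only.
w_seq = train(train([0.0, 0.0], TASK_1), TASK_2)
# Interleaved: see both tasks mixed together throughout training.
w_mix = train([0.0, 0.0], interleave(TASK_1, TASK_2))

print("sequential  -> Task 1 loss:", mse(w_seq, TASK_1), "Task 2 loss:", mse(w_seq, TASK_2))
print("interleaved -> Task 1 loss:", mse(w_mix, TASK_1), "Task 2 loss:", mse(w_mix, TASK_2))
```

In this toy case the sequentially trained weights fit Task 2 but land far from the Task 1 solution, while the interleaved run drives both losses close to zero. Real networks are far more complex, so treat this only as intuition for why training order matters.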

Preventing forgetting is only one part of the solution. Effective continual learning also requires:

Fast adaptation
Ability to leverage task similarities
Task-agnostic behavior
Robustness to noise
High efficiency in memory and compute
Avoiding storing all past data and retraining on all previous data

If tasks are related, the model should get better at one after learning another, which marks positive knowledge transfer:

Forward transfer → Task 1 helps Task 2 later.
Backward transfer → Task 2 helps improve Task 1. This is a more difficult variant for neural networks.
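One common way to put numbers on these two kinds of transfer is an accuracy matrix, where entry acc[i][j] is the accuracy on task j measured right after training on task i. The sketch below assumes that convention, which comes from the wider continual learning literature and not from this article. The function names and all accuracy values are made up for illustration.

```python
def backward_transfer(acc):
    """Average change on earlier tasks between 'just learned' and 'end of training'.

    Needs at least two tasks. Negative values mean forgetting.
    """
    last = len(acc) - 1
    return sum(acc[last][j] - acc[j][j] for j in range(last)) / last

def forward_transfer(acc, baseline):
    """Average head start on a task just before training on it,
    compared with a baseline model that never saw earlier tasks."""
    last = len(acc) - 1
    return sum(acc[j - 1][j] - baseline[j] for j in range(1, last + 1)) / last

# Hypothetical accuracies for three tasks: row i = after training on task i.
acc = [
    [0.90, 0.40, 0.30],
    [0.80, 0.85, 0.45],
    [0.70, 0.75, 0.88],
]
baseline = [0.33, 0.33, 0.33]  # hypothetical accuracy of an untrained model

bwt = backward_transfer(acc)            # negative here: earlier tasks got worse
fwt = forward_transfer(acc, baseline)   # positive here: earlier tasks gave a head start
print("backward transfer:", bwt)
print("forward transfer:", fwt)
```

With these made-up numbers the model shows some forward transfer but negative backward transfer, which is the typical pattern when forgetting is present.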

So, a good continual learning system needs the right balance: it should stay stable (not forget old things) while still being plastic enough to learn new ones. It also needs to handle differences within each task and across different tasks. How is this realized in practice?

Image Credit: “A Comprehensive Survey of Continual Learning: Theory, Method and Application” paper

Setups and scenarios for Continual Learning training

Continual learning is mainly about moving from one task to the next while keeping performance stable or improving it during ongoing learning. That’s why two fundamental setups are used for it:

Task-based continual learning: Data is organized into clear, separate tasks which are shown one after another, with explicit task boundaries. It is the most common setup, because it is convenient and controlled – you know exactly when tasks switch. But it doesn’t represent gradual changes found in the real world, and models may rely too heavily on boundaries for memory updates.
Task-free continual learning: This one is more realistic, because it better reflects real-world data where distributions shift continuously. There is still an underlying set of tasks, but task boundaries are not given and transitions are smooth.

Image Credit: “Continual Learning and Catastrophic Forgetting” paper
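The difference between the two setups is easy to see in how the data stream is built. Below is a rough sketch under stated assumptions: the generator names, the linear drift schedule, and the toy samples are all hypothetical choices for illustration. Real task-free benchmarks may shift their distributions in other ways.

```python
import random

def task_based_stream(tasks, steps_per_task, rng):
    """Yield (task_id, sample): boundaries are hard and the task id is announced."""
    for task_id, samples in enumerate(tasks):
        for _ in range(steps_per_task):
            yield task_id, rng.choice(samples)

def task_free_stream(tasks, total_steps, rng):
    """Yield samples only: the source task drifts smoothly and is never revealed."""
    last = len(tasks) - 1
    for step in range(total_steps):
        # Position along the task sequence, moving linearly from 0 to `last`.
        position = step / max(total_steps - 1, 1) * last
        lower = min(int(position), last - 1) if last > 0 else 0
        p_next = position - lower  # chance of drawing from the next task
        source = lower + 1 if rng.random() < p_next else lower
        yield rng.choice(tasks[source])

# Hypothetical toy data: two tasks with two samples each.
tasks = [["a1", "a2"], ["b1", "b2"]]
rng = random.Random(0)

labeled = list(task_based_stream(tasks, steps_per_task=3, rng=rng))
unlabeled = list(task_free_stream(tasks, total_steps=10, rng=rng))

print(labeled)    # task ids switch abruptly from 0 to 1
print(unlabeled)  # "a" samples gradually give way to "b" samples
```

A learner on the first stream can use the task id to decide when to consolidate memory. On the second stream it has to detect the shift on its own, which is why the task-free setup is considered harder and more realistic.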

Continual learning researchers often use three main scenarios to describe what the model is expected to know at test time and whether it gets task identity information. Importantly, these scenarios are defined by how the changing data relates to the function the network must learn:


Turing Post