TL;DR: Meta-learning teaches AI systems how to adapt quickly to new tasks with limited data. This article explains MAML, Prototypical Networks, model-based approaches, Meta-LoRA, reinforced meta-thinking agents, and meta-evaluation – showing why “learning to learn” matters so much for modern AI.
What sets a brilliant mind apart? The ability to learn how to learn. It’s the secret sauce for humans who thrive, navigating life’s challenges with ease. The same holds true for intelligent systems. As humans we develop this skill – often without even noticing – through school, university, and life’s endless lessons, learning to recognize familiar patterns across tasks that help us pick up new things more effectively. Models can acquire a similar ability too, through a process called meta-learning.
Meta-learning is the key to fast, flexible, and efficient adaptation of models to new, unseen tasks with minimal data. It allows models to learn from a few examples, gain experience, and use their memory effectively. Meta-learning isn’t on the same level as supervised, unsupervised, or reinforcement learning – it’s a higher-level framework that can be applied on top of them.
Today, we’re going to explore the basics of meta-learning, the most fascinating recent developments (the most exciting one is learning what not to learn), how meta-learning helps with evals (meta-evaluation), and more (Brain In-Context, anyone?). It’s a lot to unpack! But first →

Image Credit: Turing Post via Claude
In today’s episode, we will cover:
How it all started: The first mentions of meta-learning
How does meta-learning work?
Common meta-learning approaches
Optimization-based meta-learning
Metric-based meta-learning
Model-based meta-learning
Recent advances in meta-learning
Robustly Informed Meta Learning (RIME)
Meta-LoRA
Reinforced Meta-thinking Agents (ReMA)
Meta-evaluation
General limitations
Conclusion
Sources and further reading
What is meta-learning? Plain English Definition
The idea of adaptive systems and machines that could modify their own instructions emerged back in the 20th century. But the pioneer who brought the concept of learning to learn into neural networks and modern meta-learning frameworks was – ta-da-dam – Jürgen Schmidhuber.
In the work "Evolutionary principles in self-referential learning" (1987), he described self-improving systems that anticipated some aspects of meta-learning. In "A self-referential weight matrix" (1993) and "Reducing the ratio between learning complexity and number of time-varying variables in fully recurrent nets" (1993), he proposed architectures where one Recurrent Neural Network (RNN) modifies the weights of another RNN, an early form of gradient-based meta-learning (we clarify what that means further below).
Then in 1998, the book “Learning to Learn” by Sebastian Thrun and Lorien Pratt was among the first to put together various methods and ideas under the meta-learning umbrella. After that, in 2000, Jonathan Baxter published the paper "A Model of Inductive Bias Learning," which provided a PAC-learning (Probably Approximately Correct learning) framework. He showed that if you train on many tasks from the same family, you can learn a useful inductive bias (a kind of prior knowledge) that helps you learn new tasks faster.
And that brings us to what meta-learning is all about. It is a concept where a model is trained on many tasks, rather than one single task, so that it can quickly adapt to new tasks using only a small amount of data. Few-shot image classification is a popular example: after meta-training, a model can learn to classify new categories from only a few training images.
The entire idea of meta-learning differs conceptually from other learning approaches. Supervised learning trains a model to perform well on a single specific task using labeled data. Unsupervised learning works without labels, seeking patterns, clusters, or latent structures within raw input data. Finally, reinforcement learning (RL) teaches an agent to act in an environment through trial and error by maximizing its reward over time.
Meta-learning is a framework rather than a specific type of learning. It doesn't rely on large datasets per task for fast adaptation. Instead, it’s about improving a model’s ability to adapt quickly to new tasks by training it across many tasks. These tasks can be supervised (e.g. few-shot classification), reinforcement-based (e.g. learning policies faster), or unsupervised (e.g. learning to cluster or represent data efficiently). Another key point is that meta-learning enables models to apply learned skills across different scenarios. A few concrete meta-learning tasks →
Spotting Rare Animals
Show the model five pangolins. Then ask if a new photo is a pangolin.
Meta-learning helps it decide – with just a few examples.
Teaching a Robot New Tricks
The robot has opened drawers and turned knobs. Now it needs to pull a lever.
Thanks to meta-learning, it adapts fast.
Adapting to a New Writing Style
An AI assistant sees just 2–3 emails from a new user.
Meta-learning lets it mimic their tone almost instantly.
Let’s look at how the meta-learning workflow actually unfolds.
How Meta-learning Works: Key Algorithms

Basically, meta-learning trains a model to quickly learn new tasks – even from just a few examples. It includes two stages:
During meta-training, a learner model practices learning from many different tasks. This helps it find common patterns across tasks and build general learning strategies that will help it deal with new tasks later.
After that, during meta-testing, a learner model uses what it learned to adapt quickly to a completely new task using only a small amount of data.
The key idea is that all tasks come from some broader “task universe,” so they share hidden similarities. Meta-learning uses these shared patterns to get better at fast adaptation.
One of the most popular conceptual views on meta-learning is to look at this process from the perspective of two models:
A base-learner (or just learner) is a model that learns to perform a specific task, being trained on this task’s data. Put differently, this model works in “the inner learning loop.” For example, in few-shot classification, the base-learner might be a neural network that tries to classify images within one task.
Overall, the base-learner is the one that needs to learn to adapt quickly to a given task using the small training set for that task.
A meta-learner is responsible for “the outer learning loop.” Considering how the base-learner performed on each task, the meta-learner updates the base model’s parameters or learning strategy so that it gradually becomes better at learning any new task.
After training, the base-learner is initialized using what the meta-learner has learned, for example, a good starting point for weights or a learned learning strategy. This entire workflow explicitly prepares the model to handle new real tasks, not just re-use what it already knows.
Different meta-learning approaches vary in what exactly is learned, like the initial weights of a neural network, the learning rate, similarity metrics, etc. Below, we break down three major meta-learning approaches.
Meta-learning vs Transfer Learning vs Fine-Tuning
MAML Explained: Model-Agnostic Meta-Learning
This method is about making the optimization algorithm itself better. It is also called gradient-based meta-learning.
A classic example of this approach is the Model-Agnostic Meta-Learning algorithm, or MAML, developed by researchers at the University of California, Berkeley and OpenAI. Its key idea is to train the model’s starting point (initial parameters) so that a few steps of gradient descent are enough to adapt to a new task.
MAML learns an initial parameter set θ. During meta-training, for each task, MAML makes a copy of the current model, trains it for a few gradient steps on that task’s training data (think of it as the inner loop), and then measures how well it did on that task’s test data. The outer loop adjusts the initial parameters θ so that these few fine-tuning steps lead to better performance. By doing so across tasks, θ becomes a powerful starting point for new tasks.

Image Credit: MAML original paper
Importantly, MAML is model-agnostic, so it can work with any model architecture and any task that is trained via gradient descent. Overall, this approach is like teaching someone the basics so they can quickly learn new skills later.
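To make the inner and outer loops concrete, here is a minimal sketch on a toy problem, not the setup from the MAML paper. It assumes a one-parameter model y = θ·x, tasks that differ only in their slope, and a single inner gradient step. Because the model is a scalar, the derivative through the inner step can be written by hand. The function names, the task distribution, and the learning rates are all illustrative assumptions.

```python
import random

def loss_and_grad(theta, xs, ys):
    """Mean squared error of y_hat = theta * x, and its gradient w.r.t. theta."""
    n = len(xs)
    loss = sum((theta * x - y) ** 2 for x, y in zip(xs, ys)) / n
    grad = sum(2 * x * (theta * x - y) for x, y in zip(xs, ys)) / n
    return loss, grad

def sample_task(rng, k=5):
    """A toy task is a slope; return (train set, test set) drawn from y = slope * x."""
    slope = rng.uniform(2.0, 4.0)
    def draw():
        xs = [rng.uniform(-1.0, 1.0) for _ in range(k)]
        return xs, [slope * x for x in xs]
    return draw(), draw()

def adapt(theta, xs, ys, inner_lr):
    """Inner loop: one gradient step on the task's training data."""
    _, grad = loss_and_grad(theta, xs, ys)
    return theta - inner_lr * grad

def maml_train(steps=500, tasks_per_step=4, inner_lr=0.5, outer_lr=0.2, seed=0):
    rng = random.Random(seed)
    theta = 0.0  # the shared initialization that the outer loop learns
    for _ in range(steps):
        meta_grad = 0.0
        for _ in range(tasks_per_step):
            (xs_tr, ys_tr), (xs_te, ys_te) = sample_task(rng)
            theta_task = adapt(theta, xs_tr, ys_tr, inner_lr)
            _, grad_te = loss_and_grad(theta_task, xs_te, ys_te)
            # Chain rule through the inner step:
            # d(theta_task)/d(theta) = 1 - inner_lr * (second derivative of the train loss)
            hess_tr = sum(2 * x * x for x in xs_tr) / len(xs_tr)
            meta_grad += grad_te * (1.0 - inner_lr * hess_tr)
        # Outer loop: move the initialization so that adapted models do better
        theta -= outer_lr * meta_grad / tasks_per_step
    return theta

theta = maml_train()

# Adapting to a new, unseen task (slope 3.8) with a single inner step
new_xs = [-1.0, -0.5, 0.5, 1.0]
new_ys = [3.8 * x for x in new_xs]
adapted = adapt(theta, new_xs, new_ys, inner_lr=0.5)
```

In this toy setting the learned θ ends up inside the range of task slopes, and one call to `adapt` on four points moves it closer to the new task's slope. For real networks the hand-written chain-rule term is replaced by automatic differentiation through the inner steps.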
Prototypical Networks & Metric-based Meta-learning
This approach encourages the model to learn a better way to measure distance or similarity between new and known examples, so it can group things that belong together more effectively. Instead of comparing raw inputs directly, the model converts inputs into embedding vectors (compressed, meaningful summaries of the data) and compares those embeddings via a learned similarity function. Let’s look at a couple of examples.
Prototypical Networks (by University of Toronto and Twitter) don’t compare the new example to all support set examples. Instead, they compute "class prototypes", meaning the average embedding for each class, and compare the new example’s embedding to these prototypes using a distance function (usually Euclidean distance). The closer the prototype, the higher the probability of that class.

Image Credit: Prototypical Networks for Few-shot Learning paper
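The prototype computation can be sketched in a few lines. This is a minimal illustration that assumes the embeddings have already been produced by some encoder (here they are just hand-picked 2-D vectors) and uses squared Euclidean distance followed by a softmax. The labels and numbers are made up for the example.

```python
import math

def prototypes(support):
    """support: dict mapping label -> list of embedding vectors.
    Returns a dict mapping label -> mean embedding (the class prototype)."""
    protos = {}
    for label, vecs in support.items():
        dim = len(vecs[0])
        protos[label] = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
    return protos

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classify(query, protos):
    """Softmax over negative squared distances: closer prototype, higher probability."""
    logits = {label: -sq_dist(query, p) for label, p in protos.items()}
    top = max(logits.values())
    exps = {label: math.exp(v - top) for label, v in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

# Three support embeddings per class (hypothetical values)
support = {
    "pangolin": [[1.0, 0.9], [0.8, 1.1], [1.2, 1.0]],
    "armadillo": [[-1.0, -0.8], [-0.9, -1.2], [-1.1, -1.0]],
}
protos = prototypes(support)
probs = classify([0.9, 1.0], protos)
```

The query sits next to the pangolin prototype, so almost all of the probability mass goes to that class. In a trained system the encoder itself is what meta-training optimizes.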
Matching Networks (by Google DeepMind), rather than computing a class prototype, compare a query point directly to every support example using a similarity function (like cosine similarity). They then turn these similarities into weights via softmax. The prediction is a weighted average of the support labels based on those weights.

Image Credit: Prototypical Networks for Few-shot Learning paper
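The same idea in code, as a minimal sketch: it assumes precomputed embeddings and plain cosine similarity, and leaves out the learned encoders of the actual Matching Networks model. The support set and labels are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def matching_predict(query, support):
    """support: list of (embedding, label) pairs.
    Returns label -> probability, a softmax-weighted vote over all support examples."""
    sims = [cosine(query, emb) for emb, _ in support]
    top = max(sims)
    exps = [math.exp(s - top) for s in sims]
    total = sum(exps)
    probs = {}
    for weight, (_, label) in zip(exps, support):
        # Each support example contributes its attention weight to its own label
        probs[label] = probs.get(label, 0.0) + weight / total
    return probs

support = [
    ([1.0, 0.0], "cat"),
    ([0.9, 0.1], "cat"),
    ([0.0, 1.0], "dog"),
]
probs = matching_predict([1.0, 0.05], support)
```

Unlike the prototype approach, every support example votes individually, so a single very similar example can dominate the prediction.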
Model-Based Meta-Learning: Memory-Augmented Networks
The last approach, but not the least, is model-based meta-learning, where the entire model is designed to learn how to quickly adapt using memory or dynamics built into the model itself. This model can remember, adapt, and solve tasks using its own structure. Model-based meta-learners usually include components such as RNNs (like LSTMs), external memory, and controllers that learn how to use memory to store and retrieve task-specific information.
A good illustrative example of such systems is Memory-Augmented Neural Networks (MANN) by Google DeepMind. A MANN is shown an input, like an image, at time t, and is then shown the label for this input at time t+1. This encourages the model to learn to store and retrieve information. The MANN binds each input to its label when the label finally arrives and stores this pair in external memory. Later, when the model sees a similar input again, it can retrieve the matching label from memory and make the right prediction.

Image Credit: Meta-Learning with Memory-Augmented Neural Networks paper
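The delayed-label setup is easier to see in code. The sketch below is a hand-coded stand-in, not a MANN: a real MANN learns its read and write behavior with a neural controller, while this toy version simply stores each input with its label once the label arrives and answers with the label of the nearest stored input. All names and data are illustrative assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def run_episode(inputs, labels):
    """At step t the model sees inputs[t] plus the label of the previous input.
    It must predict the label of inputs[t] before that label is revealed."""
    memory = []        # external memory: list of (input, label) pairs
    predictions = []
    prev_x = None
    for t, x in enumerate(inputs):
        if prev_x is not None:
            # The label for the previous input arrives now: bind and store the pair
            memory.append((prev_x, labels[t - 1]))
        if memory:
            # Read: retrieve the label bound to the most similar stored input
            nearest = min(memory, key=lambda pair: sq_dist(pair[0], x))
            predictions.append(nearest[1])
        else:
            predictions.append(None)  # nothing stored yet, no informed guess
        prev_x = x
    return predictions

inputs = [[0.0, 0.0], [5.0, 5.0], [0.1, 0.0], [5.0, 5.1]]
labels = ["a", "b", "a", "b"]
predictions = run_episode(inputs, labels)
```

The first guesses are uninformed or wrong, but once both classes have been seen and stored, the later inputs are labeled correctly from memory.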
That covers the basics of meta-learning. Let’s move on to how current research is improving the meta-learning paradigm.
Meta-learning in LLMs: Few-Shot Prompting & Recent Advances
Robustly Informed Meta Learning (RIME)
A notable trend in AI is teaching models not just what to learn, but also what not to learn. A development from Louis McConnell, called Robustly Informed Meta Learning (RIME), applies this idea to meta-learning to exclude spurious learning patterns. For example, a model might predict disease from X-ray images – but instead of learning patterns in the lungs, it accidentally relies on irrelevant clues like hospital scanner type, hospital ID, or patient age (these are called spurious features). These shortcuts work in training but fail in new environments like different hospitals.
RIME works within a causal framework to disentangle the real signals (causes) from the spurious ones (nuisance). It uses two special methods for this:
Inverse Probability Weighting (IPW): RIME "reweights" training data to break the statistical link between real labels and spurious features. For each example, it computes how likely the label (e.g. "sick") is given the spurious feature (like “patient age”) and then adjusts the importance of that example, giving less weight to examples whose label the spurious feature already predicts well.
Learning the right representations: Even after reweighting, the model could still sneak in spurious information through its internal features. That’s why RIME learns a representation of the input and adds a loss term that penalizes the model if this representation contains information about the spurious feature.
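The reweighting step can be illustrated with a small sketch. It assumes the simplest form of inverse probability weighting, where each example gets weight 1 / P(label | spurious feature) estimated from counts. The exact weighting used in RIME may differ, and the data below is invented for the example.

```python
from collections import Counter

def ipw_weights(labels, spurious):
    """Weight each example by 1 / P(label | spurious feature), estimated from counts."""
    pair_counts = Counter(zip(labels, spurious))
    spurious_counts = Counter(spurious)
    weights = []
    for y, s in zip(labels, spurious):
        p_label_given_spurious = pair_counts[(y, s)] / spurious_counts[s]
        weights.append(1.0 / p_label_given_spurious)
    return weights

# Hypothetical data: scanner "A" mostly sees sick patients, scanner "B" mostly healthy ones
labels = ["sick"] * 8 + ["healthy"] * 2 + ["sick"] * 2 + ["healthy"] * 8
scanner = ["A"] * 10 + ["B"] * 10
weights = ipw_weights(labels, scanner)

def weighted_mass(label, scanner_id):
    """Total weight of examples with this label and this scanner."""
    return sum(w for w, y, s in zip(weights, labels, scanner)
               if y == label and s == scanner_id)
```

After reweighting, "sick" and "healthy" carry the same total weight within each scanner group, so the scanner type no longer tells the model anything about the label.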

Image Credit: RIME original paper
Meta-LoRA
Another interesting approach is to use meta-adapters in a framework to meta-train foundation models. Researchers from the University of Texas at Austin proposed Meta-LoRA (Low Rank Adaptation) that improves how models adapt to new tasks after retraining, using a meta-learning objective.
Instead of retraining separately for each task, Meta-LoRA finds a shared low-rank adapter matrix that works well when combined with small, task-specific updates across multiple tasks. If trained on at least 3 tasks, Meta-LoRA can exactly recover the true underlying parameters of the model. In practice, even simple optimization methods, like gradient descent, are effective in learning this shared adapter.
This work is exciting because it leverages meta-learning to solve the adaptation problem in very large models, which is especially important today, given the trend toward large-scale models.
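For readers new to LoRA, the building block that Meta-LoRA works with can be sketched as follows. This shows only a plain low-rank update W0 + B·A on a tiny matrix, not the Meta-LoRA training objective. The shapes, values, and helper names are illustrative assumptions.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def lora_weight(w0, b, a, scale=1.0):
    """Effective weight W0 + scale * (B @ A). W0 stays frozen; only B and A are trained."""
    delta = matmul(b, a)
    return [[w0[i][j] + scale * delta[i][j] for j in range(len(w0[0]))]
            for i in range(len(w0))]

d_out, d_in, rank = 4, 3, 1
w0 = [[0.0] * d_in for _ in range(d_out)]   # frozen pretrained weight (zeros for clarity)
b = [[1.0], [2.0], [0.0], [-1.0]]           # d_out x rank
a = [[0.5, 0.0, -0.5]]                      # rank x d_in
w = lora_weight(w0, b, a)

full_params = d_out * d_in                  # parameters touched by full fine-tuning
lora_params = rank * (d_out + d_in)         # parameters in the low-rank adapter
```

Even at this tiny size the adapter has fewer trainable parameters than the full matrix, and the gap grows quickly with layer width, which is why adapters are attractive for very large models.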
Reinforced Meta-thinking Agents (ReMA)
This is a fascinating and complex development by researchers from Shanghai Jiao Tong University, Shanghai Artificial Intelligence Laboratory, University of British Columbia, and University College London. ReMA combines meta-learning with one of today’s hottest topics, reinforcement learning (RL), to help LLMs think more effectively, especially when multiple LLM agents work together. At its core, ReMA breaks problem-solving into two parts:
Meta-thinking phase: Plan or adjust the strategy.
Reasoning phase: Follow that strategy to solve the problem.
In single-agent scenarios, one agent does both parts, which is not efficient, so ReMA uses two specialized agents in a setup called the Multi-Agent Meta-thinking Reasoning Process (MAMRP):
A high-level agent handles meta-thinking to produce a meta-plan.
A low-level agent follows that plan, performing the reasoning to produce an answer.
If needed, the high-level agent updates the plan, and the low-level agent continues solving.
Both agents share the same LLM, but are told to act differently using special prompts.
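The turn-taking described above can be sketched as a simple loop. Everything here is hypothetical: the prompts, the `llm` callable and its signature, and the "DONE" stop signal are assumptions made for illustration, not the interface from the ReMA paper, and the RL training itself is not shown.

```python
from typing import Callable, List, Tuple

# Hypothetical role prompts: one shared model, two different instructions
HIGH_PROMPT = "You are the planner. Propose or revise a plan, or reply DONE."
LOW_PROMPT = "You are the solver. Follow the latest plan and give an answer."

Transcript = List[Tuple[str, str]]

def mamrp(question: str,
          llm: Callable[[str, str, Transcript], str],
          max_turns: int = 3) -> Tuple[str, Transcript]:
    """Alternate between a high-level planning turn and a low-level reasoning turn.
    `llm` is any function (role_prompt, question, transcript) -> text."""
    transcript: Transcript = []
    answer = ""
    for _ in range(max_turns):
        plan = llm(HIGH_PROMPT, question, transcript)      # meta-thinking turn
        if plan.strip() == "DONE":
            break
        transcript.append(("plan", plan))
        answer = llm(LOW_PROMPT, question, transcript)     # reasoning turn
        transcript.append(("answer", answer))
    return answer, transcript
```

To try the loop without a real model, pass a stub function that returns a fixed plan on the first planning turn, "DONE" on the second, and a fixed answer on the reasoning turn.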

Image Credit: ReMA original paper
ReMA uses RL to maximize the effectiveness of both agents. It relies on multi-agent RL (with turn-level training via GRPO), where each agent tries to improve its part of the process: the high-level agent learns to pick meta-thoughts that lead to better results, and the low-level agent learns to solve problems well given a meta-plan.
As a result, ReMA outperformed all baselines, with maximum improvements of 6.68% on math benchmarks and 8.49% on LLM-as-a-Judge benchmarks.
Meta-evaluation
Reward systems are something we build, so they are also something we can improve, and meta-learning can help here. Researchers from the University of Minnesota, MIT, Grammarly, and Elice introduced the Meta Policy Optimization (MPO) framework, which allows the reward system to learn how to evaluate better, just as the policy model learns how to perform better.
It involves building a feedback loop where not just the student (LLM), but also the teacher (Reward Model, RM), improves over time, guided by a senior advisor (Meta Reward Model, MRM). The reward model is no longer static: it can evolve during training, much like a person gets better at judging things as they gain experience.
Brain In-Context Representation Learning
The last development that we just can’t skip is BraInCoRL (Brain In-Context Representation Learning). It demonstrates the potential of meta-learning as an approach that can be applied in fields where data collection is limited.
In this work, researchers from the University of Hong Kong and Carnegie Mellon, along with colleagues from other universities, designed BraInCoRL as a model that can predict brain activity (voxel responses) when someone sees an image. Thanks to the meta-learning approach, it avoids retraining the model for each new person. It treats each voxel (a small unit of brain data) as its own learning task. BraInCoRL uses a transformer model to learn from in-context examples – pairs of images and brain responses. It finds common patterns and generates the right responses for a new person on the fly.
If the model can figure out what’s going on in a complex system, such as the human brain, then it can apply this learning skill to other systems, even when there is not enough data. However, despite being conceptually a cool approach that moves us closer to human-like processing of information, meta-learning has some serious issues.
Limitations of Meta-Learning
Meta-learning needs a large distribution of related but different small tasks to train effectively. If there are not enough such tasks, meta-learning struggles.
Training is slower, heavier on memory, and computationally expensive, especially with gradient-based approaches like MAML, where you may need to backpropagate through the inner-loop gradient steps.
A meta-learner can become too specialized to the training tasks.
It's often unclear what the meta-learner has really captured.
Meta-learning typically assumes episodic training with support/query sets per task, and this doesn’t always fit tasks like continuous control, time-series forecasting, or open-ended NLP.
The main thing is that meta-learning is still underexplored: why and when it works remain open questions.
Meta-Learning: Key Takeaways
The concept of meta-learning shows that relying only on large amounts of data isn’t enough. Modern AI systems need to figure out patterns on their own – and they need to know how to learn. Today, we have meta-learning systems that allow models to learn what information is better to skip and how to make better evaluations. We can also enhance the effectiveness of very large models with meta-learning adapters, blend meta-learning with reinforcement learning for more efficient model and agent behavior, and apply the meta-learning paradigm in cases where data is limited.
While some models focus on a single task or domain, others bet on multitasking. Whether it’s the first or the second case, models should be able to adapt – and do it quickly – to meet the demands of today’s world. Feeding a model with all the knowledge we have is nearly impossible, which is why we should use every opportunity to teach models how to learn.
Sources and further reading
Evolutionary principles in self-referential learning (1987) by Jürgen Schmidhuber
A ‘Self-Referential’ Weight Matrix (1993) by Jürgen Schmidhuber
What is meta-learning? (IBM blog)
Meta-Learning: Learning to Learn Fast (Lilian Weng blog)
A Model of Inductive Bias Learning (2000) by J. Baxter
FAQ
What is meta-learning?
Meta-learning is a framework for training AI models to learn how to learn. Instead of optimizing a model for only one task, meta-learning exposes it to many related tasks so it can adapt faster to new ones, often using only a few examples.
What is LoRA in LLMs?
LoRA, or Low-Rank Adaptation, is a parameter-efficient fine-tuning method for large language models. Instead of updating all model weights, it adds small trainable low-rank matrices to selected layers, making adaptation cheaper, faster, and easier to store or switch between tasks.
What are the 4 components of reinforcement learning?
The four core components of reinforcement learning are the agent, the environment, actions, and rewards. The agent takes actions in an environment, receives rewards or penalties, and learns a policy that helps it maximize long-term reward over time.
What is meta-evaluation?
Meta-evaluation means evaluating the evaluation process itself. In AI, it asks whether metrics, benchmarks, reward models, or judge models are reliable, consistent, and useful. In meta-learning contexts, it can also involve systems that learn how to improve their own evaluation signals.


