Today, we’re exploring Flow Matching (FM), a concept that might sound complex but is more approachable than it seems. If it feels overwhelming at first, don’t worry – by the end of this episode, you’ll have a clear understanding of its key ideas and practical applications.
Why is Flow Matching worth discussing now? It’s gaining attention for its role in top generative models like Flux (text-to-image), F5-TTS and E2-TTS (text-to-speech), and Meta’s MovieGen (text-to-video). These models consistently achieve state-of-the-art results, and some experts argue that FM might even surpass diffusion models. But why is that the case?
FM enhances Continuous Normalizing Flows (CNFs), a framework for generating realistic samples of complex data – whether images, audio, or text – starting from structured noise. While powerful, CNFs face challenges such as long training times and intricate techniques for speeding up sampling. Flow Matching tackles these issues by optimizing the path from noise to structured outputs, streamlining CNFs and reducing the inefficiencies caused by differential equation computations. Put simply, FM focuses on learning how to match flows of probability distributions over time.
Still sounds tricky? Let’s break it all down, examine the details, and provide real-world examples of its implementation so you can see its potential in action. Let’s get started!
In today’s episode, we will cover:
Continuous Normalizing Flows (CNFs) and their limitations
Here comes Flow Matching
How does Flow Matching work?
How does Conditional Flow Matching (CFM) help?
What about diffusion models?
Advantages of Flow Matching
Not without limitations
Conclusion and implementation
Bonus: Resources to dive deeper
Continuous Normalizing Flows (CNFs) and their limitations
Let’s start from the very beginning and clarify what Continuous Normalizing Flows (CNFs) are.
CNFs are a flexible framework used in generative modeling to transform a simple distribution (like random noise) into a complex one (like realistic images or sounds). Unlike diffusion models, which are tied to a fixed process of gradually adding and then removing noise, CNFs can represent a broader range of transformations. They do this by gradually and smoothly reshaping the data using a process guided by a vector field. Let’s break it down in simple terms.
Key concepts of CNFs:
Data space: This is where the data “lives”. For example, an image with pixels could exist in a high-dimensional space with a fixed number of pixels.
Probability density path: This describes how the data's probability distribution evolves over time. Imagine data as points in a space. To move from simple noise to realistic data, we use a probability path, which is a gradual transformation.
Vector field: Think of it as a map that guides how data points should move at every moment to realize the transformation.
Flow: Finally, the flow reshapes the data step by step over time, guided by the vector field, and so defines a continuous transformation. More precisely, in CNFs the flow is obtained by solving an Ordinary Differential Equation (ODE) that tells how data moves based on the vector field. This turns the transformation into a flow of probability over time.
Instead of manually creating the vector field, researchers use a neural network to learn it. This neural network is like a GPS: it takes a data point and the current time as input and predicts where the point should move to match the desired distribution. The network reshapes the data into a realistic and complex data distribution, such as a detailed image or piece of music, using a rule called the push-forward equation, which ensures that the transformation follows the rules of probability (the total probability remains 1 at all times).
This smooth, continuous transformation from noise to data is CNFs’ main advantage, which makes them a powerful tool for generative modeling. However, CNFs have a serious limitation – solving Ordinary Differential Equations during training is slow, difficult and computationally expensive.
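To make the ODE idea concrete, here is a minimal sketch of how a flow moves noise samples along a vector field. It is a toy, not a trained CNF: the vector field is written by hand for a 1-D case (standard Gaussian noise carried to a Gaussian with mean 3.0 and standard deviation 0.5), and the names `vector_field` and `integrate_flow`, the target parameters, and the choice of the explicit Euler solver are all assumptions made for illustration.

```python
import numpy as np

def vector_field(x, t, mu=3.0, sigma=0.5):
    """Hand-written velocity that carries N(0, 1) at t=0 to N(mu, sigma^2) at t=1."""
    scale = 1.0 - t + t * sigma          # standard deviation of the distribution at time t
    return (sigma - 1.0) * (x - t * mu) / scale + mu

def integrate_flow(x0, field, n_steps=100):
    """Solve dx/dt = field(x, t) from t=0 to t=1 with the explicit Euler method."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        x = x + dt * field(x, k * dt)    # one small step along the vector field
    return x

rng = np.random.default_rng(0)
noise = rng.standard_normal(10_000)      # samples from the simple source distribution
samples = integrate_flow(noise, vector_field)
print(samples.mean(), samples.std())     # should land near mu=3.0 and sigma=0.5
```

In a real CNF the hand-written function would be a neural network, and every evaluation inside the loop would be a forward pass, which is where the cost of ODE solving comes from.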
Here comes Flow Matching
Flow Matching is a faster and simpler way to train CNFs without the need for expensive simulations. The concept of Flow Matching in generative modeling was first introduced by Meta AI FAIR and Weizmann Institute of Science researchers in 2022 in the paper "Flow Matching for Generative Modeling".
FM builds on techniques like Normalizing Flows and diffusion models, but it takes a smarter and more efficient approach to transform data distributions. Flow Matching is a simulation-free method to train CNFs, making them faster, more efficient, and more flexible, while still leveraging the powerful capabilities of CNFs.
Instead of solving ODEs during training, FM uses a regression-based objective to directly match a learned vector field (from a neural network) to a target vector field. This avoids the computational burden of integrating ODEs in the training phase. But what does this process look like?
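Written out, the regression objective takes roughly the following form. The notation here is an assumption for readability: v_θ is the learned vector field with parameters θ, u_t is the target vector field that generates the probability path p_t, and t is drawn uniformly from [0, 1].

```latex
\mathcal{L}_{\mathrm{FM}}(\theta)
  = \mathbb{E}_{t \sim \mathcal{U}[0,1],\; x \sim p_t(x)}
    \left\| v_\theta(t, x) - u_t(x) \right\|^2
```

No ODE appears in the loss: each training step only needs a time t, a sample x from p_t, and the target velocity at that point.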
How does Flow Matching work?
Flow Matching trains CNFs by comparing vector fields along the path of transformation, rather than solving the ODE at every training step. Think of it as training a "map" (vector field) that shows how data points move from one distribution to another. FM supports various probability paths for transforming data, going beyond diffusion-based methods. Let’s break the whole FM idea into smaller, clearer concepts.
Key concepts of Flow Matching:
Transforming distributions: The idea of FM is to gradually transform a simple, known probability distribution (like noise) into a complex target distribution (like real-world data distribution) using a learned flow field. This transformation is represented as a path between distributions over time.
Vector field learning: Instead of modeling the entire data transformation in one go, FM learns a vector field that describes how data points move from one distribution to the next along the probability path.
Probability paths: These are predefined trajectories in probability space that data points follow during the transformation. FM aims to align its learned flow with these paths.
Neural network: This is a learnable approximation of the vector field, trained to match the target vector field.
Training objective: The method minimizes the mismatch between the probability flow path and the learned flow by solving a supervised regression problem. When the loss is minimized, the neural network accurately models the transformations, and the CNF can generate the desired distributions.
In simple terms, the process of FM looks like this:
Imagine we have two data shapes, for example, a Gaussian blob (source or starting point) and a spiral (target).
To get from the blob to the spiral, we start with a simple guess: points move in straight lines at constant speeds. This is a rough estimate and often leads to crossing paths.
The model learns a vector field, which is like a "wind map" guiding how points move over time.
It averages the motion of many particles to figure out the smoothest way to flow the source into the target.

Image credit: Flow With What You Know blog
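The blob-to-spiral picture can be sketched in a few lines. This is an illustration only: the spiral is a made-up toy dataset, the helper names `point_on_path` and `velocity` are hypothetical, and noise and data points are paired at random.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 512

# Source: a 2-D Gaussian blob.
x0 = rng.standard_normal((n, 2))

# Target: points along a spiral (a toy dataset invented for this example).
angle = rng.uniform(0.0, 4.0 * np.pi, size=n)
radius = angle / (4.0 * np.pi) * 3.0
x1 = np.stack([radius * np.cos(angle), radius * np.sin(angle)], axis=1)

def point_on_path(x0, x1, t):
    """Position at time t on the straight line from x0 (t=0) to x1 (t=1)."""
    return (1.0 - t) * x0 + t * x1

def velocity(x0, x1):
    """Constant velocity of that straight-line motion."""
    return x1 - x0

x_half = point_on_path(x0, x1, 0.5)   # where every particle is halfway through
v = velocity(x0, x1)                  # per-particle motion that the learned field averages
```

Because the pairs are random, many of these straight lines cross; the learned "wind map" has a single arrow at each location and time, so it can only represent their average.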
However, directly computing the marginal probability paths (the overall evolution of the entire dataset as a probability distribution over time) and the vector field is often intractable. Flow Matching overcomes this with another improved technique, proposed in the same study.
How does Conditional Flow Matching (CFM) help?
Inspired by existing techniques like denoising score matching in diffusion models, Conditional Flow Matching (CFM) simplifies training even further. CFM introduces the following strategies to make Flow Matching practical for training complex models:
Focus on conditional probability paths:
Instead of directly modeling the entire transformation, CFM constructs simpler conditional paths for individual data samples and their associated vector fields rather than global ones. They are easier to work with and don't need detailed information about the entire transformation process. This effectively breaks down a global problem into simpler, local problems.
Simplifying the loss function:
While the Flow Matching loss requires knowledge of the entire marginal probability path and vector field, focusing on smaller conditional paths and vector fields results in a simpler and more computable objective.
Equivalence to Flow Matching:
Despite simplifying the problem, CFM ensures that optimizing the Conditional Flow Matching loss produces the same gradients as the full FM objective. So CFM can maintain the theoretical guarantees of FM while being more efficient in practice and allowing scalable training on large datasets.
In short, CFM is like building a smarter GPS system for moving data points between distributions, making the process faster and more adaptable and scalable to high-dimensional datasets.
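A minimal sketch of a CFM training step, under loudly stated assumptions: the data is a 1-D toy Gaussian, the conditional paths are straight lines between randomly paired noise and data samples, and the "network" is replaced by a linear model on hand-picked features (`features` is hypothetical) so that ordinary least squares can minimize the loss exactly. A real setup would use a neural network and stochastic gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4096

x0 = rng.standard_normal(n)                 # noise samples
x1 = 2.0 + 0.5 * rng.standard_normal(n)     # stand-in "data" samples (toy 1-D target)
t = rng.uniform(0.0, 1.0, size=n)           # one random time per pair

# Conditional path and its velocity for each (x0, x1) pair.
x_t = (1.0 - t) * x0 + t * x1
target = x1 - x0

def features(x, t):
    """Hypothetical hand-picked features; a neural network would replace this."""
    return np.stack([np.ones_like(x), x, t, x * t], axis=1)

def cfm_loss(theta, x, t, target):
    """Mean squared error between predicted and conditional velocities."""
    pred = features(x, t) @ theta
    return np.mean((pred - target) ** 2)

# Least squares minimizes the CFM loss exactly for this linear-in-parameters model.
theta, *_ = np.linalg.lstsq(features(x_t, t), target, rcond=None)

loss_untrained = cfm_loss(np.zeros(4), x_t, t, target)
loss_fitted = cfm_loss(theta, x_t, t, target)
print(loss_untrained, loss_fitted)
```

Note that every quantity in the loss comes from a single sampled pair, so no marginal path or marginal vector field is ever computed. The fitted loss does not reach zero, because different pairs can pass through the same point with different velocities.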

Conditional paths. Image Credit: An introduction to Flow Matching, Cambridge MLG Blog
These paths of individual data points may cross during training, which can confuse the model. What matters is that the Flow Matching approach focuses on transforming distributions, not just specific points. So the system learns to average these trajectories into one general flow whose paths do not cross, which keeps the whole flow process reversible.

Marginal paths. Image Credit: An introduction to Flow Matching, Cambridge MLG Blog
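The step from conditional to marginal paths can be stated as a formula. In the notation assumed here, q is the data distribution, p_t(x | x_1) is the conditional path that ends at the data sample x_1, and u_t(x | x_1) is its conditional vector field. The marginal path and the marginal vector field are then weighted averages over all data samples:

```latex
p_t(x) = \int p_t(x \mid x_1)\, q(x_1)\, \mathrm{d}x_1,
\qquad
u_t(x) = \int u_t(x \mid x_1)\, \frac{p_t(x \mid x_1)\, q(x_1)}{p_t(x)}\, \mathrm{d}x_1
```

At any point x, the marginal field is the average of the conditional velocities, weighted by how likely each data sample is to have produced a path passing through x at time t. A single velocity per point and time is what rules out crossings in the final flow.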
What about diffusion models?
While in CNFs differential equations define how distributions are transformed, in diffusion models data is progressively "noised" and then denoised along a probability trajectory, using stochastic processes (randomness). Training them typically relies on score matching techniques.
A common view is that diffusion models follow curved probability paths because noise is added and then removed, while FM creates "straight paths". However, Google DeepMind’s researchers clarified that this is not always true: FM paths are straight only if the model perfectly predicts a single data point.
In real-world scenarios (like working with images), the predictions in FM average over a distribution, leading to paths that can look curved, depending on the data’s structure and distribution. Conversely, deterministic samplers in diffusion models can produce paths that resemble straight lines under specific conditions, making them behave similarly to flow matching.
FM and diffusion models can be equally effective, but they differ in several ways:
FM prefers linear interpolation, while diffusion models rely on noise schedules.
Difference in sampling:
Diffusion is usually stochastic (a bit random), especially in simpler setups.
Flow matching can work with deterministic paths, which is why it is less random.
Diffusion models work in a "low-resolution" space, which can slow down training and sampling. FM operates in a higher-resolution space than diffusion models, leading to faster and more efficient training.
Since FM with Gaussian noise as the source and diffusion models are mathematically equivalent, you can mix and match techniques from both. For instance, train using one method and sample using another, or use flow matching to simplify the training paths but borrow diffusion’s techniques for dealing with randomness.
Flow Matching can also enhance diffusion models, making them faster, more flexible, and more accurate, because FM supports a wider range of probability paths, including:
Optimal Transport (OT) paths, which are straight-line trajectories between noise and data.
Curved paths or other complex paths tailored to the specific needs of the data.
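One way to see how straight and curved paths fit in the same framework is to write each path as x_t = alpha(t) * x1 + sigma(t) * x0, where x0 is noise and x1 is data. The sketch below compares the linear choice with a trigonometric, variance-preserving one; the latter is picked here purely as an example of a curved path, and the function names are hypothetical.

```python
import numpy as np

def linear_path(t):
    """Straight-line (OT-style) coefficients: alpha, sigma and their time derivatives."""
    return t, 1.0 - t, 1.0, -1.0

def trig_path(t):
    """A variance-preserving, curved alternative (assumed here for illustration)."""
    a = 0.5 * np.pi
    return np.sin(a * t), np.cos(a * t), a * np.cos(a * t), -a * np.sin(a * t)

def path_and_velocity(x0, x1, t, schedule):
    """x_t = alpha*x1 + sigma*x0 and its time derivative for one noise/data pair."""
    alpha, sigma, d_alpha, d_sigma = schedule(t)
    return alpha * x1 + sigma * x0, d_alpha * x1 + d_sigma * x0

x0 = np.array([1.0, 0.0])    # a noise sample
x1 = np.array([0.0, 2.0])    # a data sample
for t in (0.0, 0.5, 1.0):
    _, v_lin = path_and_velocity(x0, x1, t, linear_path)
    _, v_trig = path_and_velocity(x0, x1, t, trig_path)
    print(t, v_lin, v_trig)
```

With the linear schedule the velocity is the same at every time, which is what "straight path" means for a single pair; with the trigonometric schedule the velocity changes direction over time. In both cases the regression target for training is just the second return value of `path_and_velocity`.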
Here’s a brief summary of advantages of diffusion models enhanced by FM:


