
AI 101: What is LeJEPA? The Theory Upgrade JEPA Has Been Missing

We discuss how Yann LeCun and Randall Balestriero's latest research gives JEPA the theoretical foundation it has been missing and turns it into solid, meaningful practice – and what it means for the community. Time to work on world models!

Joint-Embedding Predictive Architecture, or JEPA, was introduced in February 2022, when Yann LeCun first proposed it as the centerpiece of his vision for building AI systems that can understand and reason about the world the way humans and animals do. JEPA doesn’t predict the next token or pixel; it predicts internal representations of the world – the abstract state that matters most for world models and objective-driven AI, which operate at the next level of intelligence.

JEPA stands as one of the strongest alternatives to auto-regressive generative architectures, such as LLMs, which have no common sense or grounded understanding of reality, have no persistent memory, can’t plan their answers, and often hallucinate.

Note: JEPA was proposed before the LLM bonanza – not as a response to it. Later, Yann LeCun emphasized many times:

If you are interested in human-level AI, don't work on LLMs.

Yann LeCun

In recent years, we’ve seen many adaptations of JEPA across modalities, time-series models, sparse methods and others. But what we lacked was a theoretical foundation: how do you build a JEPA properly, and what makes a JEPA good?

Last week, we finally got it. Together with his former postdoc Randall Balestriero, Yann LeCun published what is likely one of the most important papers of the year: “LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics.” It finally provides the complete theory of JEPAs – and turns it into a verified practical method: LeJEPA.

You definitely need to know the key ideas behind JEPA’s behavior and the streamlined, scalable LeJEPA. And we’re here to explain them clearly and thoroughly.

In today’s episode, we will cover:

  • Refreshing the basics: How does JEPA work?

  • How JEPAs should behave

    • Isotropic Gaussian Embeddings

  • SIGReg: A unique regularization for JEPA

  • Implementation: What is LeJEPA?

  • Notable advantages and performance

  • Understanding the λ Trade-off

  • Not without limitations

  • What to Expect Next: Open-Source Momentum

  • Conclusion / Why LeJEPA matters?

  • Sources and further reading

Refreshing the basics: How does JEPA work?

AI has long tried to learn useful internal representations of the world that help models understand how things look, move, and change. Deep networks can now map raw data like images or signals into embeddings – compact vectors that hold meaning. The real challenge has been training those embeddings to capture the actual structure of the world rather than superficial patterns. And that’s exactly where JEPA is focused.

We explained JEPA and its connection to objective-driven AI in detail in our previous article – if you want to dive deeper, we recommend reading it here. But today we’ll revisit the parts needed for a general understanding and move on to LeJEPA.

So → JEPA’s (Joint-Embedding Predictive Architecture) main mission is to predict the representation, or embedding, of a missing or future part of the input. Basically, it’s like doing a tiny bit of time travel – peeking one moment ahead and guessing the state of the world before it happens.

It is a self-supervised architecture where the model:

  • Takes two related inputs (for example, two video frames: x – a current frame, y – the next frame).

  • Encodes them into task-relevant, abstract embeddings/representations: sx and sy.

  • Learns to predict the representation of the future state from the current one, using a predictor module.

Image Credit: Turing Post

As a result, the model is trained by making the embeddings of two related views of the same thing agree with each other. These “views” could be any type of data: a cropped version or a blurred version of an image, a different camera angle, a masked frame in a video, or paired data like image-caption or text-code.

Crucially, JEPA does not try to predict pixels or surface details – it predicts the state of the world, represented abstractly. Instead of memorizing data, it learns how the world changes. As long as the two views share meaning, comparing them helps the model learn useful representations. By design, JEPA supports object-centric understanding by modeling state transitions in an abstract latent space.
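The whole loop – encode two views, predict one embedding from the other, compare in latent space – can be sketched in a few lines. Below is a toy version with random linear maps standing in for the encoder and predictor; the shapes, weights, and views are made up for illustration and this is not the paper’s implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the real networks (illustration only).
W_enc = rng.standard_normal((8, 16)) * 0.1   # shared encoder: input -> embedding
W_pred = rng.standard_normal((8, 8)) * 0.1   # predictor: s_x -> predicted s_y

def encode(v):
    return W_enc @ v                          # embedding s = Enc(v)

def predict(s_x):
    return W_pred @ s_x                       # predicted s_y from s_x

x = rng.standard_normal(16)                   # current view (e.g. frame t)
y = x + 0.05 * rng.standard_normal(16)        # related view (e.g. frame t+1)

s_x, s_y = encode(x), encode(y)

# Agreement is measured between embeddings, never between raw pixels.
loss = np.mean((predict(s_x) - s_y) ** 2)
print(round(float(loss), 4))
```

Training would then adjust the encoder and predictor to shrink this latent-space loss.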

This architecture also handles uncertainty, modeling the “unknown” parts of the next state in two ways:

  • During encoding by discarding noisy or ambiguous details.

  • After encoding by using latent variables (z) that represent multiple plausible future scenarios.

Overall, JEPA provides the core architectural principle for world modeling systems:

  1. Latent state representation → what the world is

  2. Predictive embedding → what the world will be

  3. Modularity → separate perception, prediction, and action

  4. Non-generative prediction → efficient modeling of long-term, structured and partially uncertain world dynamics.

Since 2022, different JEPA variants have emerged, expanding JEPA into many AI fields, like:

  • Multimodality: I-JEPA (image), V-JEPA and V-JEPA 2 (video), A-JEPA (audio-based), TI-JEPA (Text-Image), Text-JEPA, MC-JEPA (motion and static control).

  • Time series predictions, like TS-JEPA.

  • Combining JEPA with diffusion techniques: N-JEPA (Noise-based) and D-JEPA (Denoising JEPA).

  • Other types of data: 3D-JEPA, Point-JEPA (point clouds), T-JEPA (for tabular data), and even variants used in medical analysis – Signal-JEPA for EEG and ECG-JEPA.

But regardless of type, JEPA tends to cheat by giving nearly every input the same embedding, which leads to collapse. This makes JEPA training fragile and overly complicated. Modern JEPA recipes try to prevent collapse with heuristics such as normalization layers, teacher–student networks, negative pairs, contrastive learning, asymmetric views plus stop-gradient, complex schedules, and careful hyperparameter tuning. However, these are quite complex and don’t guarantee overall stability. Since collapse is a general problem of the architecture, it requires a fundamental solution.
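The collapse failure mode is easy to see in miniature: if the encoder maps every input to the same vector, the prediction loss is trivially zero while the embedding carries no information at all. A hypothetical numpy sketch (not any specific JEPA recipe):

```python
import numpy as np

rng = np.random.default_rng(1)

def collapsed_encoder(v):
    # Degenerate encoder: every input gets the identical embedding.
    return np.ones(8)

x = rng.standard_normal(16)   # two completely unrelated inputs
y = rng.standard_normal(16)

s_x, s_y = collapsed_encoder(x), collapsed_encoder(y)
identity_predictor = lambda s: s

loss = np.mean((identity_predictor(s_x) - s_y) ** 2)
print(loss)   # 0.0 -- "perfect" prediction, useless representation
```

This is why the prediction objective alone is not enough: something must force the embeddings to stay informative.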

Until now, JEPA designs have been reactive and full of heuristics. This changed with the newest member of the JEPA family – created by Yann LeCun himself: LeJEPA. It became possible because Yann LeCun and Randall Balestriero set out to rethink JEPA from the ground up, putting one key question at the center: What minimal principles should a good JEPA follow?

Before exploring LeJEPA, let’s continue with the basics – now at this updated, deeper level.

How JEPAs should behave

LeCun and Balestriero propose two simple “axioms” for JEPA:

  • Solve the prediction task, as it is the usual JEPA goal.

  • Make the embeddings follow an isotropic Gaussian distribution.
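To build intuition for the second axiom: embeddings drawn from an isotropic Gaussian have a covariance matrix close to the identity – equal variance in every direction and no correlated axes. A minimal numpy illustration (the batch size and embedding dimension are made up, and this is not the paper’s actual test statistic):

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated batch of embeddings drawn from an isotropic Gaussian N(0, I).
emb = rng.standard_normal((10_000, 8))

cov = np.cov(emb, rowvar=False)          # empirical covariance (8 x 8)
diag = np.diag(cov)
off = cov - np.diag(diag)

print(np.allclose(diag, 1.0, atol=0.1))  # variances roughly equal
print(np.abs(off).max() < 0.1)           # directions roughly uncorrelated
```

A collapsed encoder would fail both checks immediately: its covariance would be all zeros, nothing like a scaled identity.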

Image Credit: Normal (Gaussian) Distribution, Wikipedia

Image Credit: LeJEPA original paper

While the first part is well known to us, the second one is new. So why did they decide to use an isotropic Gaussian?

Isotropic Gaussian Embeddings
