Quick answer: What is the Nemotron Coalition, and what is Nemotron 3?
The Nemotron Coalition is NVIDIA’s attempt to build open frontier AI models with a network of partner labs and product companies, including Mistral AI, Cursor, LangChain, Perplexity, Reflection AI, Sarvam AI, Black Forest Labs, and Thinking Machines Lab. Nemotron 3 is the technical foundation of that effort: an open-weight model family designed for agentic workloads, built with a hybrid Transformer + Mamba architecture, Mixture-of-Experts routing, multi-token prediction, and NVIDIA’s NVFP4 training stack. The bigger story is not just one model release, but NVIDIA trying to make open AI development happen on its compute, tooling, and ecosystem rails.
Subscribe for weekly operator-grade AI systems analysis:
https://www.turingpost.com/subscribe
What this article explains:
Why NVIDIA is open-sourcing Nemotron 3 and the broader Nemotron development stack
How Nemotron 3 works under the hood: Mamba, MoE, LatentMoE, multi-token prediction, and NVFP4
What the Nemotron Coalition actually is, who contributes what, and where the power sits
What this means for open frontier models, sovereign AI, and NVIDIA’s long-term role in the AI ecosystem
For years, the AI race has looked like a set of parallel sprints – each lab running fast, but mostly alone. But what if some of the most capable AI labs decided not to compete, but to team up and build frontier models together?
It may sound unlikely, but it just happened with the Nemotron Coalition. NVIDIA assembled a global collaboration of leading AI companies to develop the Nemotron family of models, and aligned that announcement with the open-sourcing of Nemotron 3.
We hope everyone knows these names: Black Forest Labs, Cursor, LangChain, Mistral AI, Perplexity, Reflection AI, Sarvam AI and Thinking Machines Lab. Together, they are pooling what usually stays locked inside each company: data, evaluation systems, research insights, and even compute. The goal is to create shared high-end foundation models that are stronger than anything any one of them could build alone, and then specialize them further. It is a common starting point, a kind of public infrastructure for AI, where progress compounds across the ecosystem. Anyone can take that foundation, adapt it, and build on top of it.
This topic is super interesting, because we’re witnessing something new not only on the tech stack side, but also in terms of developer collaboration. Today, we’re diving into what NVIDIA has actually built, how Nemotron 3 works, the real power dynamics behind the collaboration, and what all of this means for open AI’s future – things you could easily have missed amid all the other news from NVIDIA GTC.
Nemotron is not a model. It is our entire approach towards supporting an open ecosystem for artificial intelligence.
In today’s episode:
Why is NVIDIA building Nemotron 3 and making it open source?
Nemotron 3 Under the Hood
Hybrid architecture: Transformer + Mamba
Mixture-of-Experts (MoE) and LatentMoE
Multi-token prediction
NVFP4 Precision: Acceleration is intelligence
Training stack
Summarizing design principles
The Nemotron Coalition: Who Builds What – and Who Holds the Power
What This Reveals
Conclusion
Sources and further reading
Why is NVIDIA building Nemotron 3 and making it open source?
Nemotron 3 represents a very different approach from the closed-lab paradigm that has defined the last two years. First, NVIDIA is open-sourcing not just the model, but the entire development process: training data, including large-scale synthetic reasoning datasets; training recipes, including the pretraining and reinforcement learning setup; post-training pipelines; and tooling such as NeMo, NeMo RL, and NeMo Gym. Second, it serves as the starting point for joint projects with companies in the coalition, each contributing in its own domain. We’ll get into that later in the article, so stick with us.
Why is NVIDIA rewriting the rules of the game? There’s a clear rational incentive behind this.
Nemotron's first job is to make it possible for NVIDIA to continue to exist as a company.
NVIDIA builds accelerated computing – fast, specialized hardware. But to know what to accelerate, they need to deeply understand AI workloads from the inside. You can't just survey Meta or Google and ask "what should we build next?" because that information is too expensive to derive and too closely held by competitors. So NVIDIA trains its own frontier models to answer critical hardware design questions, such as: What precision levels actually matter? How does a particular architecture influence chip design? What happens during training? Without building Nemotron, NVIDIA would be flying blind when designing next-generation hardware.
Another reason is that investing in open models is a compound bet on the expansion of the entire AI market. It is a long-term bet. Today, NVIDIA strongly emphasizes that it works with almost everyone: hyperscalers, tiny AI startups, legacy enterprise companies, countries, and governments worldwide. So every time AI scales in any direction, NVIDIA benefits.
But NVIDIA benefits not only from the deployment of AI, but also from the process of building it. And that is where it has found a distinctive role in the open ecosystem. Bryan Catanzaro, VP of Applied Deep Learning Research at NVIDIA, highlighted one of the most underreported facts in AI during his GTC talk:
It's usually less than a third of the compute that goes towards building AI that is actually building the model itself. About two thirds – or more, maybe three quarters – of the compute is spent on experiments and synthetic data generation, and things around the process of building a model.
This is why NVIDIA is releasing everything, not just weights: recipes, datasets, ablation studies, and RL rollouts. The part of the process that most organizations keep secret is exactly where NVIDIA believes it can contribute most to the open ecosystem (and make it bigger).
Now let’s break down how Nemotron 3 is built to understand what makes its architecture and tech stack unique and worth a closer look.
Nemotron 3 Under the Hood
The latest Nemotron 3 is less about pushing raw model intelligence and more about resolving a systems-level bottleneck that has emerged with agentic AI. Multi-agent pipelines bring the problem of sustaining low-latency reasoning over long, evolving contexts. In these systems, context scales nonlinearly: each agent interaction reintroduces prior state, tool outputs, and intermediate reasoning, producing sequences that are often an order of magnitude longer than standard conversational inputs. This drives up cost, slows inference, and introduces instability, where agents may drift from their original objective. At the same time, every step in such pipelines requires reasoning, creating a “thinking tax”: large dense models are repeatedly invoked for small subtasks, even though those subtasks rarely require full model capacity. So on one side, we have the cost of maintaining a long, growing context, and on the other – the cost of repeated reasoning at every step.
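To make the first of these pressures concrete, here is a minimal sketch of why context costs grow nonlinearly when a multi-agent pipeline re-sends the full history at every step. The numbers (a hypothetical 500 new tokens per step) are invented for illustration, not figures from Nemotron 3's documentation:

```python
# Illustrative sketch: context growth in a naive multi-agent loop where
# each step re-injects all prior state, tool outputs, and reasoning.

def simulate_context_growth(steps: int, new_tokens_per_step: int = 500) -> list[int]:
    """Return the prompt size seen at each step when the full history
    is re-read every time a new step runs."""
    history = 0
    prompt_sizes = []
    for _ in range(steps):
        history += new_tokens_per_step   # each step appends its output
        prompt_sizes.append(history)     # the next step re-reads everything
    return prompt_sizes

sizes = simulate_context_growth(steps=10)
total_processed = sum(sizes)  # tokens attended over across the whole run
print(sizes[-1], total_processed)  # final context: 5000; total work: 27500
```

The final prompt grows linearly (5,000 tokens after 10 steps), but the total tokens processed across the run grow quadratically (27,500) – which is exactly the cost curve that motivates cheaper-per-token architectures for agentic workloads.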
Nemotron 3 can be seen as an architectural response to these two pressures. Technically, it introduces several design decisions that complement one another.
Hybrid architecture: Transformer + Mamba
Maybe the most interesting part is the hybridization of sequence modeling paradigms. Developers needed to add →