FOD#131: The State of Open Source AI: Do We Have a New King?
How, in 2025, open source AI evolved from philosophy into calculated strategy
LAST CALL: What do you expect from 2026? What will it be the year of? We’re piecing together expert views on the trajectory of ML and AI toward 2026. Send your boldest predictions to [email protected] or just reply to this email.
Many, many thanks to those who have already shared their views.
This Week in Turing Post:
Wednesday / AI 101 series: End of the year recap: Concepts and Methods you HAVE to know about
Friday / AI Interview: Shawn Shen, co-founder @Memories AI
If you like Turing Post, consider becoming a paid subscriber or sharing this digest with a friend. It helps us keep Monday digests free →
What is AGI, really? We are so eagerly building it. But do we know what it is? A super-coder that outperforms humans? An agent that learns basketball rules by watching for five minutes? Or something we're fundamentally not equipped to build without understanding consciousness itself? I asked founders, researchers, and security experts across the AI landscape – from OpenAI and Microsoft to Replit, Encord, Rubrik, Axiom Math, Predibase, Vectara, and others. Is there a consensus? → watch it here
Now, to the main topic:
The State of Open Source AI: Do We Have a New King?
When NVIDIA announced Nemotron 3 today, it marked a symbolic turning point in a year that fundamentally reshaped open-source AI leadership. Is NVIDIA the new open-source king? What’s the thinking behind this strategy? We will talk about it below.
First, a couple of words about this release: what makes it special? NVIDIA released 3 trillion tokens of new pretraining, post-training, and reinforcement learning data, along with open-source RL environments and libraries – making NVIDIA, according to the press release, “the first to provide a full suite of models, data and tools for building highly accurate, efficient and specialized AI agents.” It was received very well.
How incredibly things have changed in just one year! Twelve months ago, Meta’s Llama dominated open-source AI discourse. The model family had become synonymous with accessible, high-quality AI development, democratizing capabilities once locked behind proprietary walls. The industry held its breath for April’s release of Meta’s Llama 4, expecting a true open-source contender to challenge the reasoning capabilities demonstrated by DeepSeek.
Instead, they got controversy.
Llama 4 Scout and Maverick arrived in April 2025 with significant architectural changes – mixture-of-experts design, multimodal capabilities, and a remarkable 10-million-token context window. Yet the reception was anything but triumphant. Early benchmarks showed the models struggling with long-context tasks despite their theoretical capabilities. Worse, Meta faced accusations of benchmark manipulation, with claims that test sets had leaked into the training data. LessWrong called it “the most negative reaction I have seen to a model release.”
Most critically, Llama 4 failed to match DeepSeek’s reasoning paradigm. While DeepSeek R1 had introduced extended chain-of-thought reasoning that matched OpenAI’s o1, Llama 4 remained a traditional model without native “thinking” capabilities. Benchmarks consistently showed DeepSeek’s superiority in mathematical reasoning and coding tasks – precisely the domains where open-source models needed to compete.
The disappointment ran so deep that by May 2025, Meta delayed the release of Llama 4 Behemoth, its flagship 2-trillion-parameter model, citing concerns about insufficient performance improvements. In June, reports emerged that Zuckerberg personally intervened, handpicking a new AGI team after his “deep disappointment” with Llama 4’s market reception. By July, Meta was discussing abandoning Behemoth entirely and reconsidering its open-source strategy altogether.
The “Llama era” of reliable, incrementally improving open models had ended, leaving developers uncertain about Meta’s commitment to a space it once dominated and about US leadership in open source more broadly.
Into the Vacuum: China’s Strategic Opening
Sensing the perfect moment, Chinese labs accelerated. DeepSeek’s R1, released January 20, 2025, achieved performance comparable to OpenAI’s proprietary o1 on reasoning tasks at reportedly a fraction of the training cost. That release shocked everyone. But the breakthrough was not only technical – DeepSeek fully open-sourced the model weights, training methodology, and even published the research in Nature, demonstrating that cutting-edge reasoning capabilities could be democratized.
Alibaba’s Qwen family capitalized on this momentum. By the end of 2025, “Qwen has overtaken Llama in terms of total downloads and as the most-used base model to fine-tune”, with over 200 model variants spanning coding, multilingual tasks, and specialized domains. TIME recognized Alibaba on its 100 Most Influential Companies list specifically for its open-source AI leadership – a designation Meta might have expected to claim just a year earlier.
Chinese firms quickly recognized open source as a path to global ecosystem adoption. Developers who couldn’t access the latest GPT models due to geographic restrictions or cost constraints flocked to Qwen and DeepSeek.
NVIDIA’s Calculated Intervention
NVIDIA’s Nemotron 3 announcement represents a fundamentally different open-source philosophy from Meta’s earlier approach. Where Meta positioned Llama to drive ecosystem adoption and potentially challenge proprietary rivals, NVIDIA releases models to expand compute consumption on its own hardware.
The distinction is critical. Nemotron 3’s “hybrid Mamba-Transformer” and “latent mixture-of-experts” architecture delivers 4x throughput improvements – but these gains are tightly optimized for NVIDIA silicon. The company contributed 3 trillion training tokens, 18 million post-training samples, and the first open reinforcement learning environment (NeMo Gym) to the community. But unlike Meta’s more general-purpose approach, every optimization in Nemotron pushes developers toward NVIDIA’s infrastructure. There is no judgment here – just facts and business objectives. You can still experiment locally, but the performance story only fully emerges on NVIDIA silicon. In support of everything said above, this news also arrived today →
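If you want to poke at the release yourself, a minimal sketch via Hugging Face transformers is below. The model ID is a placeholder, not a confirmed repository name – substitute whichever Nemotron variant NVIDIA publishes on the Hub. It will run locally; the throughput story, as noted above, depends on NVIDIA’s stack.

```python
# A minimal sketch of trying an open NVIDIA checkpoint locally.
# The model ID below is a placeholder -- not a confirmed repo name.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/<nemotron-checkpoint>"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # let transformers pick the checkpoint's dtype
    device_map="auto",       # spread across available GPUs, or fall back to CPU
    trust_remote_code=True,  # hybrid Mamba-Transformer blocks may ship custom code
)

inputs = tokenizer("The state of open-source AI in 2025 is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0], skip_special_tokens=True))
```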
What 2025 Revealed
The year exposed three uncomfortable truths about open-source AI:
First, leadership requires sustained commitment. Meta’s wavering – from Llama 4’s troubled launch to Behemoth’s indefinite delay to internal debates about abandoning open source entirely – created an opening that Chinese rivals eagerly filled. You cannot dominate open source part-time or as a hedge against proprietary competition.
Second, “open” now exists on a spectrum. DeepSeek’s full transparency (weights, methodology, research papers) represents one end. Llama 4’s “open weights” with restrictive licenses and withheld training details occupies the middle. NVIDIA’s highly optimized but hardware-dependent models show how openness can coexist with strategic control. The days of assuming “open source” meant one thing are over.
Third, motivation matters more than technology. Chinese labs open-source to gain global adoption blocked by geopolitical barriers to proprietary services. NVIDIA open-sources to expand compute demand on its hardware. Meta tried to open-source to maintain competitive relevance while hedging against regulatory pressure – and discovered that unclear motivation produces unclear results.
As 2025 closes, Hugging Face now hosts over 2.2 million models.

Image Credit: AI World
And NVIDIA is the “top contributor” to Hugging Face with over 650 open models and over 250 datasets released on the platform.
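Those figures are easy to sanity-check with the huggingface_hub client; a minimal sketch follows (counts drift daily, so expect numbers in the ballpark rather than exact matches):

```python
# Count public models and datasets published under the nvidia org on the Hub.
from huggingface_hub import HfApi

api = HfApi()
n_models = sum(1 for _ in api.list_models(author="nvidia"))
n_datasets = sum(1 for _ in api.list_datasets(author="nvidia"))
print(f"nvidia on the Hub: {n_models} models, {n_datasets} datasets")
```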
The Questions That Now Matter
The era of “Big Tech Charity” (Meta) is over. We are entering the era of “Hardware-Defined AI” (NVIDIA). Enterprises must decide if they are willing to marry NVIDIA’s infrastructure to get the best software, or fight a fragmentation battle with aging models.
This shift reframes open source from ideology into architecture. Openness no longer guarantees neutrality, longevity, or independence. It signals intent. NVIDIA opens to scale demand for compute. Chinese labs open to bypass geopolitical bottlenecks and win global mindshare. Meta opened without a stable endgame, and paid the price in trust.
The hard questions now sit with builders and buyers. Who controls the optimization path? Who absorbs dependency risk five years out? And when performance, tooling, and data pipelines converge around a single hardware stack, how “open” is the resulting ecosystem in practice?
As for 2026, we won’t attempt predictions; we just hope that Hugging Face keeps thriving by standing for open-source principles across the board.
Curated – Coding with AI
News from the usual suspects
Talking about openness: Anthropic hands over the keys to the agents
Anthropic has donated the Model Context Protocol (MCP) to the newly formed Agentic AI Foundation (AAIF), under the Linux Foundation’s wing. Co-founded with OpenAI, Block, and backed by AWS, Google, Microsoft and others, AAIF aims to steer agentic AI development as a neutral, open standard. With over 10,000 MCP servers live, this is less a handoff than a strategic cementing of AI’s connective tissue. Bravo.
Google DeepMind wants to talk (like an agent)
Google debuts the Interactions API, a new foundation for building agentic applications using Gemini 3 models and its new Gemini Deep Research agent. It goes beyond chat: background tasks, tool use, memory, and interleaved reasoning – all wrapped in a clean, composable schema. Google’s now betting that AI doesn’t just respond; it thinks, acts, and soon, decides. and → Google’s research agent goes pro
Now available through the new Interactions API, it’s Google’s sharpest agent yet – capable of deep web dives, report writing, and autonomous research across fields like finance and biotech. Alongside it comes DeepSearchQA, an open benchmark that dares agents to think like analysts, not parrots. Suddenly, your AI knows how to cite sources and impress your boss. and → Google Research introduces Urania
It is a differentially private framework that extracts insights from chatbot conversations without peeking at your personal data. Using DP clustering, DP keyword extraction, and LLM summarization without seeing raw chats, it outperforms baseline methods on privacy – and sometimes on quality too. Turns out, anonymized summaries might be sharper than the nosy ones. Privacy-preserving insights? Welcome to the future of responsible AI.
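Urania’s actual pipeline is in the paper; as a flavor of the core idea, here is the textbook Laplace mechanism for releasing keyword counts with differential privacy. The epsilon value and keyword list are illustrative only, not Urania’s settings:

```python
# Textbook Laplace mechanism: release keyword counts from private chats with
# differential privacy. One user changing one chat shifts each count by at
# most 1, so per-count sensitivity = 1 and noise scale = 1 / epsilon.
# (Releasing k counts costs roughly k * epsilon under basic composition.)
import numpy as np

def dp_keyword_counts(chats, keywords, epsilon=1.0, rng=np.random.default_rng(0)):
    noisy = {}
    for kw in keywords:
        true_count = sum(kw in chat.lower() for chat in chats)
        noisy[kw] = true_count + rng.laplace(scale=1.0 / epsilon)
    return noisy

chats = ["my insurance claim was denied", "refund for my order", "insurance question"]
print(dp_keyword_counts(chats, ["insurance", "refund"], epsilon=0.5))
```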
Axiom’s rabbit chase through Collatz chaos
Axiom's researchers sent transformers chasing the notorious Collatz conjecture – and caught something unexpected. In Learning Collatz, models were trained to predict “long Collatz steps,” revealing striking accuracy (up to 99.8%) when integers are encoded in certain bases. Rather than hallucinating, these models make structured, explainable errors. Instead of learning the full algorithm, they specialize in certain binary classes, revealing more about their minds than about the math. A study in failure worth celebrating.
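For reference, the map itself fits in two lines. Below is a minimal sketch of the Collatz step, one common “accelerated” variant (the paper’s exact definition of a “long step” may differ), and the base-encoding idea the accuracy result hinges on:

```python
# The Collatz map: halve even numbers, send odd n to 3n + 1. The conjecture
# says every positive integer eventually reaches 1.
def collatz_step(n: int) -> int:
    return n // 2 if n % 2 == 0 else 3 * n + 1

# A common "accelerated" variant: apply 3n + 1 to an odd n, then strip all
# factors of 2 in one go. (The paper's exact "long step" may be defined differently.)
def long_step(n: int) -> int:
    assert n % 2 == 1
    m = 3 * n + 1
    while m % 2 == 0:
        m //= 2
    return m

# Integers are fed to the model encoded in a chosen base, e.g. base 2:
def encode(n: int, base: int = 2) -> str:
    digits = []
    while n:
        digits.append(str(n % base))
        n //= base
    return "".join(reversed(digits)) or "0"

print(long_step(27), encode(27, 2))  # -> 41 11011
```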
OpenAI sees the suits lean in
OpenAI’s new State of Enterprise AI report paints a clear picture: AI is no longer a side hustle – it’s embedded. With 800M weekly users and structured workflows up 19×, enterprise adoption is deepening fast. From healthcare to finance, frontier firms are pulling ahead, not by using flashier models, but by integrating them better. The bottleneck is humans catching up to tech.
Microsoft: lightning strikes twice
Microsoft Research Asia presents Agent Lightning, an open-source framework that lets developers plug reinforcement learning (RL) into AI agents – without rewriting a single line of core code. It separates execution from training, making agents smarter through experience. Bonus: it plays nicely with existing RL algorithms. The era of self-improving agents just got a serious upgrade.
Benchmark highlight – FACTS Benchmark Suite
Research this week
(as always, 🌟 indicates papers we recommend paying attention to)
Train, align, and speed up reasoning LMs
OPV: Outcome-based Process Verifier for Efficient Long Chain-of-Thought Verification – Verify long reasoning chains by summarizing outcomes, then checking the process efficiently with active learning and RL-based improvement →read the paper
🌟On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models (by CMU) – Isolate what pre-training, mid-training, and RL each contribute to reasoning, then map when RL produces real capability gains versus cosmetic improvements →read the paper
🌟 Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning (by NLC Lab) – Train models to branch and execute reasoning in genuine parallel graphs, then harvest speedups without falling back to sequential decoding →read the paper
🌟 Long-horizon Reasoning Agent for Olympiad-Level Mathematical Problem Solving (by Shanghai AI Lab) – Extend reasoning beyond single-context limits by storing lemmas in compact memory and iterating multi-round reasoning, summary, and verification →read the paper
Make agent systems reliable, debuggable, and scalable
🌟 DoVer: Intervention-Driven Auto Debugging for LLM Multi-Agent Systems – Debug multi-agent failures by testing targeted interventions that actually flip outcomes, rather than “guessing the bug” from logs →read the paper
🌟 Towards a Science of Scaling Agent Systems (by Google DeepMind) – Derive empirical scaling rules for multi-agent coordination by measuring overhead, error amplification, redundancy, and task-tool trade-offs across many controlled configurations →read the paper
SWE-Exp – Accumulate and reuse repair expertise to improve software-engineering agent success rates, shifting debugging from trial-and-error to experience-driven fixes →read the paper
Improve architectures for efficiency and long context
🌟 From Next-Token to Next-Block: A Principled Adaptation Path for Diffusion LLMs (by Peking University & Huawei Technologies) – Adapt autoregressive checkpoints into block-diffusion LMs via a principled path that keeps train–inference consistency while enabling parallel generation →read the paper
Learning Unmasking Policies for Diffusion Language Models – Learn which tokens to unmask at each diffusion step using RL so quality–throughput trade-offs stop relying on brittle heuristics →read the paper
🌟 Scaling Behavior of Discrete Diffusion Language Models (by ETH Zurich) – Characterize how discrete diffusion LMs scale under different noise types and identify regimes where they become compute- or data-efficient →read the paper
Sliding Window Attention Adaptation – Adapt full-attention LLMs to sliding-window attention with practical recipes that reduce long-context cost without trashing performance →read the paper
Beyond Real: Imaginary Extension of Rotary Position Embeddings for Long-Context LLMs – Recover discarded phase information in RoPE-style attention scoring to preserve positional detail as context length grows (see the sketch after this list) →read the paper
Stronger Normalization-Free Transformers – Replace explicit normalization with a better pointwise stabilizer to improve training stability and generalization across domains →read the paper
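On the RoPE paper above: standard RoPE views pairs of dimensions as complex numbers and rotates them by position, and the usual dot-product logit equals the real part of the resulting complex inner product – the imaginary part is the “discarded phase” the paper refers to. A minimal numpy sketch of where that term comes from (how the paper re-injects it is in the paper itself):

```python
# Standard RoPE scoring: rotate complex-viewed dim pairs by position. The
# dot-product attention logit is the REAL part of the complex inner product;
# the IMAGINARY part (relative phase) is thrown away by vanilla scoring.
import numpy as np

def rope_rotate(x: np.ndarray, pos: int, base: float = 10000.0) -> np.ndarray:
    d2 = x.shape[-1] // 2
    z = x[..., 0::2] + 1j * x[..., 1::2]    # complex view of dim pairs
    freqs = base ** (-np.arange(d2) / d2)   # per-pair rotation frequency
    return z * np.exp(1j * pos * freqs)     # rotate by position

rng = np.random.default_rng(0)
q, k = rng.standard_normal(64), rng.standard_normal(64)
score = np.sum(rope_rotate(q, pos=5) * np.conj(rope_rotate(k, pos=2)))

print(score.real)  # the standard RoPE attention logit
print(score.imag)  # relative-phase information that standard scoring discards
```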
Decode and explain brain representations
🌟 BrainExplore: Large-Scale Discovery of Interpretable Visual Representations in the Human Brain (by MIT) – Discover voxel-level patterns from fMRI with unsupervised decomposition, then explain them by retrieving eliciting images and generating validated natural-language concept descriptions →read the paper
Rethink evaluation and verification of LLM systems
🌟 Causal Judge Evaluation: Calibrated Surrogate Metrics for LLM Systems – Calibrate LLM-as-judge scores, stabilize reweighting, and propagate uncertainty so rankings and confidence intervals stop lying to you →read the paper
Simulate and evaluate robot policies with generative world models
🌟 Evaluating Gemini Robotics Policies in a Veo World Simulator (by Google DeepMind) – Use a video world model to generate controlled scene variations for robot-policy evaluation, then probe OOD generalization and safety via scalable red-teaming →read the paper
Build expert-grade math and geometry agents
Achieving Olympiad-Level Geometry Large Language Model Agent via Complexity Boosting Reinforcement Learning – Solve IMO geometry by proposing constructions, verifying with a symbolic engine, and escalating training difficulty via complexity-boosted RL →read the paper
Align role-playing and persona agents
MOA: Multi-Objective Alignment for Role-Playing Agents – Optimize role-playing along multiple rubrics at once using multi-objective RL and thought-augmented rollouts to balance style, knowledge, and instruction-following →read the paper
That’s all for today. Thank you for reading! It’s shorter today due to Columbus Day. Hope you can also have some time off. Please send this newsletter to colleagues if it can help them enhance their understanding of AI and stay ahead of the curve.
How did you like it?