Ioannis Antonoglou helped build AlphaGo, AlphaZero, and MuZero at DeepMind. Now he's CTO and co-founder of Reflection AI, betting that frontier models should be open weights, not a black box behind an API.
Quick answer: What is Reflection AI?
Reflection AI is a frontier AI lab building open-weight, reinforcement-learning-driven agent models. The company is focused on creating a general autonomous model (including coding and tool use), while keeping model weights open so researchers, enterprises, and governments can run and customize systems with full control over their AI stack. Its core thesis is that open science plus RL-based post-training can accelerate capability, safety, and adoption.
Is Reflection AI building only a coding model?
No. Antonoglou says the target is a general agent model, with coding as a major but not exclusive capability.
Why does Antonoglou push open-weight models?
He argues open models increase research velocity, external validation, and safety through broader community testing.
What is his practical definition of AGI?
An agent that can use software on a computer and perform tasks at a human level across workflows.
Subscribe for weekly operator-grade AI systems analysis:
https://www.turingpost.com/subscribe
In Part 1, we talk about openness as an actual strategy: why open models can move faster, why "sovereignty" matters for enterprises and governments, and why safety might improve when the ecosystem can stress-test the system instead of guessing.
We also get into the uncomfortable part: capable open agents can misbehave in public, fast (OpenClaw is the recent reminder). Is that a reason to close everything up, or a reason to make the risks visible and fixable?
Part 2 (the video will be published next week; subscribe to our YouTube channel) covers what they are building: a frontier open-weight "general agent model" trained end-to-end with pretraining plus reinforcement learning.
And I'll be honest: I left this conversation more skeptical than I expected. They raised $2 billion last year. But where are the results?
Reflection's thesis is huge: build the missing Western open base model, then use RL to push it to the frontier. The problem is that this is also the slowest path in the game. "All hands on deck building the model" means no clear wedge product yet, few concrete proof points, and a lot of execution risk while closed labs keep shipping.
Am I missing something? Read the interview and leave your opinion in the comments.
Subscribe to our YouTube channel, or listen to the interview on Spotify / Apple.
We prepared a transcript for your convenience. But as always: watch the full video, subscribe, like, and leave your feedback. It helps us grow on YouTube and bring you more insights.
Ksenia Se:
Hello, everyone. Today I have an amazing guest, Ioannis Antonoglou. He helped build AlphaGo and later worked on systems like AlphaZero and MuZero at DeepMind. Now he's CTO and co-founder of Reflection AI, where the team is applying reinforcement learning and large language models to autonomous coding, and all of it is supposed to be open-sourced. Welcome, Ioannis.
Ioannis Antonoglou:
Thank you so much for the invitation. It's good to be here.
Ksenia:
It's my pleasure. You went from DeepMind's closed-world agents to building an open-weights lab. What changed your mind about where progress comes from?
From DeepMind to Reflection AI
Ioannis:
That's a really good question. I joined DeepMind very early; I was one of the founding engineers, joining in 2012 when it was a small team of about 20 to 25 people. Back then, there was really no other place in the world where people were thinking seriously about AGI. I joined because I genuinely believed in the mission.
I spent the next ten years doing deep reinforcement learning research. I worked on DQN, the first deep RL agent, then AlphaGo, AlphaZero, and MuZero, and before I left, I was leading RLHF for Gemini. For most of my time at DeepMind, we were big proponents of publishing. All of our work on DQN, AlphaGo, AlphaZero, and MuZero was published. We shared all of it.
It was only after ChatGPT, and the competition between labs that followed, that research labs stopped publishing. There's almost nothing out there now. So at Reflection, we believe we are actually the only frontier lab truly committed to open science. We want to make our models open-weight because we genuinely believe that scientific progress and research velocity come from being open and sharing your findings with the rest of the research community.
Ksenia:
There's the Allen Institute for AI, but they're not commercial, not a business in that sense. You're different. When you were pitching to investors, it was well before the DeepSeek moment. What did you tell them? How did you make them believe in Reflection?
Ioannis:
I feel like people believed in Reflection because we had two things that were equally important to us. One is that reinforcement learning is the set of methods that will unlock the next set of capabilities. And the second is that open models are the future because they allow sovereignty, by which I mean giving anyone who wants it absolute control over the AI that covers their needs.
We had a background of being genuine experts in reinforcement learning: myself, my co-founder, and the team we brought together. And even before DeepSeek, with something like Llama 3, a really powerful open model, people had started to understand the power and importance of open-weight models in the ecosystem. They had started to see how you can have a valuable commercial engine that is actually based on open models.
What Reinforcement Learning Has Unlocked
Ksenia:
Has anything changed since you started Reflection AI? What has reinforcement learning unlocked since then, and what are you still looking to unlock?
Ioannis:
Many things have actually happened since we started. For one, there are now many frontier open models coming out of Chinese labs, and many of them are quite successful commercially. So our vision of being both open and having a commercial engine that lets you stay at the frontier has materialized in China; there are real proof points of that.
At the same time, we've seen the rise of reasoners: models trained with reinforcement learning on reasoning tasks. Most frontier models now have reinforcement learning as a big component of their training stack. We also see agents trained with RL for coding and for tool use across many different domains. This is how you ensure models can be extremely competent with tools, with coding, with agentic reasoning.
Across both of our big bets, reinforcement learning and open weights, we've been vindicated by labs in China and by how research has played out over the past year and a half.
From AlphaGo to Now
Ksenia:
If we go back to AlphaGo in 2016, that moment felt like a miracle to the outside world. For you as an engineer working day and night in the trenches, what is the biggest change in how progress actually happens now?
Ioannis:
I'd say the fundamentals have largely stayed the same. When we were building AlphaGo, it was an extremely challenging engineering project. It required scaling our models and training recipes to massive runs; back in 2016, most people had a couple of GPUs and were just training things locally. Now it's the norm that anyone training big models uses hundreds or thousands of GPUs. In that sense, we were a bit ahead of the curve.
At the same time, AlphaGo was a deeply collaborative project. Back then it was more common for a small group of researchers to just build something together, rather than large teams with project managers and deliverables, which is more the case now.
In many ways, AlphaGo and how we worked back then is like a preview of how things work today. We even had different training phases: first training on human data, the equivalent of pre-training, and then reinforcement learning, which is what we now call post-training. And we had human testers telling us what mistakes the model was making, which is again quite similar to how things are done now.
So I'd say that many of the ways we structured ourselves and went about our research are actually more similar to how things are done now than to how research used to happen back in 2016.
Are We Still in the Era of Breakthroughs?
Ksenia:
Are we still in the era of breakthrough moments, or is it mostly messy operational work now?
Ioannis:
It's a good question, and it really comes down to what you consider a breakthrough. Different people consider different levels of discovery a breakthrough. I think we actually have most of the ingredients to build really powerful agents that can do almost anything a human can do on a computer. And that's a form of AGI.
In that sense, it's more a matter of executing: finding the right methods, finding how everything fits together, and doing a lot of engineering and research. For some other things, we might need genuine breakthroughs: different architectures, different learning algorithms. It really comes down to what you want to build and what your definition of AGI is. Different labs and different people have different definitions of what might still be missing.
Ksenia:
What is your definition of AGI, and of superintelligence, for that matter?
Ioannis:
For me, it's something quite concrete. It's literally an agent that interacts with software on a computer and can really do what a human can do. In a way, it's like a productivity tool that allows people to do many more things with their computers. And in that sense, I don't think we need massive breakthroughs. We just need better engineering, better methods, better combinations of existing methods, but not anything that's really a game changer from a fundamental science perspective.
What Openness Changes Most
Ksenia:
When you think about openness and AGI, what does openness change the most? What's the most important thing that openness makes happen?
Ioannis:
I feel like there are two outputs, but really one underlying cause. The main thing is that the only way for scientific progress to accelerate and be validated is through a community of researchers who work together, share ideas, and test each other's ideas. That's the whole idea behind peer-reviewed science. The only way to achieve that is by actually sharing the output of your work, and sharing your models, so that other people can build on top of them, improve them, test them, validate them, and find the blind spots.
By being open, what you achieve is more ideas from the ecosystem, more input from the research community, and safer models. At the end of the day, people find the blind spots; they try to come up with methods to make the models safer. You have contributors around the world working with you. You see this with open-source software too. Open-source software tends to be safer because more eyes are on the code and more people are testing it.
Ksenia:
Sure, but AI has much more capability. And if we take OpenClaw as a use case of being open while also not being entirely safe, how do you look at that?
Ioannis:
I actually think OpenClaw was a good thing. It really showed us that you need to be extremely careful with these models: that they're extremely capable, and that if you give them access to your computer, they can do things you didn't expect, even if you're a world expert. That's actually an example of being open working correctly. Many people inside big labs already knew that these systems require careful handling. But many people outside those labs didn't know that, because no one was sharing this information; it wasn't obvious.
OpenClaw showcased that. And now many people are looking into how to make these systems safer. The research community is finding ways to address these problems. It's raised important questions that need to be addressed. It's started a conversation and ignited a debate, and only good things will come out of it.
A Year of Coding Acceleration
Ksenia:
I'd love to touch on the coding acceleration topic, because 2025 literally became the year of coding. I want to read something from March 2025, when Lightspeed co-led your Series A. It said: "Reflection AI is leveraging its deep expertise in reinforcement learning and large language models to solve autonomous coding and, more broadly, unlock the path to superintelligence." It's almost a year since then. What did the last twelve months force you to rewrite in terms of how you thought about autonomous coding?
Ioannis:
When we started, we really believed that reinforcement learning is the next frontier, and we just wanted to focus on that. But if you want to do reinforcement learning at the frontier, you need an extremely powerful base model to post-train. And you need a model you can then deploy safely.
What we saw, especially with Llama 4 not being a particularly strong model, was that the whole Western ecosystem was missing a powerful open base model that we could even use to do reinforcement learning at scale. We realized this was a gap we were the only ones positioned to fill. So we set out to do that. This is why Reflection is now building its frontier models from the ground up, doing both pre-training and reinforcement learning.
Ksenia:
How is it coming along?
Ioannis:
There are lots of incredible things we'll have to share this year.
Building the Agent: From Asimov to General Models
Ksenia:
Let's talk about some of them. You started with Asimov as an autonomous agent for enterprise. What changed there, given everything that's launched: Claude just celebrating one year, OpenAI refocusing on coding, Codex performing well? What is your agent now? What are you working on?
Ioannis:
We're building frontier open-weight models, frontier agentic models. You can think of it as the Western equivalent of DeepSeek, or like Claude or GPT-5.2: that level of capability, but open. That's the focus of the company.
On coding specifically, I still believe that one of the main limitations of coding agents is that they don't have access to all the context that your engineers have. That remains true. At the same time, we've embarked on an extremely ambitious goal of building these models not just with reinforcement learning but from scratch, doing both pre-training and post-training. We've decided to focus fully on that.
Ksenia:
What are the main bottlenecks you come across?
Ioannis:
If you want to build frontier models, you first need to attract the right people. We've assembled a world-class team: people from Gemini, OpenAI, Meta, Apple, all the best labs and data providers. These are people who have done it before. They've built frontier models, and they joined us because of our mission of building open intelligence. We are the only place genuinely committed to that, and that resonates with many of our scientists.
Then there are the resources, the compute you need to build these models. That required a significant raise: our Series B in October, where we raised more than two billion dollars to access the compute we need.
And then there's just the engineering. Building this model, even if you know what to do, is like building a rocket. There are many ways things can go wrong. You need your best rocket scientists to come in and have the room and the agency to actually do it. We're extremely lucky to have assembled this team.
Ksenia:
You're currently working on the model. Is there an application you plan to build on top of it?
Ioannis:
The focus right now is just to build the models. Applications will follow, but it's all hands on deck. Building this model is challenging and requires all of our mental focus.
Open Models at the Frontier
Ksenia:
When we think about open-weight models, they're rarely up to speed with closed-lab models. Does the gap concern you? Do you think you'll be able to match the closed labs?
Ioannis:
The main reason people haven't managed to build frontier open models is that they haven't found a commercial engine to support them. Being open doesn't actually slow you down; if anything, it accelerates you. For us, the important thing is to make our first models the best open models out there. That's the focus.
At the same time, we believe we'll have everything else in place so that we can close the gap over time and eventually become the lab that builds the best models in the world. That's the ambition, and that's the trajectory we're on.
Ksenia:
How do you plan to evaluate that? What's the actual metric you optimize for?
Ioannis:
The main metric that matters is adoption. That's the only metric that truly matters: you put it out there and people like it and use it. Internally, we have many evaluations. We make sure we don't just follow benchmarks but have unseen evaluations that genuinely match real-world use cases. But once you've built the models, you need to work closely with the people who use them and make sure they like them more and use them more.
Ksenia:
Are you still focusing on autonomous coding, or is this more of a general model?
Ioannis:
It's a general agent model.
Ksenia:
Why did you change direction?
Ioannis:
Because we think it's important that there is a general open model in the West, and there isn't one. So we have to build it.
Ksenia:
Interesting. I recently talked to MiniMax, and they said they also work on a general model, but coding gives much more immediate feedback, so it's just their focus.
Ioannis:
Coding is of course an extremely important vertical for us too. But the model is not just a coding model; it's a general model. Agentic capabilities in many ways mean coding, tool use, agentic reasoning. Those all fit within coding. So yes, it's a general model, and definitely coding is a big part of it.
Research Directions in Reinforcement Learning
Ksenia:
What parts of reinforcement learning research are most interesting to you right now?
Ioannis:
Some I can share, some I can't yet. But the most important question is how you scale. Specifically, how do you get credit assignment right: how do you scale the length of the trajectory so that the agent can take many, many steps and still learn effectively from that?
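To make the credit-assignment problem he describes concrete, here is a minimal textbook sketch (not Reflection's method; the function name and episode setup are illustrative): the standard discounted-return computation, which shows why a sparse reward at the end of a long trajectory sends an exponentially weakened signal back to early steps.

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for each step of one trajectory.

    G_t is the 'credit' assigned to the action taken at step t in simple
    policy-gradient methods such as REINFORCE.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk the trajectory backwards, accumulating the discounted sum.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# A 1000-step episode where only the final action earns a reward,
# a common shape for long-horizon agent tasks.
rewards = [0.0] * 999 + [1.0]
returns = discounted_returns(rewards)

print(returns[-1])  # the final step gets full credit: 1.0
print(returns[0])   # the first step gets gamma**999, roughly 4e-5
```

The point of the sketch is the last line: with a 1000-step trajectory, the earliest decisions receive almost no learning signal, which is why scaling trajectory length while keeping credit assignment informative is an open research problem.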
Ksenia:
Are there other research directions youβre combining with reinforcement learning?
Ioannis:
Weβre looking into many things across the whole stack. Reinforcement learning can mean many things β it also means the use of synthetic data, and ensuring that pre-training is done in a way that maximizes downstream reinforcement learning. The benefit of owning the whole stack is that you can optimize end to end. You can ensure your pre-training is RL-aware, and that the data mixtures, the synthetic data, everything you do is fully optimized together. We have research efforts and projects across the entire stack.
Ksenia:
Everything happens so fast in AI. How do you keep up?
Ioannis:
Many things happen fast, and I'm extremely excited about that. But it's also important to be able to tell real progress from noise, from things people just share for attention. I've been in AI since 2011, so fifteen years now. I've seen all the different waves of progress that have happened in that time. That makes me extremely bullish on what we can achieve in the next five years, because I've seen it happen. At the same time, it keeps me focused. I understand which things matter, and I know to just do the work and continue to sprint.
Ksenia:
I remember being so impressed that DeepMind was basically the first to seriously use the term AGI. A lot has changed since then, given the acceleration of everything. What are you concerned about as a builder of these powerful systems?
Ioannis:
The main concern is that we're moving more and more into a world where this technology is not accessible to most people. There's a significant concentration of power in the hands of the closed labs, and a real mismatch between what's happening inside those labs and what's happening in the rest of the research community, both in the US and around the world.
I feel almost a moral obligation at Reflection to build these models so that the rest of the research community can participate and so this technology becomes more democratized. It's worrisome that the concentration of power is as significant as it is.
Ksenia:
What worries me a little about open-source and open-weight models is something that came up in my recent conversation with Nathan Lambert. He's very bullish on open source, he's pushing open models further, but at the same time, he doesn't use open-source models himself. That disconnect between importance and actual usage concerns me.
Ioannis:
Yeah, I mean, people don't use something that's not as powerful. Why would they? People just want to use the best models out there. So you want to ensure that genuinely competitive models, models actually close to the frontier, are open. Until we have that, you'll always see this mismatch between what people want to see and what's actually happening in reality.
The Value Proposition
Ksenia:
If we imagine you launched this model right now, what would be your value proposition to users?
Ioannis:
There are different users. For individuals, it's having access to a frontier model. For research institutions and universities, it's having access to a frontier model they can actually do research with, contributing meaningfully to things like safety, capabilities, and a better understanding of how these systems work.
For enterprises and governments, it's the opportunity to fully own their AI stack and control their fate. An end-to-end system that runs on their own infrastructure, that they can customize, with full data privacy. There are many benefits to really owning your AI stack.
Advice for the Next Generation
Ksenia:
Given your experience (you've been working on AGI for much longer than most people in the field), what would be your advice to young people just starting out, specifically researchers and machine learning engineers?
Ioannis:
Just do it. There's a lot more work to be done, and it's extremely exciting and really fun.
Ksenia:
Can you name some of that work?
Ioannis:
There are two paths, and I feel like both are valid, especially for younger people who can afford to take more risk. I joined DeepMind when it was a small startup, right out of college, because I was young and had high risk tolerance. I always tell people to try joining a startup rather than going straight to a big company after graduating. That's how you learn, that's how you grow, and it's really fun.
This can also apply to more exploratory research. If they want to do something that deviates from the dominant paradigm, if they care more about robotics or world models, they should definitely go for it.
At the same time, even within the dominant paradigm of foundation models, there's a lot of work still to be done. How do we use our data better? How do we scale our methods better? How do we do credit assignment better? There are still many unsolved problems, and we need bright young people to work on them with us.
What Hooked You at DeepMind
Ksenia:
When you joined DeepMind, what did Demis Hassabis and Shane Legg say to you that hooked you?
Ioannis:
Shane told me, "We're building AGI." I was like, what's AGI? And he told me: we want to build computers that can think and do things like humans. I was like, this is insane. Let's do it. Where do I sign?
I really appreciated the ambition, but also the fact that they were genuinely mission-first. They were doing it because they truly believed in it. Back then, AI had no money. You had to be a little crazy to even try. They were crazy enough, ambitious enough, and capable enough to kickstart this revolution, and you could sense it. It was coming from a place of really believing in what they were building.
Ksenia:
Now there's much more money in AI. What do you tell people to hook them on Reflection?
Ioannis:
I tell them that if they believe in open science, if they believe it's important to have frontier open models, and if they believe that AGI, this extraordinarily powerful technology, should be accessible to everyone and democratized, then there's only one place that does that in earnest. And that's Reflection.
A Book That Shaped You
Ksenia:
The last question, my usual one: what's a book that shaped you or seriously influenced you, from your childhood or recently?
Ioannis:
There's a book I keep coming back to: The Idea Factory: Bell Labs and the Great Age of American Innovation. It's about the history of Bell Labs and how it shaped American innovation. It teaches you a lot about systems, companies, organizations, and how innovation actually happens. I highly recommend it.
Ksenia:
I love history. I always think we can still learn so much from it.
Ioannis:
Absolutely.
This interview has been edited and condensed for clarity.
Further reading
Defining AGI in practical terms: https://www.turingpost.com/p/fadel2
Open vs closed models tradeoffs: https://www.turingpost.com/p/openvsclosed
Nathan Lambert: Open Models Will Never Catch Up: https://www.turingpost.com/p/nathanlambert
Inside a Chinese AI Lab: How MiniMax Builds Open Models: https://www.turingpost.com/p/olive
A Fight Worth Having: The Case for Open Source AI: https://www.turingpost.com/p/krikorian
Flow-GRPO and RL post-training: https://www.turingpost.com/p/gpro
The new AI software stack (agents, context, trust): https://www.turingpost.com/p/aisoftwarestack
