Ioannis Antonoglou helped build AlphaGo, AlphaZero, and MuZero at DeepMind. Now he’s CTO and co-founder of Reflection AI, betting that frontier models should be open weights, not a black box behind an API.

Quick answer: What is Reflection AI?

Reflection AI is a frontier AI lab building open-weight, reinforcement-learning-driven agent models. The company is focused on creating a general autonomous model (including coding and tool use), while keeping model weights open so researchers, enterprises, and governments can run and customize systems with full control over their AI stack. Its core thesis is that open science plus RL-based post-training can accelerate capability, safety, and adoption.

Is Reflection AI building only a coding model?

No. Antonoglou says the target is a general agent model, with coding as a major but not exclusive capability.

Why does Antonoglou push open-weight models?

He argues open models increase research velocity, external validation, and safety through broader community testing.

What is his practical definition of AGI?

An agent that can use software on a computer and perform tasks at human-level across workflows.

Subscribe for weekly operator-grade AI systems analysis:
https://www.turingpost.com/subscribe

In Part 1, we talk about openness as an actual strategy: why open models can move faster, why “sovereignty” matters for enterprises and governments, and why safety might improve when the ecosystem can stress-test the system instead of guessing.

We also get into the uncomfortable part: capable open agents can misbehave in public, fast (OpenClaw is the recent reminder). Is that a reason to close everything up, or a reason to make the risks visible and fixable?

In Part 2 (the video will be published next week – subscribe to our YouTube here), Antonoglou explains what they are building: a frontier open-weight “general agent model” trained end-to-end with pretraining plus reinforcement learning.

And I’ll be honest: I left this conversation more skeptical than I expected. They raised $2 billion last year. But where are the results?

Reflection’s thesis is huge – build the missing Western open base model, then use RL to push it to the frontier. The problem is that this is also the slowest path in the game. “All hands on deck building the model” means no clear wedge product yet, few concrete proof points, and a lot of execution risk while closed labs keep shipping.

Am I missing something? Read the interview and leave your opinion in the comments.

Subscribe to our YouTube channel, or listen to the interview on Spotify / Apple.

We prepared a transcript for your convenience. But as always – watch the full video, subscribe, like, and leave your feedback. It helps us grow on YouTube and bring you more insights.

Ksenia Se:
Hello, everyone. Today I have an amazing guest, Ioannis Antonoglou. He helped build AlphaGo and later worked on systems like AlphaZero and MuZero at DeepMind. Now he’s CTO and co-founder of Reflection AI, where the team is applying reinforcement learning and large language models to autonomous coding – and all of it is supposed to be open-sourced. Welcome, Ioannis.

Ioannis Antonoglou:
Thank you so much for the invitation. It’s good to be here.

Ksenia:
It’s my pleasure. You went from DeepMind’s closed-world agents to building an open-weights lab. What changed your mind about where progress comes from?

From DeepMind to Reflection AI

Ioannis:
That’s a really good question. I joined DeepMind very early – I was one of the founding engineers, joining in 2012 when it was a small team of about 20 to 25 people. Back then, there was really no other place in the world where people were thinking seriously about AGI. I joined because I genuinely believed in the mission.

I spent the next ten years doing deep reinforcement learning research. I worked on DQN, which was the first deep RL agent to exist, then AlphaGo, AlphaZero, MuZero, and before I left, I was leading RLHF for Gemini. For most of my time at DeepMind, we were big proponents of publishing. All of our work on DQN, AlphaGo, AlphaZero, MuZero – everything was published. We shared all of it.

It was only after ChatGPT, and the competition between labs that followed, that research labs stopped publishing. There’s almost nothing out there now. So at Reflection, we believe we are actually the only frontier lab truly committed to open science. We want to make our models open-weight because we genuinely believe that scientific progress and research velocity come from being open and sharing your findings with the rest of the research community.

Ksenia:
There’s the Allen Institute for AI, but they’re not commercial – not a business in that sense. You’re different. When you were pitching to investors, it was well before the DeepSeek moment. What did you tell them? How did you make them believe in Reflection?

Ioannis:
I feel like people believed in Reflection because we had two things that were equally important to us. One is that reinforcement learning is the set of methods that will unlock the next set of capabilities. And the second is that open models are the future because they allow sovereignty – by which I mean giving anyone who wants it absolute control over the AI that covers their needs.

We had a background of being genuine experts in reinforcement learning – myself, my co-founder, and the team we brought together. And at the same time, everyone had started to recognize, even before DeepSeek, that with something like Llama 3 – a really powerful open model – people understood the power and importance of open-weight models in the ecosystem. They had started to see how you can have a valuable commercial engine that is actually based on open models.

What Reinforcement Learning Has Unlocked

Ksenia:
Has anything changed since you started Reflection AI? And what has reinforcement learning unlocked since then – and what are you still looking to unlock?

Ioannis:
Many things have actually happened since we started. For one, there are now many frontier open models coming out of Chinese labs, and many of them are quite successful commercially. So our vision of being both open and having a commercial engine that lets you stay at the frontier has materialized in China – there are real proof points of that.

At the same time, we’ve seen the rise of reasoners – models trained with reinforcement learning on reasoning tasks. Most frontier models now have reinforcement learning as a big component of their training stack. We also see agents trained with RL for coding, for tool use across many different domains. This is how you ensure models can be extremely competent with tools, with coding, with agentic reasoning.

Across both of our big bets – reinforcement learning and open weights – we’ve been vindicated by labs in China and by how research has played out over the past year and a half.

From AlphaGo to Now

Ksenia:
If we go back to AlphaGo in 2016 – that moment felt like a miracle to the outside world. For you as an engineer working day and night in the trenches, what is the biggest change in how progress actually happens now?

Ioannis:
I’d say the fundamentals have largely stayed the same. When we were building AlphaGo, it was an extremely challenging engineering project. It required scaling our models and training recipes to massive runs – back in 2016, most people had a couple of GPUs and were just training things locally. Now it’s the norm that anyone training big models uses hundreds or thousands of GPUs. In that sense, we were a bit ahead of the curve.

At the same time, AlphaGo was a deeply collaborative project. Back then it was more common for a small group of researchers to just build something together, rather than large teams with project managers and deliverables – which is more the case now.

In many ways, AlphaGo and how we worked back then is like a preview of how things work today. We even had different training phases: first training on human data – equivalent to pre-training – and then reinforcement learning, which is what we now call post-training. And we had human testers telling us what mistakes the model was making, which is again quite similar to how things are done now.

So I’d say that many of the ways we structured ourselves and went about our research are actually more similar to how things are done now than to how research used to happen back in 2016.

Are We Still in the Era of Breakthroughs?

Ksenia:
Are we still in the era of breakthrough moments, or is it mostly messy operational work now?

Ioannis:
It’s a good question and it really comes down to what you consider a breakthrough. Different people consider different levels of discovery as a breakthrough. I think we actually have most of the ingredients to build really powerful agents that can do almost anything a human can do on a computer. And that’s a form of AGI.

In that sense, it’s more a matter of executing – finding the right methods, finding how everything fits together, and doing a lot of engineering and research. For some other things, we might need genuine breakthroughs: different architectures, different learning algorithms. It really comes down to what you want to build and what your definition of AGI is. Different labs and different people have different definitions of what might still be missing.

Ksenia:
What is your definition of AGI – and superintelligence, for that matter?

Ioannis:
For me, it’s something quite concrete. It’s literally an agent that interacts with software on a computer and can really do what a human can do. In a way, it’s like a productivity tool that allows people to do many more things with their computers. And in that sense, I don’t think we need massive breakthroughs. We just need better engineering, better methods, better combinations of existing methods – but not anything that’s really a game changer from a fundamental science perspective.

What Openness Changes Most

Ksenia:
When you think about openness and AGI, what does openness change the most? What’s the most important thing that openness makes happen?

Ioannis:
I feel like there are two outputs, but really one underlying cause. The main thing is that the only way for scientific progress to accelerate and be validated is through a community of researchers who work together, share ideas, and test each other’s ideas. That’s the whole idea behind peer-reviewed science. The only way to achieve that is by actually sharing the output of your work – and sharing your models – so that other people can build on top of them, improve them, test them, validate them, and find the blind spots.

By being open, what you achieve is more ideas from the ecosystem, more input from the research community, and safer models. At the end of the day, people find the blind spots – they try to come up with methods to make the models safer. You have contributors around the world working with you. You see this with open-source software too. Open-source software tends to be safer because more eyes are on the code and more people are testing it.

Ksenia:
Sure, but AI has much more capability. And if we take OpenClaw as a use case of being open while also not being entirely safe – how do you look at that?

Ioannis:
I actually think OpenClaw was a good thing. It really showed us that you need to be extremely careful with these models – that they’re extremely capable, and that if you give them access to your computer, they can do things you didn’t expect, even if you’re a world expert. That’s actually an example of being open working correctly. Many people inside big labs already knew that these systems require careful handling. But many people outside those labs didn’t know that because no one was sharing this information – it wasn’t obvious.

OpenClaw showcased that. And now many people are looking into how to make these systems safer. The research community is finding ways to address these problems. It’s raised important questions that need to be addressed. It’s started a conversation and ignited a debate – and only good things will come out of it.

A Year of Coding Acceleration

Ksenia:
I’d love to touch on the coding acceleration topic because 2025 literally became the year of coding. I want to read something from March 2025, when Lightspeed co-led your Series A. It said: “Reflection AI is leveraging its deep expertise in reinforcement learning and large language models to solve autonomous coding and, more broadly, unlock the path to superintelligence.” It’s almost a year since then. What did the last twelve months force you to rewrite in terms of how you thought about autonomous coding?

Ioannis:
When we started, we really believed that reinforcement learning is the next frontier – and we just wanted to focus on that. But if you want to do reinforcement learning at the frontier, you need an extremely powerful base model to post-train. And you need a model you can then deploy safely.

What we saw – especially with Llama 4 not being a particularly strong model – was that the whole Western ecosystem was missing a powerful open base model that we could even use to do reinforcement learning at scale. We realized this was a gap we were the only ones positioned to fill. So we set out to do that. This is why Reflection is now building its frontier models from the ground up – doing both pre-training and reinforcement learning.

Ksenia:
How is it coming along?

Ioannis:
There are lots of incredible things we’ll have to share this year.

Building the Agent: From Asimov to General Models

Ksenia:
Let’s talk about some of them. You started with Asimov as an autonomous agent for enterprise. What changed there, given everything that’s launched – Claude just celebrating one year, OpenAI refocusing on coding, Codex performing well? What is your agent now? What are you working on?

Ioannis:
We’re building frontier open-weight models – frontier agentic models. You can think of it as the Western equivalent of DeepSeek, or like Claude or GPT-5.2 – that level of capability, but open. That’s the focus of the company.

On coding specifically, I still believe that one of the main limitations of coding agents is that they don’t have access to all the context that your engineers have. That remains true. At the same time, we’ve embarked on an extremely ambitious goal of building these models not just with reinforcement learning but from scratch – both pre-training and post-training. We’ve decided to focus fully on that.

Ksenia:
What are the main bottlenecks you come across?

Ioannis:
If you want to build frontier models, you first need to attract the right people. We’ve assembled a world-class team – people from Gemini, OpenAI, Meta, Apple, all the best labs and data providers. These are people who have done it before. They’ve built frontier models and joined us because of our mission of building open intelligence. We are the only place genuinely committed to that, and that resonates with many of our scientists.

Then there are the resources – the compute you need to build these models. That required a significant raise: our Series B in October, where we raised more than two billion dollars to access the compute we need.

And then there’s just the engineering. Building this model – even if you know what to do – is like building a rocket. There are many ways things can go wrong. You need your best rocket scientists to come in and have the room and the agency to actually do it. We’re extremely lucky to have assembled this team.

Ksenia:
You’re currently working on the model. Is there an application you plan to build on top of it?

Ioannis:
The focus right now is just to build the models. Applications will follow, but it’s all hands on deck. Building this model is challenging and requires all of our mental focus.

Open Models at the Frontier

Ksenia:
When we think about open-weight models, they’re rarely up to speed with closed-lab models. Does the gap concern you? Do you think you’ll be able to match the closed labs?

Ioannis:
The main reason people haven’t managed to build frontier open models is that they haven’t found a commercial engine to support them. Being open doesn’t actually slow you down – if anything, it accelerates you. For us, the important thing is to make our first models the best open models out there. That’s the focus.

At the same time, we believe we’ll have everything else in place so that we can close the gap over time and eventually become the lab that builds the best models in the world. That’s the ambition, and that’s the trajectory we’re on.

Ksenia:
How do you plan to evaluate that? What’s the actual metric you optimize for?

Ioannis:
The main metric that matters is adoption. That’s the only metric that truly matters – you put it out there and people like it and use it. Internally we have many evaluations. We make sure we don’t just follow benchmarks but have unseen evaluations that genuinely match real-world use cases. But once you’ve built the models, you need to work closely with the people who use them and make sure they like them more and use them more.

Ksenia:
Are you still focusing on autonomous coding, or is this more of a general model?

Ioannis:
It’s a general agent model.

Ksenia:
Why did you change direction?

Ioannis:
Because we think it’s important that there is a general open model in the West – and there isn’t one. So we have to build it.

Ksenia:
Interesting. I recently talked to Minimax and they said they also work on a general model, but coding gives much more immediate feedback – it’s just their focus.

Ioannis:
Coding is of course an extremely important vertical for us too. But the model is not just a coding model – it’s a general model. Agentic capabilities in many ways mean coding, tool use, agentic reasoning. Those all fit within coding. So yes, it’s a general model, and definitely coding is a big part of it.

Research Directions in Reinforcement Learning

Ksenia:
What parts of reinforcement learning research are most interesting to you right now?

Ioannis:
Some I can share, some I can’t yet. But the most important question is how you scale. Specifically, how do you do credit assignment correctly – how do you scale the length of the trajectory so that the agent can take many, many steps and still learn effectively from that?
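For readers unfamiliar with the term: credit assignment is the question of which of an agent’s many steps deserve credit for a reward that arrives only at the end. A minimal sketch of the textbook discounted-return approach – purely illustrative, not Reflection’s method – shows both the idea and why very long trajectories make it hard:

```python
# Classic Monte Carlo credit assignment: a single end-of-trajectory reward
# is propagated backward so every intermediate step receives some credit.

def discounted_returns(rewards, gamma=0.99):
    """Walk the trajectory backward, accumulating gamma-discounted reward."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g  # credit at this step = its reward + discounted future
        returns.append(g)
    return list(reversed(returns))

# A 5-step trajectory where only the final step is rewarded:
trajectory = [0.0, 0.0, 0.0, 0.0, 1.0]
print(discounted_returns(trajectory, gamma=0.9))
# → [0.6561, 0.729, 0.81, 0.9, 1.0]
```

With gamma below 1, the signal reaching early steps shrinks geometrically with trajectory length – one concrete reason agents that take “many, many steps” are hard to train with naive schemes, and why scaling credit assignment is an open research question.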

Ksenia:
Are there other research directions you’re combining with reinforcement learning?

Ioannis:
We’re looking into many things across the whole stack. Reinforcement learning can mean many things – it also means the use of synthetic data, and ensuring that pre-training is done in a way that maximizes downstream reinforcement learning. The benefit of owning the whole stack is that you can optimize end to end. You can ensure your pre-training is RL-aware, and that the data mixtures, the synthetic data, everything you do is fully optimized together. We have research efforts and projects across the entire stack.

Ksenia:
Everything happens so fast in AI. How do you keep up?

Ioannis:
Many things happen fast, and I’m extremely excited about that. But it’s also important to be able to tell real progress from noise – from things people just share for attention. I’ve been in AI since 2011, so fifteen years now. I’ve seen all the different waves of progress that have happened in that time. That makes me extremely bullish on what we can achieve in the next five years – because I’ve seen it happen. At the same time, it keeps me focused. I understand which things matter, and I know to just do the work and continue to sprint.

Ksenia:
I remember being so impressed that DeepMind was basically the first to seriously use the term AGI. A lot has changed since then, given the acceleration of everything. What are you concerned about as a builder of these powerful systems?

Ioannis:
The main concern is that we’re moving more and more into a world where this technology is not accessible to most people. There’s a significant concentration of power in the hands of the closed labs, and a real mismatch between what’s happening inside those labs and what’s happening in the rest of the research community – both in the US and around the world.

I feel almost a moral obligation at Reflection to build these models so that the rest of the research community can participate and so this technology becomes more democratized. It’s worrisome that the concentration of power is as significant as it is.

Ksenia:
What worries me a little about open-source and open-weight models is something that came up in my recent conversation with Nathan Lambert. He’s very bullish on open source – he’s pushing open models further – but at the same time, he doesn’t use open-source models himself. That disconnection between importance and actual usage concerns me.

Ioannis:
Yeah, I mean, people don’t use something that’s not as powerful. Why would they? People just want to use the best models out there. So you want to ensure that genuinely competitive models – models actually close to the frontier – are open. Until we have that, you’ll always see this mismatch between what people want to see and what’s actually happening in reality.

The Value Proposition

Ksenia:
If we imagine you launched this model right now, what would be your value proposition to users?

Ioannis:
There are different users. For individuals, it’s having access to a frontier model. For research institutions and universities, it’s having access to a frontier model they can actually do research with – contributing meaningfully to things like safety, capabilities, and a better understanding of how these systems work.

For enterprises and governments, it’s the opportunity to fully own their AI stack and control their fate. An end-to-end system that runs on their own infrastructure, that they can customize, with full data privacy. There are many benefits to really owning your AI stack.

Advice for the Next Generation

Ksenia:
Given your experience – you’ve been working on AGI for much longer than most people in the field – what would be your advice to young people just starting out, specifically researchers and machine learning engineers?

Ioannis:
Just do it. There’s a lot more work to be done, and it’s extremely exciting and really fun. Still, a lot more work to be done.

Ksenia:
Can you name some of that work?

Ioannis:
There are two paths, and I feel like both are valid – especially for younger people who can afford to take more risk. I joined DeepMind as a small startup right out of college because I was young and had high risk tolerance. I always tell people to try joining a startup rather than going straight to a big company after graduating. That’s how you learn, that’s how you grow, and it’s really fun.

This can also apply to more exploratory research. If they want to do something that deviates from the dominant paradigm – if they care more about robotics or world models – they should definitely go for it.

At the same time, even within the dominant paradigm of foundational models, there’s a lot of work still to be done. How do we use our data better? How do we scale our methods better? How do we do credit assignment better? There are still many unsolved problems, and we need bright young people to work on them with us.

What Hooked You at DeepMind

Ksenia:
When you joined DeepMind, what did Demis Hassabis and Shane Legg say to you that hooked you?

Ioannis:
Shane told me, “We’re building AGI.” I was like, what’s AGI? And he told me: we want to build computers that can think and do things like humans. I was like, this is insane. Let’s do it. Where do I sign?

I really appreciated the ambition, but also the fact that they were genuinely mission-first. They were doing it because they truly believed in it. Back then, AI had no money. You had to be a little crazy to even try. The fact that they were crazy enough, ambitious enough, and capable enough to kickstart this revolution – you could sense it. It was coming from a place of really believing in what they were building.

Ksenia:
Now there’s much more money in AI. What do you tell people to hook them to Reflection?

Ioannis:
I tell them that if they believe in open science – if they believe it’s important to have frontier open models – and if they believe that AGI, this extraordinarily powerful technology, should be accessible to everyone and democratized, then there’s only one place that does that in earnest. And that’s Reflection.

A Book That Shaped You

Ksenia:
The last question, my usual one: what’s a book that shaped you or seriously influenced you – from your childhood or recently?

Ioannis:
There’s a book I keep coming back to: The Idea Factory: Bell Labs and the Great Age of American Innovation. It’s about the history of Bell Labs and how it shaped American innovation. It teaches you a lot about systems, companies, organizations, and how innovation actually happens. I highly recommend it.

Ksenia:
I love history. I always think we can still learn so much from it.

Ioannis:
Absolutely.

This interview has been edited and condensed for clarity.
