Share Turing Post with one person. You will help us grow.
Main concept this week: Recursive self-learning is the shift from AI systems learning inside human-designed loops to systems helping build, test, and improve those loops. It matters now because AI R&D is mostly digital, making parts of research, evaluation, and successor-system training increasingly automatable.
This Week in Turing Post:
Wednesday / AI 101 series: Agentic Vector Database (it’s new and hot!)
Friday / The Org Age of AI: we continue our series
🔥 Which AI skills will actually matter in 2026?
That’s one of the most important questions! On May 14, I’m co-hosting AI Skills Conf – a practical online conference on the workflows, tools, and decisions professionals will actually need in 2026.
You should join me. The sessions I’d most recommend:
How to become irreplaceable with AI
The 2026 AI tool stack for founders and small-business owners
AI ROI reality check: which use cases are delivering business value?
As we know, the professionals who stay irreplaceable are not the ones who fear AI. They're the ones who learn how to use it before everyone else does.
The event is free and super practical. You should be there →
To the main topic → Everyone talks about Recursive Self-Learning
Loops are the basic unit of machine learning. A model predicts, gets feedback, updates.

An agent does a bit of the same: writes code, runs tests, edits, runs the tests again. A system catalogues its own failure, stores the lesson, and tries a different route next time.
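The predict, feedback, update cycle fits in a few lines. Here is a toy sketch in plain Python (not any particular framework), fitting y = 2x with stochastic gradient descent:

```python
# Toy predict -> feedback -> update loop: learn w in y = w * x via SGD.
# Illustrative only; real training loops add batching, optimizers, and more.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # samples of y = 2 * x

w = 0.0     # model parameter, starts wrong
lr = 0.05   # learning rate

for _ in range(200):            # many passes over the data
    for x, y in data:
        pred = w * x            # 1. the model predicts
        error = pred - y        # 2. it gets feedback
        w -= lr * error * x     # 3. it updates
# w ends up very close to the true value 2.0
```

Every loop discussed below, however elaborate, is a descendant of this three-step cycle.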
For most of AI's history, there has been one constant outside that loop: a human – the Human in the Loop, in the field's vocabulary. Now the human is the bottleneck.
Recursive self-learning (RSL) is a way to change that, and it is already shifting the boundary.

Jack Clark, Anthropic co-founder and now Head of Public Benefit, has been writing about exactly this shift; his Import AI essay is discussed below.
What is recursive self-learning, anyway?
The idea has been in the air lately even without the name. Andrej Karpathy's autoresearch is the cleanest small example. An agent is given a real LLM training script, edits the code, runs a fixed five-minute experiment, measures validation bits-per-byte, keeps the change if it improves the result, discards it if it does not, and repeats. What autoresearch removes from the loop is Karpathy – because Karpathy is the bottleneck. He still sets the metric, the budget, and the initial research program. He is no longer inside every iteration. He has moved up one level, from tuning the experiment to designing the loop that tunes it.
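The shape of that loop is easy to sketch. Below is a generic keep-if-better version; `evaluate`, `mutate`, and the toy config are stand-ins invented for illustration. In the real autoresearch setup, "mutate" is an agent editing a training script and "evaluate" is the fixed five-minute run measuring validation bits-per-byte:

```python
import random

def autoresearch_loop(evaluate, mutate, config, iterations=200, seed=0):
    """Keep-if-better loop: propose a change, run a fixed-budget experiment,
    keep the change only if the metric improves (lower is better), else discard."""
    rng = random.Random(seed)
    best_score = evaluate(config)
    for _ in range(iterations):
        candidate = mutate(config, rng)   # stand-in for an agent editing the script
        score = evaluate(candidate)       # stand-in for the fixed five-minute run
        if score < best_score:            # keep the change only if it improves
            config, best_score = candidate, score
    return config, best_score

# Toy stand-ins (invented for illustration): the "config" is one hyperparameter,
# and the "metric" is its distance from a hidden optimum.
def evaluate(cfg):
    return abs(cfg["lr"] - 0.003)

def mutate(cfg, rng):
    return {"lr": cfg["lr"] * rng.uniform(0.5, 2.0)}

best, score = autoresearch_loop(evaluate, mutate, {"lr": 0.1})
```

Note what the human still supplies: the metric, the budget, and the mutation space. The loop only automates the iteration.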
That is the useful way to think about recursive self-learning. It is not a model waking up and choosing to become a better model. It is a system beginning to automate parts of the process by which the system – or systems like it – get better: writing code, generating training data, running experiments, optimizing kernels, fine-tuning models, building evaluations, improving prompts, improving tools, and eventually helping train successor systems.
The history of recursive self-learning
The idea is older than the field. In 1950, Alan Turing proposed building a "child machine" and educating it, rather than programming adult intelligence directly. Arthur Samuel's checkers program in the late 1950s improved through self-play and showed that a machine could get better at a task without each improvement being hand-coded. I.J. Good made the strongest version of the argument in 1965: if designing better machines is itself an intellectual task, then a machine better than humans at intellectual tasks would design an even better one. Jürgen Schmidhuber gave the loop a formal expression in 2003 with the Gödel Machine – a system that rewrites its own code once it can prove the rewrite is an improvement. For over six decades, almost all of this remained theoretical.
The practical versions were narrow. AlphaGo Zero improved through self-play, but Go is a closed world: fixed rules, clean reward, no hidden state. AutoML, neural architecture search, self-distillation, and synthetic-data pipelines all added components – proof that machines could help improve machine-learning systems, always inside a frame a human had built.
What is changing now is that the loop is moving into AI R&D itself
AI research has an unusual property: most of the work is already digital. Code, data, training runs, evaluation scripts, benchmarks, logs, dashboards. The day-to-day is not lightning-strike insight; it is running variants, finding errors, improving throughput, testing ideas, comparing scores, and deciding what to try next. That makes it tractable for automation in a way that, for example, biology research is not.
This is the spine of Jack Clark's latest Import AI essay. His headline claim is a 60%+ probability that "no-human-involved AI R&D" – a system capable of training its own successor – arrives by the end of 2028. The argument is not one benchmark but the accumulation: SWE-Bench, METR time horizons, CORE-Bench, MLE-Bench, PostTrainBench, kernel optimization, automated alignment research, and AI systems managing other AI systems. The case is a mosaic of partial loops beginning to connect.
On the No Priors podcast, Karpathy said the most interesting version of RSL is probably what frontier labs are already working on: experiment on smaller models, make the process as autonomous as possible, and remove researchers from as much of the execution loop as you can. Researchers can still contribute ideas, but they should not be manually enacting every one. That changes a researcher's job considerably.
The academic community has begun catching up to the framing as well. The ICLR 2026 workshop on recursive self-improvement describes the field as moving from speculative vision to a concrete systems problem: what changes, when it changes, how the change is produced, where the system operates, and how alignment, evaluation, and rollback should work. Recursive self-learning has gained some practical weight and is becoming a design problem with parameters.
There is even a month-old startup named “Recursive Superintelligence” that just raised $500 million for self-learning AI. So, you know, it’s all serious.
I want to leave you with this
For decades, we built systems that learned inside loops. We are now building systems that may learn how to build the loops. And we will be learning alongside them: what "better" means once the system is also helping decide what "better" counts as.
There is another obligation here. When a system begins evolving autonomously, it needs rigorous, continuous verification and alignment, so its improvement loop remains anchored to human safety and well-being. Both are very hard problems, because we still do not really know how these machines "think."
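What "verification and rollback" could look like as loop machinery, in a deliberately toy sketch (the class, checks, and updates here are all hypothetical, not any lab's actual safeguards):

```python
class GuardedLoop:
    """Toy verification-gated improvement loop. A proposed update replaces the
    current system only if it passes every check; history enables rollback."""

    def __init__(self, system, checks):
        self.history = [system]   # audit trail of accepted versions
        self.checks = checks      # verification suite every candidate must pass

    def propose(self, update):
        candidate = update(self.history[-1])
        if all(check(candidate) for check in self.checks):
            self.history.append(candidate)   # accepted: new current version
            return True
        return False                         # rejected: current version stays

    def rollback(self, steps=1):
        # Return to an earlier verified version (never past the original).
        steps = min(steps, len(self.history) - 1)
        del self.history[len(self.history) - steps:]
        return self.history[-1]

# Usage: accept a safe update, reject an unsafe one, then roll back.
loop = GuardedLoop(0, [lambda s: s < 10])
ok = loop.propose(lambda s: s + 1)        # passes the check
bad = loop.propose(lambda s: s + 100)     # fails the check, version unchanged
restored = loop.rollback()
```

The hard part, of course, is that the checks themselves must stay trustworthy when the system being checked helps write them.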
If any of those thoughts resonate with you – share them across your social networks. Let’s keep the conversation going.
Topic 2: That used to be great, impactful writing! And now it’s a sign of slop. How did that happen? Featuring Apostle Paul. Watch →
Follow us on 🎥 YouTube Twitter Hugging Face 🤗
Twitter Library
We are reading/watching/learning:
AI’s moats, myths and moral loopholes by Azeem Azhar
The Distillation panic by Nathan Lambert
On SFT, RL, and on-policy distillation by Will Brown
News from the usual suspects ™
Anthropic is building an enterprise deployment arm with Wall Street
Anthropic announced a new enterprise AI services company with Blackstone, Hellman & Friedman, and Goldman Sachs; WSJ reports the venture is expected to total about $1.5B. This is Claude moving into the implementation layer, where the real enterprise money and pain live.
OpenAI is widening distribution beyond Microsoft with AWS
OpenAI says its models, Codex, and Managed Agents are coming to AWS so enterprises can use them inside existing AWS security and compliance environments. OpenAI’s enterprise strategy is no longer only “come to ChatGPT”; it is “we will show up where your stack already is.”
Google DeepMind is pushing medical AI into “co-clinician” research
DeepMind shared an AI co-clinician research initiative that tests evidence-grounded clinical reasoning and real-time multimodal telemedicine simulations. The careful wording matters: supportive tool under physician authority, not replacement doctor. Sensible, because “move fast and break patients” is a bad slogan.
Microsoft is turning agent governance into a product category
Agent 365 is now generally available, with Microsoft framing it as a control plane to observe, govern, and secure AI agents across delegated agents, autonomous agents, local agents, SaaS agents, and “shadow AI.” It explicitly mentions discovery for tools like OpenClaw, Claude Code, and GitHub Copilot CLI. This is very relevant for the organizational AI story: once agents spread, visibility becomes infrastructure.
The Pentagon AI story escalates
Reuters reported on Apr 28 that Google signed a classified AI agreement with the Pentagon for “any lawful government purpose,” including sensitive classified work, while retaining stated limits around domestic mass surveillance and autonomous weapons without human oversight. Then The Verge reported on May 1 that the Pentagon had struck classified AI deals with OpenAI, Google, Microsoft, Amazon, Nvidia, xAI, and Reflection, while excluding Anthropic after the dispute over its red lines. This is the governance story hiding inside the model race.
Musk said xAI “partly” used OpenAI models to train Grok, in court testimony about model distillation (everyone does it!)
Models
GLM-5V-Turbo – Pushes multimodal perception into the core agent loop: reasoning, planning, tool use, execution, coding, GUI work, webpages, documents, images, and video. This is worth highlighting because it treats vision as part of agency rather than as a side input to a language model →read the paper
NVIDIA Nemotron 3 Nano Omni – Unifies vision, audio, video, documents, charts, graphical interfaces, and text input into one open omni-modal reasoning model for agentic workflows. The strong angle is efficiency: NVIDIA frames it as a perception sub-agent for computer use, document intelligence, and audio-video reasoning, with up to 9x higher throughput than other open omni models →read the release
Mistral Medium 3.5 – Consolidates instruction-following, reasoning, and coding into one open-weight 128B dense model with a 256K context window. This is interesting as a product-direction signal: fewer specialized public models, more unified generalist weights for agents and coding workflows →see the model
Research
Trends we see looking at every paper related to AI and ML published last week:
Self-improvement is becoming the dominant design pattern.
Heterogeneous systems are also everywhere.
Efficiency is now part of intelligence research.
The bigger picture: AI research is moving toward systems that improve themselves, coordinate across heterogeneous components, and leave behind machine-readable traces of what they did and why it worked.
Self-improving agent systems and AI organizations
THE LAST HARNESS YOU’LL EVER BUILD – automates the painful work of harness engineering, then goes further by trying to learn how to improve the harness-improvement process itself →read the paper
From Skills to Talent: Organising Heterogeneous Agents as a Real-World Company – frames multi-agent systems as adaptive organizations, where agents can be recruited, reviewed, reconfigured, and improved through an explicit organizational layer →read the paper
Recursive Multi-Agent Systems – extends recursive reasoning from a single model to a whole multi-agent collaboration loop, making agent cooperation itself a scaling target →read the paper
Synthetic Computers at Scale for Long-Horizon Productivity Simulation – creates synthetic user worlds with files, folders, collaborators, and month-scale tasks, giving agents richer environments for learning realistic productivity work →read the paper
Web2BigTable: A Bi-Level Multi-Agent LLM System for Internet-Scale Information Search and Extraction – turns web research into structured, table-like extraction through coordinated agents that search, verify, remember, and reconcile evidence →read the paper
Learning to Act and Cooperate for Distributed Black-Box Consensus Optimization – pushes multi-agent optimization away from fixed handcrafted rules and toward agents that adapt both their local actions and cooperation patterns from trajectory history →read the paper
Training loops for capability consolidation and continual improvement
Co-Evolving Policy Distillation – proposes a way to merge expert capabilities while they are still training, reducing the gap between separate specialists and the final all-in-one model →read the paper
Learning while Deploying: Fleet-Scale Reinforcement Learning for Generalist Robot Policies – closes the loop between real robot deployment, human correction, fleet experience, and policy improvement, which makes it interesting for continual learning in the physical world →read the paper
Accelerating RL Post-Training Rollouts via System-Integrated Speculative Decoding – applies speculative decoding inside RL rollout generation, making post-training faster without changing the target model’s output distribution →read the paper
Heterogeneous Scientific Foundation Model Collaboration – connects language agents with specialized scientific foundation models, making scientific agent systems less dependent on language-only reasoning →read the paper
Machine-readable work, skills, and research artifacts
From Skill Text to Skill Structure: The Scheduling-Structural-Logical Representation for Agent Skills – converts messy text-based agent skills into a more explicit structure for scheduling, execution, side effects, and risk review →read the paper
The Last Human-Written Paper: Agent-Native Research Artifacts – argues that papers should become executable research packages, preserving code, evidence, failed paths, and exploration history for future agents →read the paper
Runtime control, efficiency, and practical deployment
Efficient Training on Multiple Consumer GPUs with RoundPipe – makes consumer-GPU fine-tuning more practical by treating GPUs as flexible workers rather than fixed holders of uneven model stages →read the paper
Step-level Optimization for Efficient Computer-use Agents – replaces always-on frontier-model inference with event-driven escalation, using cheaper policies for routine GUI steps and stronger models for risky moments →read the paper
Length Value Model: Scalable Value Pretraining for Token-Level Length Modeling – treats generation length as a token-level value signal, making length control, budgeted reasoning, and interpretability much more direct →read the paper
Compliance versus Sensibility: On the Reasoning Controllability in Large Language Models – studies whether models can follow instructed reasoning patterns even when those patterns conflict with task-appropriate reasoning, then shows a path toward activation-level steering →read the paper
That’s all for today. Thank you for reading! Please send this newsletter to colleagues if it can help them enhance their understanding of AI and stay ahead of the curve.
FAQ block
What is recursive self-learning?
Recursive self-learning is when an AI system helps improve the process by which AI systems are built, trained, tested, or deployed.
How is it different from self-play?
Self-play improves performance inside a closed game or task. Recursive self-learning can act on the broader AI development pipeline: code, data, evaluations, tools, experiments, and training loops.
Why does recursive self-learning matter now?
Because AI research is unusually digital. Much of the work already happens in code, logs, benchmarks, dashboards, and training scripts, which makes it easier to automate than many other forms of research.
Is recursive self-learning the same as recursive self-improvement?
Not exactly. Recursive self-improvement is the stronger, more speculative idea of systems directly improving their own capabilities. Recursive self-learning is the practical, nearer-term version: systems improving the surrounding loops.
What is the main risk?
The risk is losing reliable oversight over systems that can modify the processes used to evaluate, improve, or replace them. Verification, rollback, and alignment become central.


