This Week in Turing Post:
Wednesday / AI 101 series: On-policy distillation zeitgeist
Friday / Open Source AI series: a surprise announcement
🤝 From our partners: Implement an identity framework for securing AI agents
AI agents are shipping fast – and breaking core security assumptions. Agentic workflows introduce anonymous execution, credential sprawl, excessive privilege, poor auditability, and brittle controls. Join Teleport to unpack why legacy identity fails for agentic AI and what AI-ready infrastructure actually requires.
Our news digest is always free. Click on the partner’s link above to support us. Upgrade to receive our deep dives in full, directly in your inbox. Join Premium members from top companies like Nvidia, Hugging Face, Microsoft, Google, and a16z, plus AI labs and institutions such as Ai2, MIT, Berkeley, and .gov addresses – and thousands of other readers – to really understand what’s going on with AI →
What an insane week: Claude and ChatGPT launches, markets spiraling down, and an overpacked Clawdbot meetup in SF (check the “News from the usual suspects” section). But what really caught my attention was the future painted by Elon Musk:
Living Inside Kardashev’s Head
On February 2, 2026, SpaceX published an update announcing that xAI had joined SpaceX. Buried inside the announcement was a line that would have sounded absurd even five years ago: this merger, the company said, is a first step toward becoming a Kardashev Type II civilization.
Pause here for a second.
Nikolai Kardashev, a Soviet astrophysicist who was thinking about extraterrestrial intelligence in the 1960s – in the middle of the Cold War, when radio astronomy and SETI were still young – has become a reference point for a real capital-allocation plan in 2026. Kardashev was a brilliant physicist, no doubt, but much of his framework was necessarily speculative. Well, we are not in theory anymore: we are watching rockets fly, satellites launch, factories expand, and grid demand spike, all while Kardashev is invoked as if he were an internal strategy memo. What a peculiar turn of events!
What Kardashev Meant
Kardashev was not trying to predict the future of humanity. He was trying to solve a detection problem. If advanced civilizations exist, how would we notice them? He thought: look for energy. A civilization capable of large-scale engineering will leave thermodynamic footprints. Waste heat, infrared glow, star-scale manipulation.
He proposed a simple classification:
Type I civilizations harness planetary-scale energy.
Type II harness the energy of their star.
Type III operate on galactic scales.
For decades, the Kardashev scale lived comfortably in the sci-fi and SETI corner because nothing we were building looked remotely relevant. Our technologies were clever, but light. Software-heavy, energy-light.
Not that anyone expected it – but AI changed that equation.
Intelligence Has Grown a Power Bill
The SpaceX update makes a simple claim, almost in passing: current advances in AI depend on large terrestrial data centers, and global electricity demand for AI cannot be met without imposing hardship on communities and the environment.
Taken at face value, this is an admission that intelligence has become infrastructure. It consumes electricity at scale and competes with households, cities, and industry for grid capacity.
Once a technology reaches this stage, progress is no longer gated by ideas alone. It becomes gated by permitting, supply chains, land, and energy.
What “Moving Toward Type II” Means in Practice
Freeman Dyson, an American physicist, speculated that a sufficiently advanced civilization might capture stellar energy by building a vast structure around its star. The image of a “Dyson sphere” stuck, and with it the impression that using solar-scale energy requires fantastical megastructures.
We are not building that.
Moving toward Type II, today, means three very specific things:
First, energy becomes the limiting factor for intelligence. Access to cheap, continuous power at scale – that’s what matters most. This is why AI shows up in utility forecasts, transformer shortages, and regional politics. Once intelligence hits the grid, the grid pushes back.
Second, the geometry of infrastructure starts to matter. On Earth, energy is seasonal, regulated, land-constrained, and socially contested. In orbit, solar power is near-constant and space is abundant. “It’s always sunny in space!” changes where the bottleneck lives.
Third, logistics replaces invention as the hard problem. Starship matters less because it can reach Mars and more because it is meant to move mass repeatedly, cheaply, and on schedule. That changes what is possible. A civilization does not move toward Type II by inventing one breakthrough device (or coding platform). It moves there by building systems that can move material and energy at scale, over and over again, without stopping.
Seen this way, Starlink, Starship, xAI, and orbital compute form a coherent story: intelligence demands energy, energy demands infrastructure, and infrastructure demands scale that Earth increasingly struggles to absorb.
The Uncomfortable Part Kardashev Never Addressed
Kardashev gave us a ruler, but he never really thought about governance. After all, he lived in the Soviet Union, and assumed, I guess, that the USSR would be in control. And that raises a few big questions. If intelligence becomes an energy-intensive utility, then control over energy-to-compute pipelines becomes control over agency. Vertical integration stops being a business strategy and starts becoming a civilizational lever.
The scale does not tell us who should own that substrate, how access should be governed, or how tradeoffs between growth and environmental stability should be handled. It only tells us that capability tracks energy.
That is why invoking Kardashev today is both clarifying and unsettling. It reframes progress in physical terms, but it also exposes how little social machinery we have built around that reality.
Why This Moment Feels Surreal
Kardashev thought his scale would help us notice aliens.
Instead, it is helping us notice ourselves.
It’s almost shocking that his core assumption – that civilization advances by commanding more energy – has reasserted itself as a practical constraint on modern AI.
And the real question is no longer whether Kardashev was right, but whether we are prepared for what it means to organize intelligence, infrastructure, and power on that scale without losing control of the systems we are building. Is it looking too far into the future? I no longer know.
But everything we see matches the trend the research papers also show (see the “Research this week” section below): it’s not about a model anymore, it’s about systems. About energy, throughput, memory, data movement, deployment surfaces, and long-lived infrastructure that sits underneath intelligence and shapes what it can actually do.
We are watching a shift from optimizing architectures to organizing capacity.
Follow us on 🎥 YouTube Twitter Hugging Face 🤗
We are watching/reading:
Thinking about being a workforce for AI. Join me →here
The Second Pre-training Paradigm by Jim Fan
The Anthropic Hive Mind by Steve Yegge
End Game Play by Will Manidis
News from the usual suspects
Claude Opus 4.6 in Claude Code vs OpenAI GPT-5.3-Codex = people can’t decide which is better
Claude Opus 4.6, Incrementally Better
Anthropic launched Claude Opus 4.6, an update focused on more consistent reasoning, improved tool use, and better performance on long-context tasks. The release avoids bold claims and flashy benchmarks, instead emphasizing reliability and steady progress. It fits Anthropic’s broader pattern: iterate carefully, prioritize trust, and let adoption do the talking. The most interesting case so far: Building a C compiler with a team of parallel Claudes
GPT-5.3-Codex Expands the Scope of Codex
OpenAI introduced GPT-5.3-Codex, an updated model that combines improved coding performance with broader agentic and professional task support. The release focuses on longer-running tasks, better tool use, and more reliable computer interaction, positioning Codex as something closer to a general work agent than a coding assistant. OpenAI also emphasized internal use, noting material changes in how its own teams operate.
More from OpenAI
ChatGPT Tests Ads, Promises a Firewall
OpenAI began testing ads in ChatGPT for logged-in adult users in the U.S. on the Free and Go tiers. Paid tiers (Plus/Pro/Business/Enterprise/Education) stay ad-free. OpenAI says ads are labeled, kept separate from answers, and do not affect responses; advertisers get only aggregate performance data. Users can manage personalization and delete ad data.
OpenAI Goes Agent-First, on Purpose
In a widely circulated post, OpenAI president Greg Brockman outlined an internal shift toward agentic software development. The goal: agents as the default interface for technical work, replacing editors and terminals where possible. The guidance is notably operational – roles, documentation, infra, and accountability – suggesting this is less a vision statement than an execution plan.
More from Anthropic
Agentic Coding Grows Up
A new 2026 Agentic Coding Trends Report argues that software development is shifting from writing code to orchestrating agents. The report highlights coordinated multi-agent systems, long-running agents, and scaled human oversight as the real levers of change. The message is restrained: productivity gains are real, but durable advantage comes from structure, supervision, and security – not full automation.
Anthropic Triggers a Market Repricing
Anthropic’s release of Claude Opus 4.6 and its broader push toward long-running, agentic coding systems prompted a sharp selloff across publicly traded AI tooling and dev-infrastructure companies. Investors reacted less to raw benchmarks than to pricing pressure and the implication that large labs are moving directly into territory once reserved for startups. The move forced a fast reassessment of defensibility across the AI software stack.
Cursor Experiments With Self-Driving Codebases
Cursor published detailed research on running large numbers of autonomous coding agents continuously, showing how thousands of agents can coordinate to maintain and evolve a codebase with limited human oversight. The work focuses less on model capability and more on system design: roles, delegation, error tolerance, and throughput. The takeaway is pragmatic – autonomy works, but only with careful structure and clear intent.
🔦 Paper Highlight
🌟 First proof (🍞)

Researchers from Stanford University, Columbia University, EPFL, Imperial College, Yale University, Harvard University, and other institutions propose a methodology to evaluate LLMs on genuine research-level mathematics. They release ten unpublished math questions spanning algebra, topology, analysis, and numerical linear algebra, each solvable with short proofs unknown online. Answers are encrypted temporarily to prevent data contamination. Initial one-shot tests show frontier AI systems struggle, motivating development of a future benchmark →read the paper
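The “encrypt now, reveal later” pattern behind that benchmark is simple enough to sketch. Below is a minimal Python illustration of the general idea using the third-party cryptography package – a sketch of the pattern, not the authors’ actual release pipeline, and the question IDs and answer strings are made up:

# Sketch: publish encrypted answers now, release the key after evaluation.
# Illustrative only – the real benchmark's pipeline may differ.
from cryptography.fernet import Fernet

answers = {
    "q1": "The group is isomorphic to Z/4Z ...",
    "q2": "The operator norm is bounded by 2 ...",
}

key = Fernet.generate_key()            # kept private until the reveal date
cipher = Fernet(key)

# Published alongside the questions: ciphertexts only.
published = {qid: cipher.encrypt(ans.encode()) for qid, ans in answers.items()}

# After models have been evaluated, the key is released and anyone can check
# that the answers were fixed in advance.
revealed = {qid: Fernet(key).decrypt(tok).decode() for qid, tok in published.items()}
assert revealed == answers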
Foundation Models Tech Report
Model Tech Report: Kimi K2.5: Visual Agentic Intelligence
Integrates joint text–vision pretraining and reinforcement learning with parallel agent orchestration to enable scalable multimodal agentic intelligence →read the paper
ERNIE 5.0 Technical Report
Trains a unified autoregressive multimodal foundation model with elastic ultra-sparse MoE routing to support flexible deployment across scale and resource constraints →read the paper
Research this week
(as always, 🌟 indicates papers we recommend paying attention to)
This week is about turning intelligence into infrastructure:
Agents are becoming population-based and modular
RL is becoming data-scalable and behavior-aware
Memory, attention, and retrieval are being treated as policies
SWE and GUI are the real stress tests
Systems work is setting the ceiling for everything else
Reinforcement learning, post-training, and alignment mechanics
⭐️ Golden Goose: Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text
One of the most strategically important RL papers right now. It breaks the data bottleneck for RLVR by exploiting unverifiable text at scale → read the paper
⭐️ Reinforced Attention Learning
Shifts optimization from tokens to attention distributions. This is a real conceptual step forward for multimodal post-training → read the paper
⭐️ Rethinking the Trust Region in LLM Reinforcement Learning
Argues PPO-style clipping is structurally wrong for LLMs and replaces it with divergence-based constraints (see the sketch after this list). This will age well → read the paper
⭐️ GRP-Obliteration: Unaligning LLMs With a Single Unlabeled Prompt
Shows that post-training safety alignment can be reliably undone using GRPO with minimal supervision, while largely preserving model utility. Important because it treats alignment as reversible behavior, not a stable property, and uses the same RL machinery the field relies on for capability gains → read the paper
F-GRPO: Don’t Let Your Policy Learn the Obvious and Forget the Rare
Fixes rare-solution collapse in group-based RL. A clean, incremental improvement with real gains → read the paper
SLIME: Stabilized Likelihood Implicit Margin Enforcement
Addresses unlearning and formatting collapse in preference optimization. Solid alignment hygiene work → read the paper
Self-Hinting Language Models Enhance Reinforcement Learning
Uses privileged hints during training to prevent GRPO collapse, then removes them at test time. Clever and practical → read the paper
Good SFT Optimizes for SFT, Better SFT Prepares for RL
Important reminder that SFT quality should be judged by downstream RL performance, not standalone metrics → read the paper
On the Entropy Dynamics in Reinforcement Fine-Tuning of LLMs
Theory-heavy but useful for understanding why entropy control methods behave the way they do → read the paper
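Since the trust-region paper above hinges on the difference between clipping and an explicit divergence penalty, here is the textbook contrast in our own notation (not the paper’s exact formulation). With importance ratio r_t(\theta) = \pi_\theta(a_t \mid s_t) / \pi_{\text{old}}(a_t \mid s_t) and advantage estimate A_t:
\mathcal{L}_{\text{clip}}(\theta) = \mathbb{E}_t\big[\min\big(r_t(\theta)\,A_t,\ \operatorname{clip}(r_t(\theta),\,1-\epsilon,\,1+\epsilon)\,A_t\big)\big]
\mathcal{L}_{\text{KL}}(\theta) = \mathbb{E}_t\big[r_t(\theta)\,A_t\big] - \beta\,D_{\mathrm{KL}}\big(\pi_\theta(\cdot \mid s_t)\,\big\|\,\pi_{\text{old}}(\cdot \mid s_t)\big)
The first caps the per-token probability ratio; the second penalizes how far the whole distribution drifts from the old policy. Where that divergence is measured (token vs. sequence level) and how it is estimated is exactly what the paper argues about.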
Agentic systems, self-improvement, orchestration
⭐️ Group-Evolving Agents: Open-Ended Self-Improvement via Experience Sharing
Group-level evolution beats tree-style self-evolution by actually reusing exploratory diversity. One of the clearest signals that agent learning is shifting from “single mind” to “population dynamics” → read the paper
⭐️ AOrchestra: Automating Sub-Agent Creation for Agentic Orchestration
Formalizes agents as composable tuples and treats sub-agents as dynamically instantiated tools (a toy version of the idea is sketched after this list). This is quietly one of the most practical orchestration abstractions this year → read the paper
⭐️ MARS: Modular Agent with Reflective Search for Automated AI Research
Budget-aware planning + reflective memory for research agents. Important because it treats research as a cost-constrained search problem, not a prompt-engineering task → read the paper
WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent RL
Argues that width, not depth, is the right scaling axis for broad search. Strong empirical signal that parallelism beats ever-longer chains → read the paper
daVinci-Agency: Unlocking Long-Horizon Agency Data-Efficiently
Uses real-world PR sequences as supervision for long-horizon agency. Interesting mainly as a data lens, less as a general framework → read the paper
MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents
Treats memory operations as learnable skills that themselves evolve. Fits the broader shift toward memory-as-policy → read the paper
RE-TRAC: Recursive Trajectory Compression for Deep Search Agents
Cross-trajectory reflection instead of linear ReAct loops. A clean fix for local-optimum collapse in deep research agents → read the paper
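To make the AOrchestra item above concrete, here is a toy Python version of “agent as a composable tuple, sub-agent as a dynamically instantiated tool.” Every name and field below is ours, chosen for illustration, not the paper’s API:

from dataclasses import dataclass, field
from typing import Callable, Dict

# Toy illustration of "agent = composable tuple (name, instructions, tools)".
# All identifiers are hypothetical, not taken from the AOrchestra paper.
@dataclass
class Agent:
    name: str
    instructions: str                      # role / system prompt
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def run(self, task: str) -> str:
        # Placeholder policy: a real agent would call an LLM here and decide
        # which of its tools to invoke; we just echo to keep the sketch runnable.
        return f"[{self.name}] handled: {task}"

def as_tool(sub_agent: Agent) -> Callable[[str], str]:
    # Wrap a freshly instantiated sub-agent so the parent can call it
    # exactly like any other tool.
    return lambda task: sub_agent.run(task)

orchestrator = Agent(name="orchestrator", instructions="Decompose the task and delegate.")
orchestrator.tools["search_specialist"] = as_tool(
    Agent(name="searcher", instructions="Answer narrow retrieval questions.")
)
print(orchestrator.tools["search_specialist"]("survey recent work on sub-agent orchestration"))

The point of the abstraction is that the parent never needs to know whether a “tool” is a plain function or a whole sub-agent spun up on demand.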
Software engineering agents and verifiable environments
⭐️ SWE-Universe: Scale Real-World Verifiable Environments to Millions
One of the most important infrastructure papers of the week. Million-scale verifiable SWE environments change what mid-training and RL can even mean for coding agents → read the paper
⭐️ SWE-Master: Unleashing the Potential of Software Engineering Agents via Post-Training
A transparent, end-to-end recipe for building strong SWE agents. Valuable because it’s reproducible and explicit about the full pipeline → read the paper
MEnvAgent: Scalable Polyglot Environment Construction for Verifiable SWE
Solves the multi-language environment bottleneck. Less flashy, but very necessary if SWE agents are to generalize beyond Python → read the paper
SWE-World: Building Software Engineering Agents in Docker-Free Environments
Replaces real execution with learned surrogates. Important mainly for cost and scalability tradeoffs → read the paper
Closing the Loop: Universal Repository Representation with RPG-Encoder
Treats repo comprehension and generation as inverse processes. Strong representation idea that complements SWE agents nicely → read the paper
World models, reasoning, and long-horizon cognition
⭐️ Reinforcement World Model Learning for LLM-based Agents
Aligns simulated and real next states instead of predicting tokens. This is a strong move away from brittle next-token world models → read the paper
Self-Improving World Modelling with Latent Actions (SWIRL)
Learns world models without action labels by treating actions as latent. Conceptually elegant and broadly applicable → read the paper
InftyThink+: Infinite-Horizon Reasoning via RL
Optimizes when and how to summarize reasoning, not just how long to think. Good evidence that CoT scaling needs structure → read the paper
No Global Plan in Chain-of-Thought
Shows LLMs plan locally, not globally. Useful as a diagnostic lens rather than a training recipe → read the paper
Research on World Models Is Not Merely Injecting World Knowledge
A meta-paper, but an important one. Argues for world models as unified systems, not task-specific hacks → read the paper
Multimodality, GUI agents, and perception-control loops
⭐️ POINTS-GUI-G: GUI-Grounding Journey
One of the clearest demonstrations that RL works extremely well for perception-heavy tasks when rewards are verifiable → read the paper
Generative Visual Code Mobile World Models
Predicts GUI states as executable code instead of pixels. Very strong idea for mobile and UI agents → read the paper
Training Data Efficiency in Multimodal Process Reward Models
Shows most MPRM data is redundant and how to select informative subsets cheaply → read the paper
Model architecture, efficiency, and scaling
⭐️ Horizon-LM: A RAM-Centric Architecture for LLM Training
Redefines the CPU–GPU boundary and makes 100B+ training feasible on a single node. This is a serious systems contribution → read the paper
OmniMoE: Atomic Experts at Scale
Pushes MoE granularity to the extreme while fixing the systems bottlenecks. Strong system–algorithm co-design → read the paper
HySparse: Hybrid Sparse Attention with KV Cache Sharing
Uses full attention as an oracle and reuses KV cache. Very clean design, very practical → read the paper
OmniSIFT: Modality-Asymmetric Token Compression
One of the better token-compression papers for omni-modal models, with real latency wins → read the paper
FASA: Frequency-aware Sparse Attention
Discovers functional sparsity in RoPE frequencies. Elegant and surprisingly effective → read the paper
That’s all for today. Thank you for reading! Please send this newsletter to colleagues if it can help them enhance their understanding of AI and stay ahead of the curve.


