This Week in Turing Post:
Wednesday / AI 101 series: On-policy distillation zeitgeist
Friday / Open Source AI series: a surprise announcement
🤝 From our partners: Implement an identity framework for securing AI agents
AI agents are shipping fast – and breaking core security assumptions. Agentic workflows introduce anonymous execution, credential sprawl, excessive privilege, poor auditability, and brittle controls. Join Teleport to unpack why legacy identity fails for agentic AI and what AI-ready infrastructure actually requires.
Our news digest is always free. Click on the partner’s link above to support us. Upgrade to receive our deep dives in full, directly in your inbox. Join Premium members from top companies like Nvidia, Hugging Face, Microsoft, Google, and a16z, plus AI labs and institutions such as Ai2, MIT, Berkeley, and .gov addresses – and thousands of other readers – to really understand what’s going on with AI →
What an insane week: Claude and ChatGPT launches, markets spiraling down, and an overpacked Clawdbot meetup in SF (check the “News from the usual suspects” section). But what really caught my attention was the future painted by Elon Musk:
Living Inside Kardashev’s Head
On February 2, 2026, SpaceX published an update announcing that xAI had joined SpaceX. Buried inside the announcement was a line that would have sounded absurd even five years ago: this merger, the company said, is a first step toward becoming a Kardashev Type II civilization.
Pause here for a second.
Nikolai Kardashev, a Soviet astrophysicist who was thinking about extraterrestrial intelligence in the 1960s – in the middle of the Cold War, when radio astronomy and SETI were still young – has become a reference point for a real capital-allocation plan in 2026. Kardashev was a brilliant physicist, no doubt, but much of his framework was necessarily speculative. Well, we are not in theory anymore: we are watching rockets fly, satellites launch, factories expand, and grid demand spike, all while Kardashev is invoked as if he were an internal strategy memo. What a peculiar turn of events!
What Kardashev Meant
Kardashev was not trying to predict the future of humanity. He was trying to solve a detection problem. If advanced civilizations exist, how would we notice them? He thought: look for energy. A civilization capable of large-scale engineering will leave thermodynamic footprints. Waste heat, infrared glow, star-scale manipulation.
He proposed a simple classification:
Type I civilizations harness planetary-scale energy.
Type II harness the energy of their star.
Type III operate on galactic scales.
For decades, the Kardashev scale lived comfortably in the sci-fi and SETI corner because nothing we were building looked remotely relevant. Our technologies were clever, but light. Software-heavy, energy-light.
Not that anyone expected it – but AI changed that equation.
Intelligence Has Grown a Power Bill
The SpaceX update makes a simple claim, almost in passing: current advances in AI depend on large terrestrial data centers, and global electricity demand for AI cannot be met without imposing hardship on communities and the environment.
Taken at face value, this is an admission that intelligence has become infrastructure. It consumes electricity at scale and competes with households, cities, and industry for grid capacity.
Once a technology reaches this stage, progress is no longer gated by ideas alone. It becomes gated by permitting, supply chains, land, and energy.
What “Moving Toward Type II” Means in Practice
Freeman Dyson, an American physicist, speculated that a sufficiently advanced civilization might capture stellar energy by building a vast structure around its star. The image of a “Dyson sphere” stuck, and with it the impression that using solar-scale energy requires fantastical megastructures.
We are not building that.
Moving toward Type II, today, means three very specific things:
First, energy becomes the limiting factor for intelligence. Access to cheap, continuous power at scale – that’s what matters most. This is why AI shows up in utility forecasts, transformer shortages, and regional politics. Once intelligence hits the grid, the grid pushes back.
Second, the geometry of infrastructure starts to matter. On Earth, energy is seasonal, regulated, land-constrained, and socially contested. In orbit, solar power is near-constant and space is abundant. “It’s always sunny in space!” changes where the bottleneck lives.
Third, logistics replaces invention as the hard problem. Starship matters less because it can reach Mars and more because it is meant to move mass repeatedly, cheaply, and on schedule. That changes what is possible. A civilization does not move toward Type II by inventing one breakthrough device (or coding platform). It moves there by building systems that can move material and energy at scale, over and over again, without stopping.
Seen this way, Starlink, Starship, xAI, and orbital compute form a coherent story: intelligence demands energy, energy demands infrastructure, and infrastructure demands scale that Earth increasingly struggles to absorb.
The Uncomfortable Part Kardashev Never Addressed
Kardashev gave us a ruler, but he never really thought about governance. After all, he lived in the Soviet Union, and assumed, I guess, that the USSR would be in control. And that raises a few big questions. If intelligence becomes an energy-intensive utility, then control over energy-to-compute pipelines becomes control over agency. Vertical integration stops being a business strategy and starts becoming a civilizational lever.
The scale does not tell us who should own that substrate, how access should be governed, or how tradeoffs between growth and environmental stability should be handled. It only tells us that capability tracks energy.
That is why invoking Kardashev today is both clarifying and unsettling. It reframes progress in physical terms, but it also exposes how little social machinery we have built around that reality.
Why This Moment Feels Surreal
Kardashev thought his scale would help us notice aliens.
Instead, it is helping us notice ourselves.
It’s almost shocking that his core assumption – that civilization advances by commanding more energy – has reasserted itself as a practical constraint on modern AI.
And the real question is no longer whether Kardashev was right, but whether we are prepared for what it means to organize intelligence, infrastructure, and power on that scale without losing control of the systems we are building. Is it looking too far into the future? I no longer know.
But everything we see matches the trend the research papers also show (see the “Research this week” section below): it’s not about a model anymore, it’s about systems. About energy, throughput, memory, data movement, deployment surfaces, and long-lived infrastructure that sits underneath intelligence and shapes what it can actually do.
We are watching a shift from optimizing architectures to organizing capacity.
Follow us on 🎥 YouTube Twitter Hugging Face 🤗
We are watching/reading:
Thinking about being a workforce for AI. Join me →here
The Second Pre-training Paradigm by Jim Fan
The Anthropic Hive Mind by Steve Yegge
End Game Play by Will Manidis
News from the usual suspects
Claude Opus 4.6 in Claude Code vs OpenAI GPT-5.3-Codex = people can’t decide which is better
Claude Opus 4.6, Incrementally Better
Anthropic launched Claude Opus 4.6, an update focused on more consistent reasoning, improved tool use, and better performance on long-context tasks. The release avoids bold claims and flashy benchmarks, instead emphasizing reliability and steady progress. It fits Anthropic’s broader pattern: iterate carefully, prioritize trust, and let adoption do the talking. The most interesting case so far: Building a C compiler with a team of parallel Claudes
GPT-5.3-Codex Expands the Scope of Codex
OpenAI introduced GPT-5.3-Codex, an updated model that combines improved coding performance with broader agentic and professional task support. The release focuses on longer-running tasks, better tool use, and more reliable computer interaction, positioning Codex as something closer to a general work agent than a coding assistant. OpenAI also emphasized internal use, noting material changes in how its own teams operate.
More from OpenAI
ChatGPT Tests Ads, Promises a Firewall
OpenAI began testing ads in ChatGPT for logged-in adult users in the U.S. on the Free and Go tiers. Paid tiers (Plus/Pro/Business/Enterprise/Education) stay ad-free. OpenAI says ads are labeled, kept separate from answers, and do not affect responses; advertisers get only aggregate performance data. Users can manage personalization and delete ad data.
OpenAI Goes Agent-First, on Purpose
In a widely circulated post, OpenAI president Greg Brockman outlined an internal shift toward agentic software development. The goal: agents as the default interface for technical work, replacing editors and terminals where possible. The guidance is notably operational – roles, documentation, infra, and accountability – suggesting this is less a vision statement than an execution plan.
More from Anthropic
Agentic Coding Grows Up
A new 2026 Agentic Coding Trends Report argues that software development is shifting from writing code to orchestrating agents. The report highlights coordinated multi-agent systems, long-running agents, and scaled human oversight as the real levers of change. The message is restrained: productivity gains are real, but durable advantage comes from structure, supervision, and security – not full automation.
Anthropic Triggers a Market Repricing
Anthropic’s release of Claude Opus 4.6 and its broader push toward long-running, agentic coding systems prompted a sharp selloff across publicly traded AI tooling and dev-infrastructure companies. Investors reacted less to raw benchmarks than to pricing pressure and the implication that large labs are moving directly into territory once reserved for startups. The move forced a fast reassessment of defensibility across the AI software stack.
Cursor Experiments With Self-Driving Codebases
Cursor published detailed research on running large numbers of autonomous coding agents continuously, showing how thousands of agents can coordinate to maintain and evolve a codebase with limited human oversight. The work focuses less on model capability and more on system design: roles, delegation, error tolerance, and throughput. The takeaway is pragmatic – autonomy works, but only with careful structure and clear intent.
🔦 Paper Highlight
🌟 First proof (🍞)

Researchers from Stanford University, Columbia University, EPFL, Imperial College, Yale University, Harvard University, and other institutions propose a methodology to evaluate LLMs on genuine research-level mathematics. They release ten unpublished math questions spanning algebra, topology, analysis, and numerical linear algebra, each solvable with short proofs unknown online. Answers are encrypted temporarily to prevent data contamination. Initial one-shot tests show frontier AI systems struggle, motivating development of a future benchmark →read the paper
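The “encrypt now, reveal later” pattern behind that benchmark is simple enough to sketch. Below is a minimal Python illustration of the general idea using the third-party cryptography package – a sketch of the pattern, not the authors’ actual release pipeline, and the question IDs and answer strings are made up:

# Sketch: publish encrypted answers now, release the key after evaluation.
# Illustrative only – the real benchmark's pipeline may differ.
from cryptography.fernet import Fernet

answers = {
    "q1": "The group is isomorphic to Z/4Z ...",
    "q2": "The operator norm is bounded by 2 ...",
}

key = Fernet.generate_key()            # kept private until the reveal date
cipher = Fernet(key)

# Published alongside the questions: ciphertexts only.
published = {qid: cipher.encrypt(ans.encode()) for qid, ans in answers.items()}

# After models have been evaluated, the key is released and anyone can check
# that the answers were fixed in advance.
revealed = {qid: Fernet(key).decrypt(tok).decode() for qid, tok in published.items()}
assert revealed == answers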
Foundation Models Tech Report
Model Tech Report: Kimi K2.5: Visual Agentic Intelligence
Integrates joint text–vision pretraining and reinforcement learning with parallel agent orchestration to enable scalable multimodal agentic intelligence →read the paper
ERNIE 5.0 Technical Report
Trains a unified autoregressive multimodal foundation model with elastic ultra-sparse MoE routing to support flexible deployment across scale and resource constraints →read the paper
Research this week
(as always, 🌟 indicates papers we recommend paying attention to)
This week is about turning intelligence into infrastructure:
Agents are becoming population-based and modular
RL is becoming data-scalable and behavior-aware
Memory, attention, and retrieval are being treated as policies
SWE and GUI are the real stress tests
Systems work is setting the ceiling for everything else
Reinforcement learning, post-training, and alignment mechanics
⭐️ Golden Goose: Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text
One of the most strategically important RL papers right now. It breaks the data bottleneck for RLVR by exploiting unverifiable text at scale → read the paper
⭐️ Reinforced Attention Learning
Shifts optimization from tokens to attention distributions. This is a real conceptual step forward for multimodal post-training → read the paper
⭐️ Rethinking the Trust Region in LLM Reinforcement Learning
Argues PPO-style clipping is structurally wrong for LLMs and replaces it with divergence-based constraints (see the sketch after this list). This will age well → read the paper
⭐️ GRP-Obliteration: Unaligning LLMs With a Single Unlabeled Prompt
Shows that post-training safety alignment can be reliably undone using GRPO with minimal supervision, while largely preserving model utility. Important because it treats alignment as reversible behavior, not a stable property, and uses the same RL machinery the field relies on for capability gains → read the paper
F-GRPO: Don’t Let Your Policy Learn the Obvious and Forget the Rare
Fixes rare-solution collapse in group-based RL. A clean, incremental improvement with real gains → read the paper
SLIME: Stabilized Likelihood Implicit Margin Enforcement
Addresses unlearning and formatting collapse in preference optimization. Solid alignment hygiene work → read the paper
Self-Hinting Language Models Enhance Reinforcement Learning
Uses privileged hints during training to prevent GRPO collapse, then removes them at test time. Clever and practical → read the paper
Good SFT Optimizes for SFT, Better SFT Prepares for RL
Important reminder that SFT quality should be judged by downstream RL performance, not standalone metrics → read the paper
On the Entropy Dynamics in Reinforcement Fine-Tuning of LLMs
Theory-heavy but useful for understanding why entropy control methods behave the way they do → read the paper
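Since the trust-region paper above hinges on the difference between clipping and an explicit divergence penalty, here is the textbook contrast in our own notation (not the paper’s exact formulation). With importance ratio r_t(\theta) = \pi_\theta(a_t \mid s_t) / \pi_{\text{old}}(a_t \mid s_t) and advantage estimate A_t:
\mathcal{L}_{\text{clip}}(\theta) = \mathbb{E}_t\big[\min\big(r_t(\theta)\,A_t,\ \operatorname{clip}(r_t(\theta),\,1-\epsilon,\,1+\epsilon)\,A_t\big)\big]
\mathcal{L}_{\text{KL}}(\theta) = \mathbb{E}_t\big[r_t(\theta)\,A_t\big] - \beta\,D_{\mathrm{KL}}\big(\pi_\theta(\cdot \mid s_t)\,\big\|\,\pi_{\text{old}}(\cdot \mid s_t)\big)
The first caps the per-token probability ratio; the second penalizes how far the whole distribution drifts from the old policy. Where that divergence is measured (token vs. sequence level) and how it is estimated is exactly what the paper argues about.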
Agentic systems, self-improvement, orchestration
⭐️ Group-Evolving Agents: Open-Ended Self-Improvement via Experience Sharing
Group-level evolution beats tree-style self-evolution by actually reusing exploratory diversity. One of the clearest signals that agent learning is shifting from “single mind” to “population dynamics” → read the paper
⭐️ AOrchestra: Automating Sub-Agent Creation for Agentic Orchestration
Formalizes agents as composable tuples and treats sub-agents as dynamically instantiated tools (a toy version of the idea is sketched after this list). This is quietly one of the most practical orchestration abstractions this year → read the paper
⭐️ MARS: Modular Agent with Reflective Search for Automated AI Research
Budget-aware planning + reflective memory for research agents. Important because it treats research as a cost-constrained search problem, not a prompt-engineering task → read the paper
WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent RL
Argues that width, not depth, is the right scaling axis for broad search. Strong empirical signal that parallelism beats ever-longer chains → read the paper
daVinci-Agency: Unlocking Long-Horizon Agency Data-Efficiently
Uses real-world PR sequences as supervision for long-horizon agency. Interesting mainly as a data lens, less as a general framework → read the paper
MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents
Treats memory operations as learnable skills that themselves evolve. Fits the broader shift toward memory-as-policy → read the paper
RE-TRAC: Recursive Trajectory Compression for Deep Search Agents
Cross-trajectory reflection instead of linear ReAct loops. A clean fix for local-optimum collapse in deep research agents → read the paper
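To make the AOrchestra item above concrete, here is a toy Python version of “agent as a composable tuple, sub-agent as a dynamically instantiated tool.” Every name and field below is ours, chosen for illustration, not the paper’s API:

from dataclasses import dataclass, field
from typing import Callable, Dict

# Toy illustration of "agent = composable tuple (name, instructions, tools)".
# All identifiers are hypothetical, not taken from the AOrchestra paper.
@dataclass
class Agent:
    name: str
    instructions: str                      # role / system prompt
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def run(self, task: str) -> str:
        # Placeholder policy: a real agent would call an LLM here and decide
        # which of its tools to invoke; we just echo to keep the sketch runnable.
        return f"[{self.name}] handled: {task}"

def as_tool(sub_agent: Agent) -> Callable[[str], str]:
    # Wrap a freshly instantiated sub-agent so the parent can call it
    # exactly like any other tool.
    return lambda task: sub_agent.run(task)

orchestrator = Agent(name="orchestrator", instructions="Decompose the task and delegate.")
orchestrator.tools["search_specialist"] = as_tool(
    Agent(name="searcher", instructions="Answer narrow retrieval questions.")
)
print(orchestrator.tools["search_specialist"]("survey recent work on sub-agent orchestration"))

The point of the abstraction is that the parent never needs to know whether a “tool” is a plain function or a whole sub-agent spun up on demand.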
Software engineering agents and verifiable environments
⭐️ SWE-Universe: Scale Real-World Verifiable Environments to Millions
One of the most important infrastructure papers of the week. Million-scale verifiable SWE environments change what mid-training and RL can even mean for coding agents → read the paper
⭐️ SWE-Master: Unleashing the Potential of Software Engineering Agents via Post-Training
A transparent, end-to-end recipe for building strong SWE agents. Valuable because it’s reproducible and explicit about the full pipeline → read the paper
MEnvAgent: Scalable Polyglot Environment Construction for Verifiable SWE
Solves the multi-language environment bottleneck. Less flashy, but very necessary if SWE agents are to generalize beyond Python → read the paper
SWE-World: Building Software Engineering Agents in Docker-Free Environments
Replaces real execution with learned surrogates. Important mainly for cost and scalability tradeoffs → read the paper
Closing the Loop: Universal Repository Representation with RPG-Encoder
Treats repo comprehension and generation as inverse processes. Strong representation idea that complements SWE agents nicely → read the paper
World models, reasoning, and long-horizon cognition
⭐️ Reinforcement World Model Learning for LLM-based Agents
Aligns simulated and real next states instead of predicting tokens. This is a strong move away from brittle next-token world models → read the paper
Self-Improving World Modelling with Latent Actions (SWIRL)
Learns world models without action labels by treating actions as latent. Conceptually elegant and broadly applicable → read the paper
InftyThink+: Infinite-Horizon Reasoning via RL
Optimizes when and how to summarize reasoning, not just how long to think. Good evidence that CoT scaling needs structure → read the paper
No Global Plan in Chain-of-Thought
Shows LLMs plan locally, not globally. Useful as a diagnostic lens rather than a training recipe → read the paper
Research on World Models Is Not Merely Injecting World Knowledge
A meta-paper, but an important one. Argues for world models as unified systems, not task-specific hacks → read the paper
Multimodality, GUI agents, and perception-control loops
⭐️ POINTS-GUI-G: GUI-Grounding Journey
One of the clearest demonstrations that RL works extremely well for perception-heavy tasks when rewards are verifiable → read the paper
Generative Visual Code Mobile World Models
Predicts GUI states as executable code instead of pixels. Very strong idea for mobile and UI agents → read the paper
Training Data Efficiency in Multimodal Process Reward Models
Shows most MPRM data is redundant and how to select informative subsets cheaply → read the paper
Model architecture, efficiency, and scaling
⭐️ Horizon-LM: A RAM-Centric Architecture for LLM Training
Redefines the CPU–GPU boundary and makes 100B+ training feasible on a single node. This is a serious systems contribution → read the paper
OmniMoE: Atomic Experts at Scale
Pushes MoE granularity to the extreme while fixing the systems bottlenecks. Strong system–algorithm co-design → read the paper
HySparse: Hybrid Sparse Attention with KV Cache Sharing
Uses full attention as an oracle and reuses KV cache. Very clean design, very practical → read the paper
OmniSIFT: Modality-Asymmetric Token Compression
One of the better token-compression papers for omni-modal models, with real latency wins → read the paper
FASA: Frequency-aware Sparse Attention
Discovers functional sparsity in RoPE frequencies. Elegant and surprisingly effective → read the paper
That’s all for today. Thank you for reading! Please send this newsletter to colleagues if it can help them enhance their understanding of AI and stay ahead of the curve.


