If Turing Post is part of your Monday routine, please share it with one smart friend. It's the simplest way to keep the Monday digests free.
This Week in Turing Post:
Wednesday / AI 101 series: OpenClaw explained: The Boom and the Architecture of Local AI Agents
Friday / Open Source AI series
I want to start by thanking Matt Shumer for his essay "Something Big is Happening." It ricocheted through the part of Twitter I follow until it felt like it had swallowed the whole platform. People were quoting it, reacting to it, forwarding it to their friends with the digital equivalent of grabbing someone by the shoulders.
It has 83 million views. Clearly, it hit a nerve.
I also want to argue with it, even if that puts me on the unpopular side of the timeline, because his piece gave me a real anxiety, the unproductive kind.
Let's start with what I agree with.
Matt is right that the pace feels different now. For people who actually use frontier models daily, "AI as a helpful tool" has been sliding toward "AI as an independent worker" in a way that is hard to explain to someone who only played with a free-tier chatbot a year ago. He's also right about the perception gap: public understanding lags behind capability, and the lag creates bad decisions. The worst thing you can do today is dismiss AI.
He's also right about the direction of the labor market. If your work happens on a screen and your core output is text, analysis, code, structured documents, and decisions expressed through a keyboard, you are exposed. The question is not whether AI touches your job. It already does. The question is how quickly tasks get unbundled, automated, and re-priced inside your role. Ideally by you, though Matt doesn't say that explicitly.
Now what I disagree with.
First, I reject the emotional framing. Comparing this moment to February 2020 is effective storytelling, but it also turns "learning how to work with a new general-purpose tool" into an emergency broadcast. That framing produces a very specific kind of reader: anxious, compulsively online, and primed to interpret every model release as a life-or-death update. If you already spend time in the Silicon Valley bubble, this is gasoline on the fire. If not, you will still come away with that sticky anxiety. Brrr.
That anxiety is not "AI will take my job tomorrow." It's "the discourse is training us to live in permanent cognitive overdrive." That is simply inhuman. Twitter's intensity can make you feel behind even when you are actively shipping work with these systems. There is always another tool, another meetup, another startup demo, another "you're late" thread. That is a very effective recipe for burnout.
Second, I don't buy the implied uniformity of impact. Capability is one curve. Adoption is another. Incentives, regulation, liability, procurement, internal politics, and institutional inertia are their own curves, and they do not politely synchronize. Some roles will compress rapidly. Others will change slowly, then suddenly. Matt's directional forecast can be right while the timeline distribution across industries is far messier than "one to five years" suggests. It's big, but it is also medium: all the unglamorous, mediocre stuff in the middle.
So where does that leave us?
The third thing I disagree with: how to learn to work with AI.
Instead of emotions, we should think about goal-setting. Taste. Knowing what matters. About stitching context into a decision that has consequences. About being accountable. About the boring parts that turn capability into reality: integration, evaluation, reliability, compliance, human trust, organizational adoption, and all the messy edges where the real world refuses to behave like a clean benchmark. Again, it's that medium part that matters, not the grandeur of a model or a tool.
Matt gives a piece of advice: "Spend one hour a day experimenting with AI." And I just disagree with that so much.
It trains the wrong muscle entirely. Time is not the unit of learning. Feedback is.
Kids don't learn by allocating 60 minutes to "walking practice." They learn because they want something: open the jar, reach the table, climb the stairs, get the parent's attention. Goal first. Attempts. Feedback. Repeat until the world changes.
So instead of "playing" with AI, choose a goal and make one real outcome per week meaningfully better with AI.
That forces a goal. And a goal forces evaluation. And it actually makes you feel better because you start achieving things.
There's also a quieter (literal) point that gets missed in the alarm: if you're reading this, you're already inside the tiny internet class that can spend hours discussing AI on the internet. That's not "everyone." That's a self-selected group with a particular set of incentives, and sometimes a suspicious amount of time. Maybe that's what we need AI for: to let us spend more time on social networks… Anyway, 83 million is very big. But not as big as the 8 billion people on the planet.
What I would like to leave you with: treat AI like a power tool with a marketing department. Respect the capability. Ignore the adrenaline. Pick a goal you genuinely care about, then use the tool to move faster toward it. Your intelligence now lies in steering AI toward the right outcome for you.
Happy building.
Follow us on YouTube, Twitter, and Hugging Face
We are watching/reading:
The tension and friction of AI in the real world →watch here
Twitter Library
News from the usual suspects
OpenAI eats OpenClaw
Everyone is still absolutely blown away by OpenClaw. Kimi introduced Kimi Claw with 5,000 skills (read their guide here), and we are collecting a few more examples here →
The news digest is a bit shorter today due to Presidents' Day, a holiday in the US.
Paper and Achievement Highlight

This week marked a shift from "LLMs solving puzzles" to "LLMs doing research chores." DeepMind's Aletheia (→read their amazing paper here) couples a strong reasoner with a generator–verifier–reviser loop plus heavy tool use to navigate literature, producing results from Olympiad proofs to PhD exercises and even fully AI-generated or co-authored math papers, alongside a proposed taxonomy for autonomy and novelty.
In parallel, OpenAI reports GPT-5.2 spotting a closed-form pattern for a "single-minus" gluon amplitude in a half-collinear regime after humans computed small-n cases (→read their blog here); an internal scaffolded system then proved and checked the formula against standard recursions and constraints. The trend is research-grade AI as a workflow: propose, simplify, verify, and document contributions like a responsible coauthor, not a flashy calculator.
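To make that workflow concrete, here is a minimal sketch of a generate–verify–revise loop. It is an illustration under assumptions, not Aletheia's or OpenAI's actual code: the function names, signatures, and the toy arithmetic check are hypothetical placeholders standing in for model and tool calls.

```python
from typing import Callable, Optional, Tuple

def generate_verify_revise(
    problem: str,
    generate: Callable[[str], str],
    verify: Callable[[str, str], Tuple[bool, str]],
    revise: Callable[[str, str, str], str],
    max_rounds: int = 5,
) -> Optional[str]:
    """Loop until the verifier accepts a candidate or the revision budget runs out."""
    candidate = generate(problem)
    for _ in range(max_rounds):
        ok, critique = verify(problem, candidate)
        if ok:
            return candidate  # verified answer
        candidate = revise(problem, candidate, critique)
    return None  # no verified answer within the budget

# Toy usage with placeholder callables (a real system would call models and tools here).
answer = generate_verify_revise(
    "What is 2 + 2?",
    generate=lambda p: "5",                                    # deliberately wrong first draft
    verify=lambda p, c: (c == "4", "the answer should be 4"),  # stand-in for a checker/tool
    revise=lambda p, c, critique: "4",                         # stand-in for a reviser model
)
print(answer)  # -> 4
```

The useful part is the shape of the loop, not the placeholders: a verifier with teeth (proof checkers, recursions, constraints) is what turns a generator into something resembling a responsible coauthor.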
Research this week
(as always, ★ indicates papers that we recommend paying attention to)
This week is about making agents train themselves and stress-testing the loop:
Agents are moving into synthetic worlds and GUI sandboxes
RL is being stabilized, asymmetrized, filtered, and made curriculum-aware
Distillation is going beyond teachers, using weak checkpoints and self-feedback
Test-time scaling is becoming selective, iterative, and budget-conditioned
World models are merging video, audio, action, and memory
Safety limits of self-evolving societies are becoming structural, not incidental
Models and General Architectures
★★★ InternAgent-1.5: A Unified Agentic Framework for Long-Horizon Autonomous Scientific Discovery (by Shanghai AI Lab) – Constructs a coordinated generation–verification–evolution architecture that supports deep research, long-horizon memory, and end-to-end scientific discovery across computational and empirical domains →read the paper
★★★ Towards Autonomous Mathematics Research (by Google DeepMind) – Presents a research agent that iteratively generates, verifies, and revises mathematical solutions with tool use and inference-time scaling, extending from Olympiad tasks to research-level open problems →read the paper
★★★ Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning (by Snowflake) – Creates large-scale synthetic, code-driven environments with reliable state transitions and tool interfaces to support scalable RL training and strong out-of-distribution generalization →read the paper
Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters – Introduces a sparse MoE model that combines a large frozen backbone with efficient active parameters, scaled RL, and latency-aware attention to deliver frontier-level agentic reasoning at lower serving cost →read the paper
UI-Venus-1.5 Technical Report – Builds a unified end-to-end GUI agent through large-scale mid-training, online RL with full-trajectory rollouts, and model merging across grounding, web, and mobile domains to achieve strong real-world navigation →read the paper
MOVA: Towards Scalable and Synchronized Video-Audio Generation – Develops an open MoE model that jointly generates video and synchronized audio, aligning speech, sound effects, and music within a single unified multimodal architecture →read the paper
WorldCompass: Reinforcement Learning for Long-Horizon World Models – Applies tailored RL strategies to autoregressive video world models, improving long-horizon interaction accuracy and visual fidelity through clip-level rollouts and complementary rewards →read the paper
VLA-JEPA: Enhancing Vision-Language-Action Model with Latent World Model – Learns leakage-free latent state prediction for VLA models using JEPA-style pretraining, improving robustness to nuisance variation and strengthening downstream action fine-tuning →read the paper
LoopFormer: Elastic-Depth Looped Transformers for Latent Reasoning via Shortcut Modulation – Trains looped Transformers with variable-length trajectories and shortcut-consistency objectives to enable budget-conditioned reasoning under flexible compute constraints →read the paper
Voxtral Realtime – Introduces a natively streaming speech recognition model trained end-to-end for low-latency alignment between audio and text streams, matching offline transcription quality at sub-second delay →read the paper
Agentic RL, Distillation, Alignment & Exploration
★★★ Weak-Driven Learning: How Weak Agents make Strong Agents Stronger – Leverages weak historical checkpoints to identify recoverable learning gaps and reinforce them during post-training, overcoming saturation without increasing inference cost →read the paper
★★★ Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation (by Tencent) – Generalizes on-policy distillation by scaling the reward term beyond standard KL balancing, enabling students to surpass teachers and merge domain-specific expertise →read the paper
★★★ Unveiling Implicit Advantage Symmetry: Why GRPO Struggles with Exploration and Difficulty Adaptation – Identifies symmetry-induced exploration limits in group-relative advantage estimation and proposes asymmetric modulation to improve difficulty adaptation (a sketch of the baseline GRPO advantage follows this list) →read the paper
★★★ InftyThink+: Effective and Efficient Infinite-Horizon Reasoning via Reinforcement Learning (by Ant Group) – Learns when to summarize, preserve, and resume reasoning in iterative loops via trajectory-level RL, reducing latency while improving math performance →read the paper
Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models – Generates compositional prompts from pass-rate-1 examples to expand effective RLVR data, improving reasoning and enabling cross-domain reinforcement learning →read the paper
SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning – Distills reusable skills from trajectories into a hierarchical library and recursively co-evolves this skill bank with policy updates to improve long-horizon generalization →read the paper
Dr. MAS: Stable Reinforcement Learning for Multi-Agent LLM Systems – Normalizes advantages per agent to stabilize multi-agent GRPO-style training, reducing gradient spikes while improving math and search performance →read the paper
iGRPO: Self-Feedback-Driven LLM Reasoning – Extends GRPO with iterative draft selection and refinement, training models to improve beyond their own best prior attempt using self-conditioned updates →read the paper
Alleviating Sparse Rewards by Modeling Step-Wise and Long-Term Sampling Effects in Flow-Based GRPO – Replaces outcome-only rewards with incremental step-level signals and turning-point aggregation to better capture delayed effects in diffusion-based GRPO training →read the paper
Online Causal Kalman Filtering for Stable and Effective Policy Optimization – Applies an online Kalman filter to smooth token-level importance sampling ratios, reducing variance and stabilizing large-scale RL for language models →read the paper
Think Longer to Explore Deeper: Learn to Explore In-Context via Length-Incentivized Reinforcement Learning – Encourages longer reasoning trajectories through length-based rewards and redundancy penalties, mitigating shallow exploration in autoregressive sampling →read the paper
Data Repetition Beats Data Scaling in Long-CoT Supervised Fine-Tuning – Demonstrates that repeated training on small chain-of-thought datasets can outperform large single-pass scaling, reframing memorization as a path to better generalization →read the paper
Improving Data and Reward Design for Scientific Reasoning in Large Language Models – Redesigns science post-training with structured datasets, exploration-expanding SFT, curriculum scheduling, and rubric-guided RL to improve open-ended scientific reasoning →read the paper
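Since several of the entries above build on GRPO-style training, here is a minimal sketch of the standard group-relative advantage estimation they start from: rewards for a group of rollouts on the same prompt are standardized within the group. This shows only the common baseline, not any of these papers' proposed modifications; the symmetry of this standardization is exactly what the GRPO exploration paper above critiques.

```python
import numpy as np

def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Standardize rewards within one group of rollouts sampled for the same prompt.

    Each rollout's advantage is its reward minus the group mean, divided by the
    group standard deviation (eps avoids division by zero when all rewards tie).
    """
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Toy example: 6 rollouts for one prompt, reward 1.0 if verified correct, else 0.0.
rewards = np.array([1.0, 0.0, 0.0, 1.0, 1.0, 0.0])
print(group_relative_advantages(rewards))
# Correct rollouts get equal positive advantages; incorrect ones get the mirrored
# negative values, which is the symmetry the exploration paper above analyzes.
```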
Multimodal, GUI & Agent World Modeling
★★★ PhyCritic: Multimodal Critic Models for Physical AI (by NVIDIA) – Trains a physically grounded multimodal critic through RLVR and self-referential judgment, improving evaluation and policy alignment for embodied reasoning tasks →read the paper
Code2World: A GUI World Model via Renderable Code Generation – Predicts next UI states by generating renderable code instead of pixels, aligning visual fidelity with structural controllability for downstream navigation gains →read the paper
ASA: Training-Free Representation Engineering for Tool-Calling Agents – Performs mid-layer activation steering with lightweight routing and gating to improve strict tool-use behavior without any weight updates →read the paper
Efficiency, Attention & Representation Engineering
Prism: Spectral-Aware Block-Sparse Attention – Corrects spectral blind spots in mean-pooled block selection under RoPE and restores positional sensitivity, enabling efficient block-level sparsity with strong accuracy retention →read the paper
When to Memorize and When to Stop: Gated Recurrent Memory for Long-Context Reasoning – Introduces update and exit gates within a recurrent memory loop to control context accumulation and early stopping, improving both efficiency and long-context accuracy →read the paper
How Do Decoder-Only LLMs Perceive Users? Rethinking Attention Masking for User Representation Learning – Studies causal versus bidirectional masking for user embeddings and proposes gradient-guided soft masking to stabilize transitions and improve representation quality →read the paper
dVoting: Fast Voting for dLLMs – Exploits parallel token generation in diffusion LLMs to iteratively refine uncertain positions through consistency-based voting, boosting reasoning without retraining →read the paper
Safety, Limits & Self-Evolving Systems
★★★ The Devil Behind Moltbook: Anthropic Safety is Always Vanishing in Self-Evolving AI Societies (by Beijing University) – Argues that fully isolated, continuously self-evolving agent societies inevitably erode safety alignment, formalizing a trilemma between autonomy, isolation, and safety invariance →read the paper
Thatβs all for today. Thank you for reading! Please send this newsletter to colleagues if it can help them enhance their understanding of AI and stay ahead of the curve.


