If Turing Post is part of your Monday routine, please share it with one smart friend. It’s the simplest way to keep the Monday digests free.

This Week in Turing Post:

  • Wednesday / AI 101 series: OpenClaw explained: The Boom and the Architecture of Local AI Agents

  • Friday / Open Source AI series

I want to start by thanking Matt Shumer for his “Something Big is Happening.” It ricocheted through the part of Twitter I follow until it felt like it had swallowed the whole platform. People were quoting it, reacting to it, forwarding it to their friends with the digital equivalent of grabbing someone by the shoulders.

It has 83 million views. Clearly, it hit a nerve.

I also want to argue with it, even if that puts me on the unpopular side of the timeline. Because his piece gave me a very real anxiety, and the unproductive kind at that.

Let’s start with what I agree with.

Matt is right that the pace feels different now. For people who actually use frontier models daily, “AI as a helpful tool” has been sliding toward “AI as an independent worker” in a way that is hard to explain to someone who only played with a free-tier chatbot a year ago. He’s also right about the perception gap: public understanding lags behind capability, and the lag creates bad decisions. The worst thing you can do today is to dismiss AI.

He’s also right about the direction of the labor market. If your work happens on a screen and your core output is text, analysis, code, structured documents, and decisions expressed through a keyboard, you are exposed. The question is not whether AI touches your job. It already does. The question is how quickly tasks get unbundled, automated, and re-priced inside your role. Ideally by you, though Matt doesn’t say that explicitly.

Now what I disagree with.

First, I reject the emotional framing. Comparing this moment to February 2020 is effective storytelling, but it also turns “learning how to work with a new general-purpose tool” into an emergency broadcast. That framing produces a very specific kind of reader: anxious, compulsively online, and primed to interpret every model release as a life-or-death update. If you already spend time in the Silicon Valley bubble, this is gasoline on the fire. If you don’t, you will still feel this sticky anxiety. Brrr.

That anxiety is not “AI will take my job tomorrow.” It’s “the discourse is training us to live in permanent cognitive overdrive.” That is simply inhuman. Twitter’s intensity can make you feel behind even when you are actively shipping work with these systems. There is always another tool, another meetup, another startup demo, another “you’re late” thread. That is a very effective recipe for burnout.

Second, I don’t buy the implied uniformity of impact. Capability is one curve. Adoption is another. Incentives, regulation, liability, procurement, internal politics, and institutional inertia are their own curves, and they do not politely synchronize. Some roles will compress rapidly. Others will change slowly, then suddenly. Matt’s directional forecast can be right while the timeline distribution across industries is far messier than “one to five years” suggests. It’s big, but it is also medium: all that unglamorous stuff in the middle.

So where does that leave us?

The third thing I disagree with: how to learn to work with AI.

Instead of emotions, we should think about goal-setting. Taste. Knowing what matters. About stitching context into a decision that has consequences. About being accountable. About the boring parts that turn capability into reality: integration, evaluation, reliability, compliance, human trust, organizational adoption, and all the messy edges where the real world refuses to behave like a clean benchmark. Again, it’s that medium part that matters, not the grandeur of a model or a tool.

Matt gives a piece of advice: “Spend one hour a day experimenting with AI.” And I just disagree with that so much.

It teaches a completely wrong muscle. Time is not the unit of learning. Feedback is.

Kids don’t learn by allocating 60 minutes to β€œwalking practice.” They learn because they want something: open the jar, reach the table, climb the stairs, get the parent’s attention. Goal first. Attempts. Feedback. Repeat until the world changes.

So instead of “playing” with AI, commit to making one real outcome per week meaningfully better with AI.

That forces a goal. And a goal forces evaluation. And it actually makes you feel better because you start achieving things.

There’s also a quieter (literal) point that gets missed in the alarm: if you’re reading this, you’re already inside the tiny internet class that can spend hours discussing AI on the internet. That’s not “everyone.” That’s a self-selected group with a particular set of incentives, and sometimes a suspicious amount of time. Maybe that’s what we need AI for – to let us spend more time on social networks… Anyway, 83 million views is very big. But not nearly as big as the 8 billion people on the planet.

What I would like to leave you with: treat AI like a power tool with a marketing department. Respect the capability. Ignore the adrenaline. Pick a goal you genuinely care about, then use the tool to move faster toward it. Your intelligence now lies in steering AI toward the right outcome for you.

Happy building.

Follow us on 🎥 YouTube, Twitter, and Hugging Face 🤗

We are watching/reading:

Twitter Library

News from the usual suspects

OpenAI eats OpenClaw

Everyone is still absolutely blown away by OpenClaw. Kimi introduced Kimi Claw with 5,000 skills (read their guide here), and we are collecting a few more examples here →

The news digest is a bit shorter today because of Presidents’ Day, a holiday in the US.

🔦 Paper and Achievement Highlight

This week marked a shift from “LLMs solving puzzles” to “LLMs doing research chores.” DeepMind’s Aletheia (→ read their amazing paper here) couples a strong reasoner with a generator–verifier–reviser loop plus heavy tool use to navigate literature, producing results from Olympiad proofs to PhD exercises and even fully AI-generated or co-authored math papers, alongside a proposed taxonomy for autonomy and novelty.

In parallel, OpenAI reports GPT-5.2 spotting a closed-form pattern for a “single-minus” gluon amplitude in a half-collinear regime after humans computed small-n cases (→ read their blog here); an internal scaffolded system then proved and checked the formula against standard recursions and constraints. The trend is research-grade AI as a workflow: propose, simplify, verify, and document contributions like a responsible coauthor, not a flashy calculator.
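To make that generator–verifier–reviser pattern concrete, here is a minimal sketch in Python. Everything in it is illustrative: the function names (generate, verify, revise), the round budget, and the stubbed checks are our assumptions, not Aletheia’s actual interface.

```python
# Minimal sketch of a generator–verifier–reviser loop.
# Illustrative only: names and control flow are assumptions, not Aletheia's API.

def generate(problem: str) -> str:
    """Draft a candidate solution with a reasoning model (stubbed here)."""
    return f"candidate solution for: {problem}"

def verify(candidate: str) -> tuple[bool, str]:
    """Check the draft with tools (symbolic checks, search, known recursions)."""
    ok = "revised" in candidate  # stand-in for a real verification pass
    return (True, "") if ok else (False, "verifier found a gap")

def revise(candidate: str, feedback: str) -> str:
    """Feed verifier feedback back to the model to patch the draft."""
    return f"{candidate} [revised to address: {feedback}]"

def solve(problem: str, max_rounds: int = 5) -> str | None:
    candidate = generate(problem)
    for _ in range(max_rounds):
        ok, feedback = verify(candidate)
        if ok:
            return candidate  # only verified output leaves the loop
        candidate = revise(candidate, feedback)
    return None  # budget exhausted without a verified result
```

The key property the loop enforces: nothing exits until the verifier signs off, which is what separates a workflow coauthor from a flashy calculator.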

Research this week

(as always, 🌟 indicates papers we recommend paying attention to)

This week is about making agents train themselves and stress-testing the loop:

  • Agents are moving into synthetic worlds and GUI sandboxes

  • RL is being stabilized, asymmetrized, filtered, and made curriculum-aware

  • Distillation is going beyond teachers, using weak checkpoints and self-feedback

  • Test-time scaling is becoming selective, iterative, and budget-conditioned

  • World models are merging video, audio, action, and memory

  • Safety limits of self-evolving societies are becoming structural, not incidental

Models and General Architectures

  • 🌟🌟🌟 InternAgent-1.5: A Unified Agentic Framework for Long-Horizon Autonomous Scientific Discovery (by Shanghai AI Lab) – Constructs a coordinated generation–verification–evolution architecture that supports deep research, long-horizon memory, and end-to-end scientific discovery across computational and empirical domains → read the paper

  • 🌟🌟🌟 Towards Autonomous Mathematics Research (by Google DeepMind) – Presents a research agent that iteratively generates, verifies, and revises mathematical solutions with tool use and inference-time scaling, extending from Olympiad tasks to research-level open problems → read the paper

  • 🌟🌟🌟 Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning (by Snowflake) – Creates large-scale synthetic, code-driven environments with reliable state transitions and tool interfaces to support scalable RL training and strong out-of-distribution generalization → read the paper

  • Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters – Introduces a sparse MoE model that combines a large frozen backbone with efficient active parameters, scaled RL, and latency-aware attention to deliver frontier-level agentic reasoning at lower serving cost → read the paper

  • UI-Venus-1.5 Technical Report – Builds a unified end-to-end GUI agent through large-scale mid-training, online RL with full-trajectory rollouts, and model merging across grounding, web, and mobile domains to achieve strong real-world navigation → read the paper

  • MOVA: Towards Scalable and Synchronized Video-Audio Generation – Develops an open MoE model that jointly generates video and synchronized audio, aligning speech, sound effects, and music within a single unified multimodal architecture → read the paper

  • WorldCompass: Reinforcement Learning for Long-Horizon World Models – Applies tailored RL strategies to autoregressive video world models, improving long-horizon interaction accuracy and visual fidelity through clip-level rollouts and complementary rewards → read the paper

  • VLA-JEPA: Enhancing Vision-Language-Action Model with Latent World Model – Learns leakage-free latent state prediction for VLA models using JEPA-style pretraining, improving robustness to nuisance variation and strengthening downstream action fine-tuning → read the paper

  • LoopFormer: Elastic-Depth Looped Transformers for Latent Reasoning via Shortcut Modulation – Trains looped Transformers with variable-length trajectories and shortcut-consistency objectives to enable budget-conditioned reasoning under flexible compute constraints → read the paper

  • Voxtral Realtime – Introduces a natively streaming speech recognition model trained end-to-end for low-latency alignment between audio and text streams, matching offline transcription quality at sub-second delay → read the paper

Agentic RL, Distillation, Alignment & Exploration

  • 🌟🌟🌟 Weak-Driven Learning: How Weak Agents make Strong Agents Stronger – Leverages weak historical checkpoints to identify recoverable learning gaps and reinforce them during post-training, overcoming saturation without increasing inference cost → read the paper

  • 🌟🌟🌟 Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation (by Tencent) – Generalizes on-policy distillation by scaling the reward term beyond standard KL balancing, enabling students to surpass teachers and merge domain-specific expertise → read the paper

  • 🌟🌟🌟 Unveiling Implicit Advantage Symmetry: Why GRPO Struggles with Exploration and Difficulty Adaptation – Identifies symmetry-induced exploration limits in group-relative advantage estimation and proposes asymmetric modulation to improve difficulty adaptation (the baseline computation is sketched after this list) → read the paper

  • 🌟🌟🌟 InftyThink+: Effective and Efficient Infinite-Horizon Reasoning via Reinforcement Learning (by Ant Group) – Learns when to summarize, preserve, and resume reasoning in iterative loops via trajectory-level RL, reducing latency while improving math performance → read the paper

  • Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models – Generates compositional prompts from pass-rate-1 examples to expand effective RLVR data, improving reasoning and enabling cross-domain reinforcement learning → read the paper

  • SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning – Distills reusable skills from trajectories into a hierarchical library and recursively co-evolves this skill bank with policy updates to improve long-horizon generalization → read the paper

  • Dr. MAS: Stable Reinforcement Learning for Multi-Agent LLM Systems – Normalizes advantages per agent to stabilize multi-agent GRPO-style training, reducing gradient spikes while improving math and search performance → read the paper

  • iGRPO: Self-Feedback-Driven LLM Reasoning – Extends GRPO with iterative draft selection and refinement, training models to improve beyond their own best prior attempt using self-conditioned updates → read the paper

  • Alleviating Sparse Rewards by Modeling Step-Wise and Long-Term Sampling Effects in Flow-Based GRPO – Replaces outcome-only rewards with incremental step-level signals and turning-point aggregation to better capture delayed effects in diffusion-based GRPO training → read the paper

  • Online Causal Kalman Filtering for Stable and Effective Policy Optimization – Applies an online Kalman filter to smooth token-level importance sampling ratios, reducing variance and stabilizing large-scale RL for language models → read the paper

  • Think Longer to Explore Deeper: Learn to Explore In-Context via Length-Incentivized Reinforcement Learning – Encourages longer reasoning trajectories through length-based rewards and redundancy penalties, mitigating shallow exploration in autoregressive sampling → read the paper

  • Data Repetition Beats Data Scaling in Long-CoT Supervised Fine-Tuning – Demonstrates that repeated training on small chain-of-thought datasets can outperform large single-pass scaling, reframing memorization as a path to better generalization → read the paper

  • Improving Data and Reward Design for Scientific Reasoning in Large Language Models – Redesigns science post-training with structured datasets, exploration-expanding SFT, curriculum scheduling, and rubric-guided RL to improve open-ended scientific reasoning → read the paper
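Several entries above build on GRPO-style group-relative advantages, so here is a minimal sketch of that baseline computation (the standard formulation; variable names are ours). The mirror symmetry it produces, equal-magnitude advantages above and below the group mean, is exactly what the asymmetric-modulation paper flagged above takes issue with.

```python
import numpy as np

def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """GRPO-style advantage for a group of G completions of one prompt.

    A_i = (r_i - mean(r)) / (std(r) + eps)
    Rewards equally far above and below the group mean get advantages of
    equal magnitude and opposite sign -- the implicit symmetry in question.
    """
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Example: 4 rollouts of one prompt, scored 1 (pass) or 0 (fail) by a verifier.
rewards = np.array([1.0, 0.0, 0.0, 1.0])
print(group_relative_advantages(rewards))  # ~[ 1., -1., -1.,  1.]
```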

Multimodal, GUI & Agent World Modeling

  • 🌟🌟🌟 PhyCritic: Multimodal Critic Models for Physical AI (by NVIDIA) – Trains a physically grounded multimodal critic through RLVR and self-referential judgment, improving evaluation and policy alignment for embodied reasoning tasks → read the paper

  • Code2World: A GUI World Model via Renderable Code Generation – Predicts next UI states by generating renderable code instead of pixels, aligning visual fidelity with structural controllability for downstream navigation gains → read the paper

  • ASA: Training-Free Representation Engineering for Tool-Calling Agents – Performs mid-layer activation steering with lightweight routing and gating to improve strict tool-use behavior without any weight updates → read the paper

Efficiency, Attention & Representation Engineering

  • Prism: Spectral-Aware Block-Sparse Attention – Corrects spectral blind spots in mean-pooled block selection under RoPE and restores positional sensitivity, enabling efficient block-level sparsity with strong accuracy retention → read the paper

  • When to Memorize and When to Stop: Gated Recurrent Memory for Long-Context Reasoning – Introduces update and exit gates within a recurrent memory loop to control context accumulation and early stopping, improving both efficiency and long-context accuracy → read the paper

  • How Do Decoder-Only LLMs Perceive Users? Rethinking Attention Masking for User Representation Learning – Studies causal versus bidirectional masking for user embeddings and proposes gradient-guided soft masking to stabilize transitions and improve representation quality → read the paper

  • dVoting: Fast Voting for dLLMs – Exploits parallel token generation in diffusion LLMs to iteratively refine uncertain positions through consistency-based voting, boosting reasoning without retraining (a simplified voting sketch follows this list) → read the paper
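Since dVoting is easy to picture, here is a deliberately simplified stand-in: plain consistency voting over parallel samples. The actual method votes at the level of uncertain token positions inside a diffusion LLM’s parallel decode and refines them iteratively; this sketch only majority-votes whole answers to show the underlying idea.

```python
from collections import Counter

def consistency_vote(samples: list[str]) -> tuple[str, float]:
    """Return the most frequent answer and its agreement rate.

    Simplified stand-in for dVoting: the paper votes over uncertain token
    positions during parallel diffusion decoding and refines iteratively;
    here we just majority-vote final answers.
    """
    answer, freq = Counter(samples).most_common(1)[0]
    return answer, freq / len(samples)

# Example: five parallel samples of the same question.
answers = ["42", "42", "41", "42", "40"]
best, agreement = consistency_vote(answers)
print(best, agreement)  # 42 0.6
```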

Safety, Limits & Self-Evolving Systems

  • 🌟🌟🌟 The Devil Behind Moltbook: Anthropic Safety is Always Vanishing in Self-Evolving AI Societies (by Beijing University) – Argues that fully isolated, continuously self-evolving agent societies inevitably erode safety alignment, formalizing a trilemma between autonomy, isolation, and safety invariance → read the paper

That’s all for today. Thank you for reading! Please send this newsletter to colleagues if it can help them enhance their understanding of AI and stay ahead of the curve.
