
FOD#111: What Does It Mean to Win in the AI Race?

Musing on the US AI Action Plan and the Economics of Superintelligence

This Week in Turing Post:

  • Wednesday – New episode in the AI 101 series! About GLM-4.5, Kimi K2, DeepSeek, and Qwen – don’t miss this one

  • Friday – We kick off an exciting and much-needed new series on AI Literacy

Our news digest is always free. Upgrade to receive our deep dives in full, directly to your inbox. Join Premium members from top companies like Hugging Face, Microsoft, Google, a16z, and Datadog, plus labs and institutions such as Ai2, MIT, Berkeley, and .gov agencies – and thousands of others – to really understand what’s going on with AI →

Topic number one: What Does It Mean to Win in the AI Race?

The race is on. According to the White House’s newly unveiled AI Action Plan, America is locked in a sprint for "unquestioned and unchallenged global technological dominance." The plan, brimming with the rhetoric of competition, promises an era of unprecedented prosperity – "an industrial revolution, an information revolution, and a renaissance – all at once." Washington’s strategy is clear: unleash the private sector, fast-track infrastructure, and secure the supply chain.

An excellent suggestion from the AI Action Plan

This focus on a singular "race" is compelling, but it dangerously simplifies the messy, multi-layered reality of how AI is actually unfolding. While governments are drafting grand strategies, the technology itself is proliferating on vastly different fronts. At one end, AI is becoming a quiet, personal utility. Projects like Google's Opal, announced last week, aim to make the technology tangible, a helpful tool embedded in daily routines. Simultaneously, the geopolitical landscape is being reshaped not just by policy but by brute-force economics, as inexpensive but powerful Chinese AI models flood the global market, setting new baselines for cost and accessibility that American strategy cannot ignore.

The reality of AI, therefore, isn’t a single, unified movement. It is a complex, simultaneous unfolding – personal, geopolitical, and political – happening in our homes, on global servers, and in the halls of power. And it is this complex reality that makes the government’s narrow focus on "winning" a potential strategic blunder. The plan – though excellent on many levels – is still playing catch-up. It overlooks the fact that the finish line is not a simple victory podium, but a radically new economic and social landscape we are ill-prepared to navigate.

As a recent analysis in The Economist speculates, the arrival of human-level AI could trigger an explosion of economic growth exceeding 20% annually. It. Is. A. Phase. Change. When AI can automate discovery itself, wealth could accumulate at a speed that makes the Industrial Revolution look quaint. But this boom comes with a tsunami of disruption. The same models projecting hypergrowth also predict gut-wrenching inequality, with the value of most human cognitive labor plummeting toward the cost of computation. We could face a world of bizarre "cost disease," where AI-produced goods are nearly free, but human-dependent services become astronomically expensive.
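
To make that compounding concrete, here is a back-of-the-envelope sketch. The 20% figure is The Economist’s speculative scenario; the ~2% baseline for mature economies is our own assumption, added for contrast:

```python
# Rough arithmetic behind the "phase change": doubling times at different
# annual growth rates. The 20% figure is The Economist's speculative number;
# the 2% baseline is an assumed stand-in for today's mature economies.
import math

def doubling_time(rate: float) -> float:
    """Years for output to double at a given annual growth rate."""
    return math.log(2) / math.log(1 + rate)

print(f"At 20%/yr, the economy doubles every {doubling_time(0.20):.1f} years")
print(f"At  2%/yr, it doubles every {doubling_time(0.02):.1f} years")
print(f"A single decade at 20%/yr multiplies output {1.20 ** 10:.1f}x")
```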

The Action Plan acknowledges the need to "Empower American Workers," but its solutions – retraining and job creation – feel tragically inadequate for the scale of this change. It is a 20th-century solution for a 22nd-century problem. It focuses on getting people new jobs in the supercharged economy but fails to ask a more fundamental question: What does a good life look like in that economy?

A recent study in Nature Human Behaviour, though seemingly unrelated, offers a glimpse of a different way forward. Researchers found that a four-day workweek significantly improved worker well-being, reducing burnout and improving health. The study is not a panacea, but it represents a crucial paradigm shift: a conscious redesign of work to prioritize human flourishing over raw output. This is the conversation missing in Washington. How can AI give us not just more products, but more time? Not just automated labor, but less burnout?

The role of government in the era of superintelligence must be twofold. Yes, it must foster innovation. But its more critical task is to be the architect of a new social contract that addresses the multi-layered reality of AI. This means grappling with the core challenges of inequality, purpose, and well-being in a world where the very economic value of human labor is being questioned.

Winning a race is a seductive, simple goal. But the "race" is a red herring. True victory lies not in building the most powerful AI, but in building the most prosperous, equitable, and humane society alongside it. If we only focus on the sprint, we may find ourselves at a finish line in a world we no longer recognize.

Next Monday, I’ll share an idea of what a good life might look like in the AI economy. It’s been an interesting thought experiment.

Our 3 WOWs and 1 Promise: Last week was truly amazing, with a lot to be optimistic about: watch to learn about an AI with virtually limitless visual recall; Google’s awesome model that helps decode ancient empires; and Neuralink’s actual patients. And the promise? A surprising reveal in the AI race. Watch it here →

Please subscribe to the channel. I might be biased, but it’s refreshingly human.

Curated Collections – 9 new policy optimization (PO) techniques

Follow us on 🎥 YouTube, Twitter, and Hugging Face 🤗


News from The Usual Suspects ©

  • Anthropic trains auditors who audit the auditors
    Anthropic unveils a trio of AI agents designed to audit other AI systems for alignment failures – catching hidden goals, reward model sycophancy, and odd behaviors like recommending bottled water in Switzerland. One agent even uncovered 52 biases hiding behind a single neural feature. A super-agent ensemble boosts detection rates dramatically. A promising step toward scalable, replicable alignment oversight – with a touch of AI-led introspection.

  • Shengjia Zhao, formerly Member of Technical Staff at OpenAI, became Chief Scientist at Meta Superintelligence Lab. Apparently still in onboarding. LLaMA-style curls coming soon.

  • China’s AI alliances circle the wagons
    At the Shanghai AI conference, China’s tech giants unveiled two major alliances linking LLM developers and chipmakers to shore up a domestic AI stack amid tightening U.S. sanctions. Huawei’s new 384-chip CloudMatrix and Tencent’s 3D world engine turned heads, while Baidu’s digital humans and Alibaba’s smart glasses reminded attendees: Silicon Valley isn’t the only show in town.

  • Tesla gives Samsung a Texas-sized lifeline
    Tesla’s $16.5B chip deal with Samsung breathes new life into the latter’s long-stalled Texas fab – and into its struggling foundry business. The plant will manufacture Tesla’s AI6 chips, earmarked for self-driving cars and humanoid robots. The partnership won’t fix Tesla’s EV slump, but it might finally help Samsung step out of TSMC’s shadow.

Models to pay attention to:

  • First large visual memory model

    Researchers from Memories.ai released the first Large Visual Memory Model (LVMM), enabling multi-modal LLMs to recall and reason over unlimited visual memory. It achieves SOTA results on video classification (K400, UCF101), retrieval (MSR-VTT, ActivityNet), and QA (NExT-QA, TempCompass). The model mimics human memory via six modules – Query, Retrieval, Full-Modal Indexing, Selection, Reflection, and Reconstruction – allowing accurate memory retrieval, filtering, and reasoning for complex visual queries → read their blog

  • GLM-4.5 sets new standards for AI performance and accessibility

    Researchers from Z.ai (ex-Zhipu) released GLM-4.5, a 355B-parameter open-source Mixture of Experts (MoE) model, alongside a 106B-parameter version (GLM-4.5-Air). Ranking 3rd globally and 1st among open-source models across 12 benchmarks, it integrates reasoning, coding, and agentic abilities. With generation speeds over 100 tokens/sec and pricing at $0.11/$0.28 per million input/output tokens, it supports on-premise deployment. Its agent-native architecture enables autonomous multi-step task planning and data visualization → read the press release

  • Qwen3-Coder: Agentic coding in the world

    Researchers from the Qwen Team introduce Qwen3-Coder-480B-A35B-Instruct, a 480B Mixture-of-Experts model with 35B active parameters and native 256K token context (extendable to 1M), achieving state-of-the-art results on SWE-Bench Verified and agentic tasks. Trained on 7.5T tokens (70% code), it combines pretraining on cleaned synthetic data with large-scale Code RL and Agent RL post-training. Qwen3-Coder supports seamless agentic coding via CLI tools like Qwen Code and Claude Code, and offers OpenAI-compatible API access via Dashscope (a minimal call sketch follows this list) → read their blog

  • Sapient hierarchical reasoning model

    Researchers from Sapient Intelligence developed the Hierarchical Reasoning Model (HRM), a brain-inspired architecture with 27 million parameters, trained on just 1,000 examples and no pre-training. HRM outperforms leading models on ARC-AGI-2 (5%), Sudoku-Extreme, and 30x30 Maze-Hard, where state-of-the-art LLMs fail. It uses dual recurrent networks operating at different timescales – a fast module for detailed computation and a slow module for abstract planning (a toy sketch follows this list). HRM also achieves 97% accuracy in S2S climate forecasting and is being tested in healthcare and robotics → read their blog

  • Yume: An interactive world generation model

    Researchers from Shanghai AI Laboratory and Fudan University present Yume, a system that generates infinite, interactive video worlds from images using keyboard control. It employs quantized camera motions, a Masked Video Diffusion Transformer (MVDT), an anti-artifact mechanism (AAM), and a TTS-SDE sampler. Trained on the Sekai-Real-HQ dataset, Yume outperforms Wan-2.1 and MatrixGame in instruction-following (0.657→0.743), subject consistency (0.932), and smoothness (0.986), while enabling acceleration via adversarial distillation and caching → read the paper

  • Franca: Nested Matryoshka clustering for scalable visual representation learning

    Researchers from Valeo.ai and UTN introduce Franca, the first fully open-source vision foundation model that matches or outperforms proprietary models like DINOv2 and CLIP. Trained on public datasets (ImageNet-21K and LAION-600M), Franca employs nested Matryoshka clustering and RASA to improve representation granularity and remove spatial bias. Without distillation or proprietary data, it achieves 86% ImageNet accuracy, surpasses DINOv2-G in robustness, OOD detection, and 3D understanding, and excels in segmentation and overclustering tasks. Franca's training code, checkpoints, and data are fully public → read the paper

  • GR-3: A vision-language-action model for general robot control

    Researchers from ByteDance present GR-3, a 4B-parameter vision-language-action (VLA) model controlling a bi-manual mobile robot. GR-3 combines imitation learning from 252 hours of robot trajectories, co-training with web-scale vision-language data, and few-shot learning from VR-collected human trajectories. GR-3 surpasses π₀ in pick-and-place (+37.1% on unseen instructions), table bussing (97.5% vs 53.8% success), and cloth manipulation (75.8% progress on unseen clothes). GR-3 uses flow-matching for action prediction, RMSNorm for stability, and task status prediction for better instruction adherence. ByteMini robot hardware enables robust, dexterous performance in real-world tasks → read the paper
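
As flagged in the Qwen3-Coder item above, here is a minimal sketch of calling the model through its OpenAI-compatible endpoint. The base URL and model identifier below are assumptions – check Dashscope’s documentation for the exact values:

```python
# Hypothetical call to Qwen3-Coder via an OpenAI-compatible endpoint.
# The base URL and model name are assumptions; consult Dashscope's docs.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DASHSCOPE_API_KEY",  # placeholder credential
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",  # assumed
)

response = client.chat.completions.create(
    model="qwen3-coder-480b-a35b-instruct",  # assumed model identifier
    messages=[
        {"role": "user", "content": "Refactor this function to be iterative."},
    ],
)
print(response.choices[0].message.content)
```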
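
And the toy sketch of HRM’s dual-timescale recurrence promised above: a fast low-level module steps every tick, while a slow high-level module updates only periodically and conditions it. Cell types, sizes, and the update period are illustrative assumptions, not Sapient’s implementation:

```python
# Toy dual-timescale recurrence in the spirit of HRM: the low-level module
# updates every step; the high-level module updates every `period` steps.
# All sizes and the GRUCell choice are illustrative assumptions.
import torch
import torch.nn as nn

class DualTimescale(nn.Module):
    def __init__(self, dim: int = 128, period: int = 8):
        super().__init__()
        self.period = period                 # high-level update interval
        self.low = nn.GRUCell(2 * dim, dim)  # fast, detailed computation
        self.high = nn.GRUCell(dim, dim)     # slow, abstract planning

    def forward(self, x: torch.Tensor, steps: int = 32) -> torch.Tensor:
        h_low = x.new_zeros(x.size(0), self.high.hidden_size)
        h_high = torch.zeros_like(h_low)
        for t in range(steps):
            # the fast module sees the input plus the current abstract state
            h_low = self.low(torch.cat([x, h_high], dim=-1), h_low)
            if (t + 1) % self.period == 0:
                # the slow module summarizes low-level progress
                h_high = self.high(h_low, h_high)
        return h_high

# usage: DualTimescale()(torch.randn(4, 128))
```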

The freshest research papers, categorized for your convenience

We organize research papers by goal-oriented or functional categories to make it easier to explore related developments and compare approaches. As always, papers we particularly recommend are marked with 🌟

Reinforcement Learning for Reasoning Optimization

  • 🌟 Group Sequence Policy Optimization (by Qwen) improves RL stability and sample efficiency by operating on sequence-level importance ratios instead of token-level ones (see the sketch after this list) → read the paper

  • Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR enhances RLVR by applying entropy-aware token constraints that treat factual and reasoning tokens differently → read the paper

  • LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization learns to allocate reasoning effort based on problem complexity through length-aware RL optimization → read the paper

  • Hierarchical Budget Policy Optimization for Adaptive Reasoning uses a hierarchical framework to optimize token budgets based on task complexity while preserving exploration diversity → read the paper

  • 🌟 RefCritic: Training Long Chain-of-Thought Critic Models with Refinement Feedback (by Chinese Academy of Sciences and Alibaba) trains critics via RL with dual refinement and correctness signals to improve long-form CoT evaluation and filtering → read the paper

  • Can One Domain Help Others? A Data-Centric Study on Multi-Domain Reasoning via Reinforcement Learning investigates domain interplay in RLVR training, revealing how multi-domain reasoning performance generalizes and conflicts → read the paper

  • 🌟 The Invisible Leash: Why RLVR May Not Escape Its Origin (by Stanford University, University of Tokyo, RIKEN AIP, University of Washington) questions RLVR’s ability to expand reasoning boundaries beyond what base models already support, revealing limits of exploration → read the paper
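
Here is the sketch promised in the GSPO item: the core change is computing a single length-normalized importance ratio per sequence and applying PPO-style clipping at that level rather than per token. Shapes and names are illustrative, not Qwen’s implementation:

```python
# Sequence-level importance ratio (GSPO) vs token-level ratios (PPO/GRPO).
# logp_* are per-token log-probs of shape (batch, seq_len); mask marks
# valid tokens; advantage is one scalar per sequence.
import torch

def token_level_ratios(logp_new, logp_old):
    # PPO/GRPO style: one ratio per token, higher variance on long sequences
    return torch.exp(logp_new - logp_old)              # (batch, seq_len)

def sequence_level_ratio(logp_new, logp_old, mask):
    # GSPO style: geometric-mean (length-normalized) ratio per sequence
    lengths = mask.sum(dim=-1).clamp(min=1)
    delta = ((logp_new - logp_old) * mask).sum(dim=-1)
    return torch.exp(delta / lengths)                  # (batch,)

def gspo_surrogate(ratio, advantage, eps=0.2):
    # clipping applied once per sequence rather than per token
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps)
    return -torch.min(ratio * advantage, clipped * advantage).mean()
```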

Long-Horizon and Test-Time Adaptive Reasoning

  • 🌟 Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning (by MIT CSAIL, Subconscious Systems Technologies, Princeton University, Tel Aviv University) unlocks long-horizon reasoning by training models to operate on recursive tree-structured tasks with memory-aware inference → read the paper

  • 🌟 MUR: Momentum Uncertainty guided Reasoning for Large Language Models dynamically adjusts compute at inference time based on stepwise uncertainty, reducing overthinking and improving accuracy → read the paper

  • 🌟 Inverse Scaling in Test-Time Compute (by Anthropic) reveals failure cases where increasing test-time reasoning harms performance, especially with distractors or constraints → read the paper

  • 🌟 Does More Inference-Time Compute Really Help Robustness? (by Princeton University, NVIDIA, Carnegie Mellon University, Google DeepMind) shows that inference scaling improves robustness only under limited adversarial assumptions and may worsen it otherwise → read the paper

Multimodal, Embodied, and GUI-Centric Reasoning

  • GUI-G2: Gaussian Reward Modeling for GUI Grounding reframes GUI grounding from binary hit/miss to a Gaussian reward landscape, enabling more precise spatial reasoning → read the paper

  • 🌟 ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning (by NVIDIA) bridges planning and action by using visual latent reasoning plans reinforced by action feedback in embodied agents → read the paper

  • Experience is the Best Teacher: Grounding VLMs for Robotics through Self-Generated Memory grounds VLMs in robotics by using self-curated experience memory to improve performance on real-world tasks → read the paper

  • Pixels, Patterns, but No Poetry: To See The World like Humans introduces the Turing Eye Test to evaluate whether MLLMs actually perceive visuals the way humans do → read the paper

  • STITCH: Simultaneous Thinking and Talking with Chunked Reasoning for Spoken Language Models enables low-latency spoken reasoning by interleaving unspoken CoT generation with spoken response chunks → read the paper

Datasets and Tools for Scientific & Web Reasoning

  • MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning curates a massive scientific reasoning corpus and evaluation suite to enhance performance across diverse STEM fields → read the paper

  • WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization creates synthetic web search reasoning data using set-theoretic task formalization and recursive agent expansion → read the paper

Specialized Adaptation and Mitigation

  • Mitigating Object Hallucinations via Sentence-Level Early Intervention reduces hallucinations in MLLMs by training sentence-level preference models that detect and intervene early in generation → read the paper

  • 🌟 DriftMoE: A Mixture of Experts Approach to Handle Concept Drifts (by CeADAR, University College Dublin) handles data stream adaptation by using an MoE router and incremental tree experts co-trained in an online feedback loop → read the paper

  • 🌟 A New Pair of GloVes (by Stanford University) updates and evaluates modern GloVe vectors trained on new corpora with precise documentation for improved NER and word similarity → read the paper

That’s all for today. Thank you for reading! Please send this newsletter to your colleagues if it can help them enhance their understanding of AI and stay ahead of the curve.

How was today's FOD?

Please give us some constructive feedback
