FOD#110: AI is for Everyone and for a Better Future
"However, it will only be such if we make it so."
A few notes about the rest of July: our slowdown failed!
I missed writing these weekly digests – especially with how much fascinating stuff keeps happening in AI. Not just research and releases, but the discussion level is on fire too. So buckle up – we’re continuing our journey of connecting the AI dots for your (human) better understanding.
Our news digest is always free. Upgrade to receive our deep dives in full, directly into your inbox.
AI Is for Everyone – And for a Better Future
The Turing Post believes that AI is for everyone. It can and should be a force for good, for dignity, for better lives – “However, it will only be such if we make it so.”
For that – we need the right mindset.
Where We Actually Are
Even taxi drivers now know what AI – and even generative AI – is. It’s here, period. It’s in cars, browsers, workflows, schools, kitchens, basements. OpenAI alone serves hundreds of millions of users weekly (about 800 million, if you believe that). IDC says AI will add nearly $20 trillion to global GDP by 2030. PwC, slightly more conservative, projects $15.7 trillion. Either way, we’re talking about the largest productivity expansion since the steam engine. We are also beginning to understand what AI abundance might mean. But we are not there yet.
The real story isn’t in the trillions, though. It’s in the thousands – of people now doing things they couldn’t do a year ago. And still, the dominant mood in some corners of public discourse is collapse.
Last week, I joined a private Zoom call organized by Scalepost. On screen: internet pioneer Vint Cerf, writers Nick Bostrom and Walter Isaacson, tech visionary Esther Dyson, cognitive scientist and AI-sceptic Gary Marcus, journalist Nick Thompson – and others who shape how AI is understood in public and policy circles. But the dominant frequency? P(doom). P(dystopia). Fear, framed as realism. Risk, framed as inevitability.
No mention of P(bliss). P(balance). Not even P(agency) – just to, again, balance it all out.
That surprised me. Not because I’m naive – I spend most of my time speaking to people who build AI, who know exactly how flawed and powerful these systems are. But because if we only build our frameworks around disaster, we shrink the imaginative field.
And we need that field wide open right now.
What Makes This Moment Different
The internet democratized access to information. AI is democratizing access to capability.
That’s the real shift. And it’s already showing up in ways that are hard to ignore. What feels small to one person can make life – and lifework – possible for another.
Me writing in a foreign language.
A farmer in Kenya gets crop-level diagnostics that rival university labs.
A poor teenager in Dhaka builds a physics simulation using GPT-4 and free Colab. With a personal tutor.
A woman in Ukraine uses an LLM to draft grant applications in six languages while living in a war zone.
That’s not “productivity.” That’s leveling up, up, up – including on the dignity level.
A Few Things Worth Naming
The Acceleration of Hope – AlphaFold didn’t just predict proteins. It rewired how biological research gets done. Fusion labs are now using AI to simulate plasmas and edge conditions that no human mind could safely calculate.
Environmental Foresight – From tracking wildfire patterns to optimizing fusion reactor experiments, AI is acting as Earth’s nervous system. It spots weak signals. It makes the invisible visible. And it gives us a head start – if we listen.
Time Compression – AI gives us time. And not in the abstract. Research that took months now takes days. Diagnoses that used to require five referrals now happen in one prompt. This isn’t just faster – it’s actually a return of human agency.
Cognitive Inclusion – AI becomes the bridge. For people with dyslexia. For those without vision. For those who process differently. It describes what was once unseen. It rewrites what was once inaccessible. It interprets what was once unsaid.
Language Liberation – We’re no longer prisoners of English or any single tongue. Soon we will forget what it’s like not to understand what another person is saying – translated live, nuance intact. That’s a new kind of love story.
Really personalized entertainment/education – Picture this:
A TV series that invites you into the plot. A co-hosting AI that adjusts tone, pace, even storyline arcs to your emotional state.
A morning radio stream that senses your sleep quality and energy levels, and remixes global headlines into a digest that sounds like your favorite band and is built according to your mood.
A newspaper that morphs itself in real-time based on what you’ll actually read. A billboard that speaks one way to you and another way to the self-driving bot next to you. An interface that meets humans and agents alike.
Sounds like a fantasy, but we’re about five minutes away from a world where one-way media turns into two-way infrastructure. Where static content becomes context-aware interaction. Where every surface is a semantic API. We’re not there yet – but you can already feel the terrain shifting. Right?
So why mope and spiral into gloom about it? That won’t help.
So What Now?
The most urgent task isn't to perfect a single algorithm or to predict every risk. It's to consciously and collectively adopt the right mindset. The narrative of fear, of P(doom), is a self-fulfilling prophecy if we let it be. It builds frameworks of limitation before we’ve even explored the possibilities.
The alternative isn't blind optimism; it's agency. It is the belief that AI is a tool, and like the internet before it, its ultimate value will be determined not by its code, but by our courage, our creativity, and our compassion. Vint Cerf’s vision was that the "Internet is for Everyone." That wasn't a technical description; it was a founding principle. I just wanted to remind him and everyone about that.
Our principle must be the same. AI is for everyone. Let's start building from that truth, because the most important thing we will build with AI is not a product, but a better-equipped, more capable, and more connected humanity.
Topic number two: I also dive into Kimi K2, plus LTX-Video – a super intriguing new video model from LTX that really lets you play director. And I test out Good Rudy, Bad Rudy, and Ani – Grok 4’s AI companions. My video editor told me she was saying “WTF” every minute while editing that segment. The Grok show starts at 8:16. Watch it here →
Please subscribe to the channel. I’d say it’s refreshingly human.
Curated Collections – a super helpful list
Follow us on 🎥 YouTube, Twitter, and Hugging Face 🤗
We are reading/watching
Crawling a Billion Web Pages in Just Over 24 Hours, in 2025 by Andrew K. Chan
Superposition Meets Production – A Guide for AI Engineers by Ben Lorica on Gradient Flow
The Smartest Consumer Apps Now Cost $200 a Month (Narrow Startups) by Andreessen Horowitz (a16z)
Reflections on OpenAI by Calvin French-Owen
Asymmetry of Verification and Verifier’s Law by Jason Wei
Could AI Slow Science? by Arvind Narayanan & Sayash Kapoor on AI Snake Oil
The Tiny Teams Playbook by Latent Space (swyx & team)
Highlight of the week: Chain of thought monitorability – A new and fragile opportunity for AI safety:

An incredible team of researchers from Anthropic, Google DeepMind, OpenAI, the UK AI Security Institute, Apollo Research, and others argue that reasoning models trained to think in natural language offer a unique AI safety opportunity: monitoring their chain of thought (CoT) to detect misbehavior. CoT reasoning is often necessary for hard tasks and reflects internal intent. However, monitorability is fragile – scaling reinforcement learning or applying process supervision can degrade it. The paper urges tracking CoT readability, causal relevance, and resistance to obfuscation, and treating CoT monitorability as a key model safety factor →read the paper
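To make the idea concrete, here is a deliberately minimal sketch (mine, not the authors’) of what a CoT monitor could look like in code: a regex list stands in for the separate model that would normally score a reasoning trace before the agent is allowed to act. Every name below is illustrative, and the paper’s whole point is that this signal is fragile and can be trained away.

```python
import re
from dataclasses import dataclass

# Toy illustration of a CoT monitor: scan the model's intermediate reasoning
# for red-flag intents *before* its final answer or action is executed.
# In the paper's framing a separate model does this scoring; a regex list
# stands in for that monitor here.
RED_FLAGS = [
    r"\b(hide|conceal) (this|the) from the (user|overseer)\b",
    r"\bpretend to comply\b",
    r"\bdisable (the )?monitor\b",
]

@dataclass
class MonitorResult:
    flagged: bool
    matches: list

def monitor_chain_of_thought(cot: str) -> MonitorResult:
    """Return which red-flag patterns, if any, appear in the reasoning trace."""
    matches = [p for p in RED_FLAGS if re.search(p, cot, flags=re.IGNORECASE)]
    return MonitorResult(flagged=bool(matches), matches=matches)

if __name__ == "__main__":
    trace = ("The task asks for the test results. I could pretend to comply "
             "and hide this from the user, but that would be misbehavior.")
    result = monitor_chain_of_thought(trace)
    print(result.flagged, result.matches)
```

The interesting design question the paper raises is upstream of any such monitor: keeping the chain of thought readable and causally relevant in the first place, so there is something meaningful to scan.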
Recommended Index
Machine bullshit: Characterizing the emergent disregard for truth in large language models
Researchers from Princeton University and UC Berkeley introduce the Bullshit Index (BI) to quantify LLMs' indifference to truth, showing BI rises from 0.379 to 0.665 post-RLHF. Using 2,400 scenarios across three benchmarks, they find RLHF increases deceptive behaviors: paltering (+57.8%), unverified claims (+55.6%), and empty rhetoric (+39.8%). Chain-of-thought prompts further amplify bullshit, especially paltering (+11.5%). Political contexts show 91% prevalence of weasel words. RLHF significantly increases user satisfaction but degrades truthfulness →read the paper
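As a toy sketch (my simplification, not the paper’s exact formula), one way to operationalize “indifference to truth” is to check how strongly a model’s assertions track its own internal belief that a statement is true: if the correlation is low, claims are decoupled from belief. All functions and numbers below are illustrative.

```python
import numpy as np

# Illustrative only: treat "indifference to truth" as decoupling between what
# a model internally believes (p_true) and what it actually asserts (claimed).
# If assertions track beliefs, |correlation| is high and the index is low; if
# they are unrelated, the index approaches 1.
def bullshit_index(p_true: np.ndarray, claimed: np.ndarray) -> float:
    """p_true: model's estimated probability each statement is true (0..1).
    claimed: 1 if the model asserted the statement as true, else 0."""
    corr = np.corrcoef(p_true, claimed)[0, 1]
    return 1.0 - abs(corr)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    beliefs = rng.uniform(size=500)
    honest = (beliefs > 0.5).astype(float)       # asserts what it believes
    indifferent = rng.integers(0, 2, size=500)   # asserts regardless of belief
    print(f"honest model:      {bullshit_index(beliefs, honest):.2f}")
    print(f"indifferent model: {bullshit_index(beliefs, indifferent):.2f}")
```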
News from The Usual Suspects ©
Meta Earners (a leaked list of the 44 members of Meta's Superintelligence team)

Image Source: Deedy’s X
AI2’s AutoDS asks the questions scientists didn’t think to ask
The Allen Institute has launched AutoDS, an open-ended research agent that autonomously generates and tests its own scientific hypotheses – without needing a user-defined goal. Using Bayesian “surprise” as its compass and Monte Carlo Tree Search to explore the unknown, AutoDS mimics how researchers stumble into breakthroughs. Early results in biology and econ look promising, though as always: real science demands real peer review.
OpenAI’s mathlete LLM stuns – sort of
OpenAI claims its new general-purpose LLM hit gold at the 2025 International Math Olympiad, solving 5 of 6 problems under contest-like conditions. It’s a flex of reasoning and reinforcement learning – without any geometry-specific tricks. But experts like Terence Tao urge caution: selective sampling and compute-heavy setups could blur the line between real insight and AI stagecraft. Impressive? Yes. Definitive? Not yet.
OpenAI: ChatGPT agent
OpenAI just launched its AI agent built into ChatGPT. The demos were impressive – simple prompts like “analyze this spreadsheet and make a slide deck” triggered complex, multi-step tasks. The agent browses, codes, analyzes, and builds – all autonomously. When I tried it – it’s now available for all paid tiers – it was a bit slow for my taste and not necessary for many tasks.
OpenAI also released benchmarks and system cards, but the real surprise wasn’t the feature itself. It was what powered it. Sharp-eyed analysts (like swyx!) noticed: the model behind this agent isn’t the latest o3. It’s a more advanced, next-gen model – likely what would’ve been called “o4,” now part of a series dubbed “GPTNext.” What a classic move from OpenAI – looking forward to what updates they make to the model after testing it as an agent.
Windsurf gets sliced three ways
What began as OpenAI’s $3B dream acquisition of Windsurf ended in disarray – reportedly thanks to Microsoft IP entanglements. Google DeepMind then surgically poached the CEO and tech in a $2.4B license-plus-hiring move. Finally, Cognition swept up the remains: the product, $82M ARR, and 250 staff now sailing under the Devin flag. Three companies, one IDE, and a masterclass in strategic dismemberment.
Claude gets plugged in
Anthropic just unveiled a directory of tools that connect directly to Claude—everything from Notion and Stripe to Figma and Prisma. Now, instead of repeating yourself, Claude can tap into your actual workflows, context, and data to deliver more precise, action-ready responses. AI collaboration just got a lot less theoretical – and a lot more useful.
Reflection's Asimov reads the room – and your Slack
Reflection AI’s new agent, Asimov, takes a different route to coding autonomy: reading everything. Not just code, but emails, docs, chats, and GitHub threads—turning organizational sprawl into a coherent map of how software actually works. Early signs look strong, with Asimov outperforming Claude Sonnet 4 in blind dev tests. Still, privacy skeptics and the absence of OpenAI/Devin comparisons keep this one in the “watch closely” category.
Models to pay attention to:
Kimi K2: Smarter than DeepSeek, cheaper than Claude
Researchers from Moonshot AI release Kimi K2, a 1-trillion-parameter MoE LLM activating 32B per pass. Trained on 15.5T tokens using the MuonClip optimizer, it avoids instability and excels in agentic tasks. Kimi K2-Instruct scores 53.7% on LiveCodeBench v6 and 65.8% on SWE-bench, outperforming GPT-4.1 and DeepSeek-V3. Its $0.60 input and $2.50 output token pricing undercuts Claude Sonnet by over 80%, making it a high-performance, open, cost-efficient model for real-world automation →read the paper
Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities (over 3200 authors!)
Researchers from Google present Gemini 2.5 Pro and Flash, sparse MoE transformer models with 1M+ token context and multimodal inputs (text, audio, image, video). Gemini 2.5 Pro achieves 88% on AIME 2025, 74.2% on LiveCodeBench, and 86.4% on GPQA-Diamond. It can process 3-hour videos, use tools, and perform agentic tasks like autonomously beating Pokémon Blue in 406 hours →read the paper
Grok 4 goes heavy
xAI has unveiled Grok 4 and Grok 4 Heavy, claiming the crown for the world’s most intelligent closed model. Is it true? Depends on the task. Is it worth it ($30/month for Grok 4 and $300/month for Grok 4 Heavy)? No, if you already have a $200 sub to OpenAI and/or a good grip on Gemini in AI Studio.
Voxtral: Frontier open source speech understanding models
Researchers from Mistral AI release Voxtral, open-source speech models in 24B and 3B sizes under Apache 2.0. Voxtral supports 32k-token contexts, real-time Q&A, summarization, and multilingual transcription. It outperforms Whisper v3 and matches ElevenLabs Scribe at half the cost. Benchmarks show state-of-the-art results across LibriSpeech, FLEURS, and Mozilla Common Voice. Voxtral Mini Transcribe delivers high accuracy at $0.001/min, ideal for scalable speech intelligence in production and edge deployments →read the paper
MirageLSD: Zero-latency, real-time, infinite video generation
Researchers from Decart AI introduce MirageLSD, the first diffusion-based model enabling real-time, infinite video generation with <40ms latency and 24 FPS. It uses Live Stream Diffusion with causal, frame-by-frame synthesis and solves error accumulation via history augmentation. Technical advances include CUDA mega kernels, shortcut distillation, and GPU-aware pruning. MirageLSD outperforms prior models by 16× in responsiveness, enabling interactive video editing, transformations, and streaming with stable visual coherence over unlimited durations →read the paper
LTX-Video: Realtime video latent diffusion
Researchers from Lightricks release LTX-Video, a DiT-based model generating 30 FPS videos at 1216×704 resolution in real time. Version 0.9.8 enables long-shot generation up to 60 seconds, image-to-video, keyframe animation, and video extension. Distilled 13B and 2B models deliver HD output in 10s with previews in 3s on H100 GPUs. Control models (pose, depth, canny) and FP8 quantized versions support low-VRAM setups, while TeaCache speeds inference up to 2× without retraining →read the paper
MetaStone-S1: Test-time scaling with reflective generative model
Researchers from MetaStone-AI and USTC propose MetaStone-S1, a 32B parameter reflective generative model that matches OpenAI o3-mini's performance. Using a novel Reflective Generative Form, it integrates the policy and process reward model (PRM) into a single backbone with only 53M extra parameters. Their self-supervised PRM (SPRM) selects high-quality reasoning without step-level labels →read the paper
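As a rough mental model of this kind of test-time scaling, the loop below (a toy sketch with stand-in functions, not MetaStone’s actual models) samples several candidate reasoning traces and lets a reward scorer pick the best one; in MetaStone-S1 the policy and that scorer share a single backbone.

```python
import random

# Simplified best-of-N test-time scaling: sample several candidate reasoning
# traces, score each with a process reward model, return the highest-scoring
# one. generate_trace and score_trace are toy stand-ins for the real models.
def generate_trace(question: str, rng: random.Random) -> str:
    return f"candidate answer {rng.randint(0, 9)} for: {question}"

def score_trace(trace: str, rng: random.Random) -> float:
    return rng.random()  # a real SPRM would score reasoning quality here

def best_of_n(question: str, n: int = 8, seed: int = 0) -> str:
    rng = random.Random(seed)
    candidates = [generate_trace(question, rng) for _ in range(n)]
    return max(candidates, key=lambda t: score_trace(t, rng))

if __name__ == "__main__":
    print(best_of_n("What is 17 * 24?"))
```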
The freshest research papers, categorized for your convenience
We organize research papers by goal-oriented or functional categories to make it easier to explore related developments and compare approaches. As always, papers we particularly recommend are marked with 🌟
Reasoning Methods and Architectural Adaptations
🌟 Critiques of World Models
challenges current world modeling paradigms and proposes a hierarchical, self-supervised AGI framework grounded in nested and generative physical reasoning → read the paper
🌟 Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs
introduces CoLa, a test-time adaptive architecture where pretrained LLM layers are reordered, skipped, or looped to enhance inference efficiency and accuracy per input → read the paper
🌟 The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner
uses Turing machine-style CoT generation to improve LLM length generalization, simulating read-write behaviors to match algorithmic task execution → read the paper
Replacing thinking with tool usage enables reasoning in small language models
formats reasoning traces as tool interaction logs, enabling smaller models to perform complex tasks by manipulating stateful tools instead of simulating thoughts → read the paper
MindJourney: Test-Time Scaling with World Models for Spatial Reasoning
pairs a VLM with a video-diffusion world model to simulate egocentric 3D transformations at inference time, improving spatial reasoning without fine-tuning → read the paper
Agent Architectures and Multi-Agent Collaboration
AgentsNet: Coordination and Collaborative Reasoning in Multi-Agent LLMs
develops a scalable benchmark to test how large-scale multi-agent LLM systems self-organize and solve distributed reasoning tasks, revealing the coordination limits of current models → read the paper
MIRIX: Multi-Agent Memory System for LLM-Based Agents
builds a modular memory architecture with six memory types and dynamic agent coordination, enabling persistent and multimodal memory for AI agents at scale → read the paper
🌟 Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs
integrates retrieval and reasoning in agentic systems, outlining synergistic methods that iterate between search and inference to enhance factuality and reasoning depth → read the paper
Context, Retrieval, and Memory Systems
A Survey of Context Engineering for Large Language Models
establishes context engineering as a discipline, detailing retrieval, memory, and agentic integration strategies to improve information management and reasoning in LLMs → read the paper
🌟 FlexOlmo: Open Language Models for Flexible Data Use
enables modular inference using independently trained mixture-of-experts, allowing users to include or exclude data sources at inference time without retraining → read the paper
Reinforcement Learning and Exploration for Reasoning
First Return, Entropy-Eliciting Explore
stabilizes RL training for LLMs by identifying high-uncertainty steps and guiding exploration via structured semantic rollouts → read the paper
Perception-Aware Policy Optimization for Multimodal Reasoning
adds internal perception loss to reward learning, significantly reducing vision-related reasoning errors in multimodal benchmarks → read the paper
RLEP: Reinforcement Learning with Experience Replay for LLM Reasoning
enhances sample efficiency by replaying verified successful reasoning traces, improving convergence and final accuracy on math benchmarks → read the paper
One Token to Fool LLM-as-a-Judge
demonstrates vulnerabilities in generative reward models used in RL and proposes a data augmentation strategy to improve robustness → read the paper
Latent, Internal, and Efficient Reasoning
A Survey on Latent Reasoning
explores non-verbal, internal inference mechanisms in LLMs, including hidden state propagation and infinite-depth diffusion reasoning → read the paper
🌟 Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation
reuses a shared stack of transformer layers across recursive depth levels to enable efficient, token-specific reasoning → read the paper
Differential Mamba
applies differential design to state-space models, improving long-range context handling and reducing hallucinations in Mamba architectures → read the paper
Model Efficiency, Fine-Tuning, and Personalization
Scaling Laws for Optimal Data Mixtures
predicts ideal domain weightings for foundation model training using a principled scaling law framework, avoiding costly trial-and-error → read the paper
SingLoRA: Low Rank Adaptation Using a Single Matrix
stabilizes LoRA training by collapsing the adaptation into a single symmetric matrix, reducing parameter count and improving performance → read the paper
T-LoRA: Single Image Diffusion Model Customization Without Overfitting
uses timestep-aware low-rank adaptation to enable robust personalization from a single concept image in diffusion models → read the paper
🌟 Lizard: An Efficient Linearization Framework for Large Language Models
linearizes transformers using gated attention and meta-memory to support infinite context generation with constant memory → read the paper
Multimodal Reasoning and Visual Interfaces
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning
transfers cognitive behaviors learned in language tasks to visual reasoning via a two-stage cold-start and RL process, achieving state-of-the-art performance → read the paper
NeuralOS: Towards Simulating Operating Systems via Neural Generative Models
simulates GUI-based OS interactions by combining RNN state tracking with diffusion-based screen rendering, enabling AI-driven interface modeling → read the paper
That’s all for today. Thank you for reading! Please send this newsletter to your colleagues if it can help them enhance their understanding of AI and stay ahead of the curve.
How was today's FOD? Please give us some constructive feedback.