
FOD#110: AI is for Everyone and for a Better Future

"However, it will only be such if we make it so."

A few notes about the rest of July: our slowdown failed!

I missed writing these weekly digests – especially with how much fascinating stuff keeps happening in AI. Not just research and releases, but the discussion level is on fire too. So buckle up – we’re continuing our journey of connecting the AI dots for your (human) better understanding.

Our news digest is always free. Upgrade to receive our deep dives in full, directly into your inbox.

AI Is for Everyone – And for a Better Future

The Turing Post believes that AI is for everyone. It can and should be a force for good, for dignity, for better lives – “However, it will only be such if we make it so.”

For that – we need the right mindset.

Where We Actually Are

Even taxi drivers now know what AI – and even generative AI – is. It’s here, period. It’s in cars, browsers, workflows, schools, kitchens, basements. OpenAI alone serves hundreds of millions of users weekly (about 800 million, if you believe that). IDC says AI will add nearly $20 trillion to global GDP by 2030. PwC, slightly more conservative, projects $15.7 trillion. Either way, we’re talking about the largest productivity expansion since the steam engine. We are also beginning to understand what AI abundance might mean. But we are not there yet.

The real story isn’t in the trillions, though. It’s in the thousands – of people now doing things they couldn’t do a year ago. And still, the dominant mood in some corners of public discourse is collapse.

Last week, I joined a private Zoom call organized by Scalepost. On screen: internet pioneer Vint Cerf, writers Nick Bostrom and Walter Isaacson, tech visionary Esther Dyson, cognitive scientist and AI-sceptic Gary Marcus, journalist Nick Thompson – and others who shape how AI is understood in public and policy circles. But the dominant frequency? P(doom). P(dystopia). Fear, framed as realism. Risk, framed as inevitability.

No mention of P(bliss). Or P(balance). Not even P(agency) – if only for balance’s sake.

That surprised me. Not because I’m naive – I spend most of my time speaking to people who build AI, who know exactly how flawed and powerful these systems are. But because if we only build our frameworks around disaster, we shrink the imaginative field.

And we need that field wide open right now.

What Makes This Moment Different

The internet democratized access to information. AI is democratizing access to capability.

That’s the real shift. And it’s already showing up in ways that are hard to ignore. What feels small to one person can make life – and life’s work – possible for another.

  • Me writing in a foreign language.

  • A farmer in Kenya gets crop-level diagnostics that rival university labs.

  • A poor teenager in Dhaka builds a physics simulation using GPT-4 and free Colab. With a personal tutor.

  • A woman in Ukraine uses an LLM to draft grant applications in six languages while living in a war zone.

That’s not “productivity.” That’s leveling up, up, up – including on the dignity level.

A Few Things Worth Naming

The Acceleration of Hope – AlphaFold didn’t just predict protein structures. It rewired how biological research gets done. Fusion labs are now using AI to simulate plasmas and edge conditions that no human mind could safely calculate.

Environmental Foresight – From tracking wildfire patterns to optimizing fusion reactor experiments, AI is acting as Earth’s nervous system. It spots weak signals. It makes the invisible visible. And it gives us a head start – if we listen.

Time Compression – AI gives us time. And not in the abstract. Research that took months now takes days. Diagnoses that used to require five referrals now happen in one prompt. This isn’t just faster – it’s actually a return of human agency.

Cognitive Inclusion – AI becomes the bridge. For people with dyslexia. For those without vision. For those who process differently. It describes what was once unseen. It rewrites what was once inaccessible. It interprets what was once unsaid.

Language Liberation – We’re no longer prisoners of English or any single tongue. Soon we will forget what it’s like not to understand what another person is saying – translated live, nuance intact. That’s a new kind of love story.

Really personalized entertainment/education – Picture this:

  • A TV series that invites you into the plot. A co-hosting AI that adjusts tone, pace, even storyline arcs to your emotional state.

  • A morning radio stream that senses your sleep quality and energy levels, and remixes global headlines into a digest that sounds like your favorite band and is built according to your mood.

  • A newspaper that morphs itself in real-time based on what you’ll actually read. A billboard that speaks one way to you and another way to the self-driving bot next to you. An interface that meets humans and agents alike.

Sounds like a fantasy, but we’re about five minutes away from a world where one-way media turns into two-way infrastructure. Where static content becomes context-aware interaction. Where every surface is a semantic API. We’re not there yet – but you can already feel the terrain shifting. Right?

So why mope and spiral into gloom about it? That won’t help.

So What Now?

The most urgent task isn't to perfect a single algorithm or to predict every risk. It's to consciously and collectively adopt the right mindset. The narrative of fear, of P(doom), is a self-fulfilling prophecy if we let it be. It builds frameworks of limitation before we’ve even explored the possibilities.

The alternative isn't blind optimism; it's agency. It is the belief that AI is a tool, and like the internet before it, its ultimate value will be determined not by its code, but by our courage, our creativity, and our compassion. Vint Cerf’s vision was that the "Internet is for Everyone." That wasn't a technical description; it was a founding principle. I just wanted to remind him and everyone about that.

Our principle must be the same. AI is for everyone. Let's start building from that truth, because the most important thing we will build with AI is not a product, but a better-equipped, more capable, and more connected humanity.

Topic number two: I also dive into Kimi K2 and a super intriguing new video model from LTX that really lets you play director. Plus, I test out Good Rudy, Bad Rudy, and Ani – Grok 4’s AI companions. My video editor told me she was saying “WTF” every minute while editing that segment. The Grok show starts at 8:16. Watch it here →

Please subscribe to the channel. I’d say it’s refreshingly human.

Curated Collections – a super helpful list

Follow us on 🎥 YouTube, Twitter, and Hugging Face 🤗

We are reading/watching

Highlight of the week: Chain of Thought Monitorability – A New and Fragile Opportunity for AI Safety:

An incredible team of researchers from Anthropic, Google DeepMind, OpenAI, the UK AI Security Institute, Apollo Research, and others argue that reasoning models trained to think in natural language offer a unique AI safety opportunity: monitoring their chain of thought (CoT) to detect misbehavior. CoT reasoning is often necessary for hard tasks and reflects internal intent. However, monitorability is fragile – scaling reinforcement learning or applying process supervision can degrade it. The paper urges tracking CoT readability, causal relevance, and resistance to obfuscation, and treating CoT monitorability as a key model safety factor →read the paper
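The core loop the authors describe is easy to picture: intercept the model’s reasoning trace before the final answer and run a separate check over it. Below is a minimal, hypothetical Python sketch of that idea – the `FLAG_PATTERNS` list, the `cot_monitor` function, and the gating logic are illustrative assumptions, not anything from the paper (real monitors would use a trained classifier or a second LLM as a judge).

```python
import re

# Hypothetical phrases a crude CoT monitor might flag. Monitors in the
# paper's sense would rely on learned classifiers or an LLM judge rather
# than regexes; this is only an illustration of the control flow.
FLAG_PATTERNS = [
    r"without the user noticing",
    r"hide (this|the) (step|change)",
    r"bypass (the )?(check|test|filter)",
]

def cot_monitor(chain_of_thought: str) -> list[str]:
    """Return the suspicious patterns found in a reasoning trace."""
    return [p for p in FLAG_PATTERNS
            if re.search(p, chain_of_thought, flags=re.IGNORECASE)]

def answer_with_monitoring(cot: str, answer: str) -> str:
    """Gate the final answer on a clean chain of thought."""
    hits = cot_monitor(cot)
    return f"[held for review: CoT matched {hits}]" if hits else answer
```

The paper’s warning is that this window only stays open while models keep reasoning in legible natural language – optimize too hard on the trace itself and it stops reflecting what the model is actually doing.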

  • Machine bullshit: Characterizing the emergent disregard for truth in large language models

    Researchers from Princeton University and UC Berkeley introduce the Bullshit Index (BI) to quantify LLMs' indifference to truth, showing BI rises from 0.379 to 0.665 post-RLHF. Using 2,400 scenarios across three benchmarks, they find RLHF increases deceptive behaviors: paltering (+57.8%), unverified claims (+55.6%), and empty rhetoric (+39.8%). Chain-of-thought prompts further amplify bullshit, especially paltering (+11.5%). Political contexts show 91% prevalence of weasel words. RLHF significantly increases user satisfaction but degrades truthfulness →read the paper
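For intuition on what that index measures: it tracks how decoupled the model’s explicit claims are from its internal belief about what is true. Here is a toy Python sketch, assuming the index is computed as one minus the absolute correlation between a belief score and the binary claim actually made – the exact formulation and data are in the paper; the function and numbers below are purely illustrative.

```python
import numpy as np

def bullshit_index(beliefs: np.ndarray, claims: np.ndarray) -> float:
    """Illustrative index: 1 - |corr(belief, claim)|.

    beliefs: model's internal probability that a statement is true (0..1)
    claims:  1 if the model asserts the statement, 0 otherwise
    Near 0 means claims track beliefs; near 1 means the model asserts
    things regardless of what it 'believes'.
    """
    corr = np.corrcoef(beliefs, claims)[0, 1]
    return 1.0 - abs(corr)

# Toy data: a sycophantic model says "yes" almost regardless of belief.
beliefs = np.array([0.9, 0.8, 0.2, 0.1, 0.7, 0.3])
honest_claims = np.array([1, 1, 0, 0, 1, 0])      # tracks beliefs
flattering_claims = np.array([1, 1, 1, 0, 1, 1])  # says yes anyway

print(bullshit_index(beliefs, honest_claims))      # close to 0
print(bullshit_index(beliefs, flattering_claims))  # noticeably higher
```

The RLHF finding then reads naturally: optimizing for user approval pushes claims toward whatever pleases, widening exactly the gap this kind of index captures.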

News from The Usual Suspects ©

Meta Earners (a leaked list of the 44 members of Meta's Superintelligence team)

Image Source: Deedy’s X

  • AI2’s AutoDS asks the questions scientists didn’t think to

    The Allen Institute has launched AutoDS, an open-ended research agent that autonomously generates and tests its own scientific hypotheses – without needing a user-defined goal. Using Bayesian “surprise” as its compass and Monte Carlo Tree Search to explore the unknown, AutoDS mimics how researchers stumble into breakthroughs. Early results in biology and econ look promising, though as always: real science demands real peer review.

  • OpenAI’s mathlete LLM stuns – sort of

    OpenAI claims its new general-purpose LLM hit gold at the 2025 International Math Olympiad, solving 5 of 6 problems under contest-like conditions. It’s a flex of reasoning and reinforcement learning – without any geometry-specific tricks. But experts like Terence Tao urge caution: selective sampling and compute-heavy setups could blur the line between real insight and AI stagecraft. Impressive? Yes. Definitive? Not yet.

  • OpenAI: ChatGPT agent

    OpenAI just launched its AI agent built into ChatGPT. The demos were impressive – simple prompts like “analyze this spreadsheet and make a slide deck” triggered complex, multi-step tasks. The agent browses, codes, analyzes, and builds – all autonomously. When I tried it – it’s now available for all paid tiers – it was a bit slow for my taste and not necessary for many tasks. 

    OpenAI also released benchmarks and system cards, but the real surprise wasn’t the feature itself. It was what powered it. Sharp-eyed analysts (like swyx!) noticed: the model behind this agent isn’t the latest o3. It’s a more advanced, next-gen model – likely what would’ve been called “o4,” now part of a series dubbed “GPTNext.” What a classic move from OpenAI – I’m looking forward to the updates they make to the model after road-testing it as an agent.

  • Windsurf gets sliced three ways

    What began as OpenAI’s $3B dream acquisition of Windsurf ended in disarray – reportedly thanks to Microsoft IP entanglements. Google DeepMind then surgically poached the CEO and tech in a $2.4B license-plus-hiring move. Finally, Cognition swept up the remains: the product, $82M ARR, and 250 staff now sailing under the Devin flag. Three companies, one IDE, and a masterclass in strategic dismemberment.

  • Claude gets plugged in

    Anthropic just unveiled a directory of tools that connect directly to Claude – everything from Notion and Stripe to Figma and Prisma. Now, instead of repeating yourself, Claude can tap into your actual workflows, context, and data to deliver more precise, action-ready responses. AI collaboration just got a lot less theoretical – and a lot more useful.

  • Reflection's Asimov reads the room – and your Slack

    Reflection AI’s new agent, Asimov, takes a different route to coding autonomy: reading everything. Not just code, but emails, docs, chats, and GitHub threads – turning organizational sprawl into a coherent map of how software actually works. Early signs look strong, with Asimov outperforming Claude Sonnet 4 in blind dev tests. Still, privacy skeptics and the absence of OpenAI/Devin comparisons keep this one in the “watch closely” category.

Models to pay attention to:

  • Kimi K2: Smarter than DeepSeek, cheaper than Claude

    Researchers from Moonshot AI release Kimi K2, a 1-trillion-parameter MoE LLM that activates 32B parameters per token (a generic routing sketch follows this list). Trained on 15.5T tokens using the MuonClip optimizer, it avoids instability and excels in agentic tasks. Kimi K2-Instruct scores 53.7% on LiveCodeBench v6 and 65.8% on SWE-bench, outperforming GPT-4.1 and DeepSeek-V3. Its pricing of $0.60 per million input tokens and $2.50 per million output tokens undercuts Claude Sonnet by over 80%, making it a high-performance, open, cost-efficient model for real-world automation → read the paper

  • Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities (over 3200 authors!)

    Researchers from Google present Gemini 2.5 Pro and Flash, sparse MoE transformer models with 1M+ token context and multimodal inputs (text, audio, image, video). Gemini 2.5 Pro achieves 88% on AIME 2025, 74.2% on LiveCodeBench, and 86.4% on GPQA-Diamond. It can process 3-hour videos, use tools, and perform agentic tasks like autonomously beating Pokémon Blue in 406 hours →read the paper

  • Grok 4 goes heavy
    xAI has unveiled Grok 4 and Grok 4 Heavy, claiming the crown for the world’s most intelligent closed model. Is it true? Depends on the task. Is it worth it ($30/month for Grok 4 and $300/month for Grok 4 Heavy)? Not if you already have a $200 subscription to OpenAI and/or a good grip on Gemini in AI Studio.

  • Voxtral: Frontier open source speech understanding models
    Researchers from Mistral AI release Voxtral, open-source speech models in 24B and 3B sizes under Apache 2.0. Voxtral supports 32k-token contexts, real-time Q&A, summarization, and multilingual transcription. It outperforms Whisper v3 and matches ElevenLabs Scribe at half the cost. Benchmarks show state-of-the-art results across LibriSpeech, FLEURS, and Mozilla Common Voice. Voxtral Mini Transcribe delivers high accuracy at $0.001/min, ideal for scalable speech intelligence in production and edge deployments →read the paper

  • MirageLSD: Zero-latency, real-time, infinite video generation
    Researchers from Decart AI introduce MirageLSD, the first diffusion-based model enabling real-time, infinite video generation with <40ms latency and 24 FPS. It uses Live Stream Diffusion with causal, frame-by-frame synthesis and solves error accumulation via history augmentation. Technical advances include CUDA mega kernels, shortcut distillation, and GPU-aware pruning. MirageLSD outperforms prior models by 16× in responsiveness, enabling interactive video editing, transformations, and streaming with stable visual coherence over unlimited durations →read the paper

  • LTX-Video: Realtime video latent diffusion

    Researchers from Lightricks release LTX-Video, a DiT-based model generating 30 FPS videos at 1216×704 resolution in real time. Version 0.9.8 enables long-shot generation up to 60 seconds, image-to-video, keyframe animation, and video extension. Distilled 13B and 2B models deliver HD output in 10s with previews in 3s on H100 GPUs. Control models (pose, depth, canny) and FP8 quantized versions support low-VRAM setups, while TeaCache speeds inference up to 2× without retraining →read the paper

  • MetaStone-S1: Test-time scaling with reflective generative model

    Researchers from MetaStone-AI and USTC propose MetaStone-S1, a 32B parameter reflective generative model that matches OpenAI o3-mini's performance. Using a novel Reflective Generative Form, it integrates the policy and process reward model (PRM) into a single backbone with only 53M extra parameters. Their self-supervised PRM (SPRM) selects high-quality reasoning without step-level labels →read the paper
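A quick note on the architecture pattern behind several of the models above (Kimi K2 and Gemini 2.5 in particular): sparse mixture-of-experts routing is what lets a trillion-parameter model touch only ~32B parameters per token. Here is a generic, minimal PyTorch sketch of top-k routing – the `moe_forward` helper, shapes, and toy experts are assumptions for illustration, not either lab’s actual implementation.

```python
import torch
import torch.nn.functional as F

def moe_forward(x, gate_w, experts, top_k=2):
    """Generic sparse top-k expert routing (illustrative only).

    x:       [tokens, d_model] activations
    gate_w:  [d_model, n_experts] router weights
    experts: list of per-expert feed-forward modules
    Only top_k experts run per token, so active parameters per token
    are a small fraction of the total parameter count.
    """
    logits = x @ gate_w                            # [tokens, n_experts]
    weights, idx = logits.topk(top_k, dim=-1)      # choose k experts per token
    weights = F.softmax(weights, dim=-1)           # renormalize over chosen experts
    out = torch.zeros_like(x)
    for slot in range(top_k):
        for e, expert in enumerate(experts):
            mask = idx[:, slot] == e               # tokens sent to expert e in this slot
            if mask.any():
                out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
    return out

# Toy usage: 8 tiny experts, 2 active per token.
d, n_exp = 16, 8
experts = [torch.nn.Sequential(torch.nn.Linear(d, 4 * d), torch.nn.GELU(),
                               torch.nn.Linear(4 * d, d)) for _ in range(n_exp)]
x = torch.randn(5, d)
gate_w = torch.randn(d, n_exp)
print(moe_forward(x, gate_w, experts).shape)       # torch.Size([5, 16])
```

Production systems add load-balancing losses, capacity limits, and fused kernels on top of this basic routing loop.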

The freshest research papers, categorized for your convenience

We organize research papers by goal-oriented or functional categories to make it easier to explore related developments and compare approaches. As always, papers we particularly recommend are marked with 🌟

Reasoning Methods and Architectural Adaptations

  • 🌟 Critiques of World Models
    challenges current world modeling paradigms and proposes a hierarchical, self-supervised AGI framework grounded in nested and generative physical reasoning → read the paper

  • 🌟 Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs
    introduces CoLa, a test-time adaptive architecture where pretrained LLM layers are reordered, skipped, or looped to enhance inference efficiency and accuracy per input → read the paper

  • 🌟 The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner
    uses Turing machine-style CoT generation to improve LLM length generalization, simulating read-write behaviors to match algorithmic task execution → read the paper

  • Replacing thinking with tool usage enables reasoning in small language models
    formats reasoning traces as tool interaction logs, enabling smaller models to perform complex tasks by manipulating stateful tools instead of simulating thoughts → read the paper

  • MindJourney: Test-Time Scaling with World Models for Spatial Reasoning
    pairs a VLM with a video-diffusion world model to simulate egocentric 3D transformations at inference time, improving spatial reasoning without fine-tuning → read the paper

Agent Architectures and Multi-Agent Collaboration

  • AgentsNet: Coordination and Collaborative Reasoning in Multi-Agent LLMs
    develops a scalable benchmark to test how large-scale multi-agent LLM systems self-organize and solve distributed reasoning tasks, revealing the coordination limits of current models → read the paper

  • MIRIX: Multi-Agent Memory System for LLM-Based Agents
    builds a modular memory architecture with six memory types and dynamic agent coordination, enabling persistent and multimodal memory for AI agents at scale → read the paper

  • 🌟 Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs
    integrates retrieval and reasoning in agentic systems, outlining synergistic methods that iterate between search and inference to enhance factuality and reasoning depth → read the paper

Context, Retrieval, and Memory Systems

  • A Survey of Context Engineering for Large Language Models
    establishes context engineering as a discipline, detailing retrieval, memory, and agentic integration strategies to improve information management and reasoning in LLMs → read the paper

  • 🌟 FlexOlmo: Open Language Models for Flexible Data Use
    enables modular inference using independently trained mixture-of-experts, allowing users to include or exclude data sources at inference time without retraining → read the paper

Reinforcement Learning and Exploration for Reasoning

  • First Return, Entropy-Eliciting Explore
    stabilizes RL training for LLMs by identifying high-uncertainty steps and guiding exploration via structured semantic rollouts → read the paper

  • Perception-Aware Policy Optimization for Multimodal Reasoning
    adds internal perception loss to reward learning, significantly reducing vision-related reasoning errors in multimodal benchmarks → read the paper

  • RLEP: Reinforcement Learning with Experience Replay for LLM Reasoning
    enhances sample efficiency by replaying verified successful reasoning traces, improving convergence and final accuracy on math benchmarks → read the paper

  • One Token to Fool LLM-as-a-Judge
    demonstrates vulnerabilities in generative reward models used in RL and proposes a data augmentation strategy to improve robustness → read the paper

Latent, Internal, and Efficient Reasoning

  • A Survey on Latent Reasoning
    explores non-verbal, internal inference mechanisms in LLMs, including hidden state propagation and infinite-depth diffusion reasoning → read the paper

  • 🌟 Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation
    reuses a shared stack of transformer layers across recursive depth levels to enable efficient, token-specific reasoning → read the paper

  • Differential Mamba
    applies differential design to state-space models, improving long-range context handling and reducing hallucinations in Mamba architectures → read the paper

Model Efficiency, Fine-Tuning, and Personalization

  • Scaling Laws for Optimal Data Mixtures
    predicts ideal domain weightings for foundation model training using a principled scaling law framework, avoiding costly trial-and-error → read the paper

  • SingLoRA: Low Rank Adaptation Using a Single Matrix
    stabilizes LoRA training by collapsing the adaptation into a single symmetric matrix, reducing parameter count and improving performance (a minimal sketch follows this list) → read the paper

  • T-LoRA: Single Image Diffusion Model Customization Without Overfitting
    uses timestep-aware low-rank adaptation to enable robust personalization from a single concept image in diffusion models → read the paper

  • 🌟 Lizard: An Efficient Linearization Framework for Large Language Models
    linearizes transformers using gated attention and meta-memory to support infinite context generation with constant memory → read the paper
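On SingLoRA above: the trick is replacing LoRA’s two adapter matrices (B @ A) with a single matrix A and the symmetric update A @ Aᵀ, roughly halving adapter parameters. A minimal PyTorch sketch, assuming a square weight matrix – the class name, scaling, and initialization are illustrative, and the paper’s warm-up ramp on the update is omitted here.

```python
import torch
import torch.nn as nn

class SingLoRALinear(nn.Module):
    """Sketch of a single-matrix low-rank adapter (illustrative only)."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        assert base.in_features == base.out_features, "sketch assumes a square weight"
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                   # freeze the pretrained layer
        # One trainable matrix instead of LoRA's usual pair (B, A).
        self.A = nn.Parameter(torch.randn(base.in_features, rank) * 0.01)
        self.scale = alpha / rank

    def forward(self, x):
        delta = self.A @ self.A.T                     # symmetric low-rank update
        return self.base(x) + self.scale * (x @ delta)

layer = SingLoRALinear(nn.Linear(64, 64))
print(layer(torch.randn(2, 64)).shape)                # torch.Size([2, 64])
```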

Multimodal Reasoning and Visual Interfaces

  • Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning
    transfers cognitive behaviors learned in language tasks to visual reasoning via a two-stage cold-start and RL process, achieving state-of-the-art performance → read the paper

  • NeuralOS: Towards Simulating Operating Systems via Neural Generative Models
    simulates GUI-based OS interactions by combining RNN state tracking with diffusion-based screen rendering, enabling AI-driven interface modeling → read the paper

That’s all for today. Thank you for reading! Please send this newsletter to your colleagues if it can help them enhance their understanding of AI and stay ahead of the curve.

How was today's FOD?

Please give us some constructive feedback
