FOD#107: It's NOT the year of agents – it's the year when WE got used to AI
Plus 10 open-source deep research assistants and other useful stuff
A few notes about July: it’s the summer slowdown
Everyone needs a break – especially at the breakneck speed AI is moving. Many of you asked for roundups to catch up on the deeper topics we’ve covered. So in July, we’re pressing pause on the news cycle and diving into reflection.
We’ll wrap up our Agentic Workflows series (and introduce the next one), revisit core concepts, methods, and models, and reshare some of our most insightful interviews. You’ll also get curated collections of the best books and research papers we’ve featured – plus a look back at the bold predictions our readers made at the end of 2024. Time to check who was right :)
We hope you’ll be able to slow down with us and absorb some AI knowledge without the release-day rush.
Our news digest is always free. Upgrade to receive our deep dives in full, directly into your inbox. If you want to support us without getting a subscription – do it here.
Topic number one: 2025 is not the year of agents
So many conversations these days circle around agents and agentic workflows.
Agents are working. Actually, no – agents are still barely capable.
Are we vibe-coding with them, or co-coding with them? Everyone’s busy arguing whether we’re doing prompt engineering or context engineering. Hours are spent on that.

Image Credit: swyx and AI News
And what is our role in this loop – are we collaborators, or are we becoming callable tools for our AI? (Someone really needs to coin a term for that, like a "bot-y call", especially if it comes in the middle of the night).
Companies haven’t even agreed on what an agent is. Here are just a few competing definitions:

Image Credit: OpenAI

Image Credit: LangChain

Image Credit: Anthropic

Image Credit: Andrew Ng
But I don’t think that’s what matters. 2025 is not the year of agents; we are still figuring them out.
What I’m about to say sounds simple – but it marks an absolutely profound shift.
2025 is the year we got used to AI.
It’s the year the magic fades into the background, becoming so essential that we forget it was ever magic at all. It’s like forgetting the screeching song of your dial-up modem and starting to complain about your Wi-Fi speed instead.
Topic number two: in this video, I also pull in findings from two reports that support my point (links in the description). Watch it here →
If you like it, please subscribe. I’d say it’s refreshingly human.
Curated Collections – a super helpful list
Follow us on 🎥 YouTube Twitter Hugging Face 🤗
We are reading/watching
This is just such an awesome project – we’re grateful for this technology. Thank you, Google, and happy 20th birthday to Google Earth.

Image Credit: Google’s Blog
News from The Usual Suspects ©
The year of mergers and acquisitions indeed:
@rubrikInc is acquiring @predibase
@salesforce "hiring" @MoonhubAI CEO Nancy Xu and a big chunk of the team
@AMD "hiring" @realSharonZhou and a bunch from her @LaminiAI.
And it's just the last couple of weeks.
— Ksenia Se (@Kseniase_)
5:32 PM • Jun 25, 2025
Laude Institute bets big on research that ships
Launched with a $100M personal bet by Andy Konwinski (Apache Spark, Databricks), the Laude Institute aims to close the gap between lab research and real-world impact. A nonprofit with a public benefit arm, Laude is backing both early “Slingshots” like Terminal-Bench and long-horizon “Moonshots” for AI in science, society, and public goods. Board includes Dave Patterson, Jeff Dean, Joelle Pineau, and a who’s who of computing. Mission: accelerate responsible, applied breakthroughs – fast.
OpenAI feels the burn
After Meta lured away eight senior researchers – reportedly dangling sky-high bonuses – OpenAI’s leadership is scrambling to stop the talent bleed. Chief Research Officer Mark Chen called it a “visceral” loss, likening it to a break-in. OpenAI is now “recalibrating comp” and rethinking how to keep its stars from jumping ship. The company is also, apparently, taking a one-week company-wide shutdown to let burned-out staff recharge – except leadership, who are staying on to fend off Meta’s aggressive talent raid. Madness.
Runway hits ‘Play’ on gaming
After cutting its teeth in Hollywood, Runway is bringing its generative AI toolkit to the gaming world. The $3B startup is launching a consumer-facing product to generate games via simple chat prompts, with more advanced tools promised by year’s end. CEO Cristóbal Valenzuela says devs are embracing AI faster than filmmakers did – and no, he’s not selling to Zuckerberg (yet).
Google introduced so many things last week:
Doppl lets you virtually try on outfits using photos or screenshots. Built on Google Shopping’s AI try-on tech, Doppl animates your fashion choices with personalized videos. Don’t expect a perfect fit – but it is pretty accurate.
Ask Photos and You Shall Retrieve – their AI-powered search tool lets users query their image libraries with natural language. Now it responds faster to simple prompts – like “dogs” – while still flexing its Gemini-powered muscles on deeper asks.
Gemini CLI – an open-source AI agent that brings Gemini 2.5 Pro directly to developers’ terminals – free of charge (basically). With a 1M-token context window and generous usage limits (60 requests/min, 1,000/day), it’s a quiet power move. Some devs complain it doesn’t work properly (yet) – see the Python sketch after this list for the underlying model access.
YouTube – Your Binge Buddy. It is turning search into more of a guided tour with AI-powered carousels – curated clips, creator insights, and smart summaries for queries like “best beaches in Hawaii.” Meanwhile, its conversational AI feature, once Premium-only, is expanding to more U.S. users.
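For readers who want to poke at the model behind the CLI, here is a minimal sketch of calling Gemini 2.5 Pro from Python with the google-genai SDK. The model id and setup are assumptions based on the announcement, not the CLI’s own code:

```python
# Hedged sketch: programmatic access to Gemini 2.5 Pro via the google-genai
# SDK (pip install google-genai). The Gemini CLI itself is a separate
# terminal tool; this only shows the underlying model call.
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # or set the GEMINI_API_KEY env var

response = client.models.generate_content(
    model="gemini-2.5-pro",  # model id assumed from the announcement
    contents="Summarize what an agentic CLI could do in two sentences.",
)
print(response.text)
```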
more in Models →
Models to pay attention to:
The ERNIE 4.5 series is now officially open source. This family of models includes 10 variants—from MoE models with 47B and 3B active parameters, the largest having 424B total parameters, to a 0.3B dense model—all available now to the global AI community for open research and
— Baidu Inc. (@Baidu_Inc)
4:36 PM • Jun 30, 2025
Introducing Gemma 3n: The developer guide
Researchers from Google introduced Gemma 3n, an on-device, multimodal LLM optimized for mobile hardware. It supports text, image, audio, and video inputs with text outputs. Two models – E2B (2GB VRAM) and E4B (3GB VRAM) – achieve 2x-13x performance gains using MatFormer, PLE, and KV Cache Sharing. The E4B scores over 1300 in LMArena. Audio support includes speech-to-text and speech translation; the MobileNet-V5 vision encoder runs at 60 FPS on Pixel, outperforming SoViT with 46% fewer parameters. Open-sourced → read the paper
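If you want to try Gemma 3n locally, here is a hedged sketch with Hugging Face transformers. The Hub id google/gemma-3n-E4B-it and the image-text-to-text pipeline task are assumptions from the release notes; check the model card before relying on them:

```python
# Hedged sketch: running the Gemma 3n E4B instruct model with transformers.
# Assumes the Hub id and pipeline task below; verify against the model card.
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",            # Gemma 3n accepts text, image, and audio inputs
    model="google/gemma-3n-E4B-it",  # assumed Hub id for the E4B instruct variant
    device_map="auto",
)

messages = [
    {"role": "user", "content": [
        {"type": "text", "text": "Explain KV cache sharing in one sentence."},
    ]},
]
out = pipe(text=messages, max_new_tokens=60)
print(out[0]["generated_text"])  # output structure may vary by transformers version
```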
Hunyuan A13B
Researchers from Tencent introduced Hunyuan A13B, a Mixture-of-Experts LLM with 13B active parameters (80B total), trained on 3.2 trillion tokens. It achieves 82.7% on MMLU, 73.5% on CMMLU, 36.6% on GSM8K, and 61.7% on HumanEval. The model outperforms LLaMA 3 8B and Qwen 1.5 14B on 10 of 13 benchmarks. It supports 27 programming languages and multilingual instruction following, and demonstrates strong reasoning and code synthesis. Hunyuan A13B is open-sourced under a permissive license → read the report
LFM-1.3B-Math: Can small models be concise reasoners?
Researchers from Liquid AI enhanced a 1.3B parameter chat model, LFM-1.3B, into LFM-1.3B-Math using 4.5M supervised samples and RL. Fine-tuning improved performance from 14.08% to 59.87% on reasoning benchmarks, while GRPO-based RL reduced verbosity and raised constrained (4k-token) accuracy from 30.09% to 46.98%. Despite no math-specific pretraining, the model matched or outperformed specialized peers, making it ideal for edge deployment with tight compute and memory budgets → read the paper
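The core GRPO trick is worth seeing in miniature: sample a group of completions per prompt, score them, and normalize rewards within the group, so no learned value model is needed. This is a conceptual sketch, not Liquid AI’s code; the length-penalized reward is our own illustrative assumption:

```python
# Conceptual GRPO sketch (not Liquid AI's implementation): group-relative
# advantages from per-completion rewards, with a length penalty of the kind
# that steers a model toward concise reasoning.
import numpy as np

def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """rewards: shape (G,), one scalar per sampled completion in the group."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

def reward(is_correct: bool, n_tokens: int, budget: int = 4000) -> float:
    # Illustrative shaping: full credit for correctness, minus a penalty
    # for blowing past the token budget.
    return float(is_correct) - 0.1 * max(0, n_tokens - budget) / budget

rewards = np.array([reward(True, 1200), reward(True, 5200), reward(False, 800)])
print(group_relative_advantages(rewards))  # concise-and-correct scores highest
```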
AlphaGenome: AI for better understanding the genome
Researchers from Google DeepMind developed AlphaGenome, an AI model that predicts how DNA variants affect gene regulation across cell types. It processes sequences up to 1 million base pairs, outputs base-level predictions, and scores variant effects in seconds. Trained on ENCODE, GTEx, FANTOM5, and 4D Nucleome data, it outperformed existing models in 22/24 single-sequence tasks and 24/26 variant-effect benchmarks. It uniquely models splice junctions and enables comprehensive multimodal genomic analysis → read the paper
Gemini Robotics On-Device brings AI to local robotic devices
Researchers from Google DeepMind introduced Gemini Robotics On-Device, a compact VLA model for bi-arm robots capable of natural language instruction following and dexterous tasks like folding clothes or assembling belts. Running fully offline, it adapts to new tasks with 50–100 demonstrations and outperforms prior on-device models in generalization and instruction-following benchmarks. It was successfully fine-tuned to different robots like Franka FR3 and Apptronik’s Apollo, demonstrating strong embodiment generalization and fast adaptation capabilities → read the paper
Mercury, a General Chat Diffusion Large Language Model
Researchers from Inception Labs released Mercury, a diffusion-based LLM delivering 7x faster throughput than GPT-4.1 Nano, achieving 708 tokens/sec. Mercury matches or exceeds comparable models on key benchmarks: 85% on HumanEval, 83% on MATH-500, and 30% on AIME 2024. Despite running on standard NVIDIA GPUs, it achieves lower real-world voice latency than Llama 3.3 70B on Cerebras. Mercury powers Microsoft’s NLWeb for fast, grounded natural language interactions with fewer hallucinations → read the paper
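To see why diffusion decoding buys throughput, here is a toy illustration (purely conceptual, with a random stand-in for the denoiser): each step fills many masked positions in parallel, so a 16-token sequence finishes in a handful of steps instead of 16 autoregressive ones:

```python
# Toy illustration of parallel denoising in a diffusion LM. The "denoiser"
# is a random stand-in; a real model predicts all masked tokens jointly.
import random

MASK = "_"

def denoise_step(tokens, vocab, fill_frac=0.5):
    """Fill a fraction of the remaining masked positions in parallel."""
    masked = [i for i, t in enumerate(tokens) if t == MASK]
    for i in random.sample(masked, max(1, int(len(masked) * fill_frac))):
        tokens[i] = random.choice(vocab)
    return tokens

seq, steps = [MASK] * 16, 0
vocab = ["the", "cat", "sat", "on", "a", "mat"]
while MASK in seq:
    seq = denoise_step(seq, vocab)
    steps += 1
print(f"{steps} denoising steps for 16 tokens, vs 16 autoregressive steps")
```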
Recommended dataset and benchmark
FineWeb2: One Pipeline to Scale Them All (by Hugging Face) builds a multilingual web-scale dataset pipeline adaptable to over 1,000 languages, with language-specific filtering and rebalancing → read the paper
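The heart of such a pipeline is per-language identification and filtering. Here is a hedged sketch of that step (a paraphrase of the idea, not Hugging Face’s code) using fastText’s public lid.176.bin language-ID model; the 0.65 threshold is an illustrative choice:

```python
# Hedged sketch of language-ID filtering for a multilingual web corpus.
# Requires fastText's lid.176.bin model, downloadable from fasttext.cc.
import fasttext

lid = fasttext.load_model("lid.176.bin")

def keep(doc: str, lang: str, threshold: float = 0.65) -> bool:
    # fastText's predict() rejects newlines, so flatten the document first.
    labels, probs = lid.predict(doc.replace("\n", " "), k=1)
    return labels[0] == f"__label__{lang}" and probs[0] >= threshold

docs = ["Ceci est un document en français.", "This one is English."]
print([d for d in docs if keep(d, "fr")])  # keeps only the French document
```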
The freshest research papers, categorized for your convenience
We organize research papers by goal-oriented or functional categories to make it easier to explore related developments and compare approaches. As always, papers we particularly recommend are marked with 🌟
Pretraining & Scaling Dynamics
Where to find Grokking tracks memorization-to-generalization transitions in LLM pretraining without test data → read the paper
🌟 OctoThinker improves reinforcement learning alignment via mid-training strategies and math-intensive corpora → read the paper
Skywork-SWE scales software engineering datasets and reveals unsaturated LLM performance on iterative coding tasks → read the paper
System Modeling & Real-World Prediction
🌟 Performance Prediction for Large Systems via Text-to-Text Regression (by Google) models system behavior from logs and configs using text-to-text LLMs that outperform tabular regressors with few-shot adaptation → read the paper
Multimodal Generation & Editing
BlenderFusion (by Google) enables 3D-grounded scene editing by composing layered object and camera controls into a generative pipeline → read the paper
OmniGen2 supports text-to-image, editing, and in-context generation through dual decoders and reflection training → read the paper
🌟 Radial Attention (by MIT, NVIDIA, Princeton, UC Berkeley, Stanford, First Intelligence) accelerates long video generation by exploiting natural attention decay in diffusion models → read the paper
LLaVA-Scissor compresses video tokens based on semantic connectedness, improving efficiency in video-language models → read the paper
🌟 MADrive (by Yandex) enhances driving scene reconstruction using memory-based 3D asset swaps and photorealistic compositing → read the paper
Agentic Search & Reasoning Systems
MMSearch-R1 trains multimodal models to invoke image/text search on demand using RL for outcome-based reasoning → read the paper
🌟 Mind2Web 2 (by The Ohio State University, Amazon AGI) evaluates long-horizon web agent performance via real-world tasks and a rubric-driven AI judge → read the paper
Language & Code Generation Control
LongWriter-Zero boosts ultra-long text generation via RL with reward models for structure and coherence → read the paper
Steering Conceptual Bias guides scientific code generation toward target languages via latent-subspace activation → read the paper
Mixture of Experts & Model Architecture Innovations
🌟 Chain-of-Experts restructures MoE routing with sequential expert activation for better efficiency and expressiveness → read the paper
World Models & Autonomy
WorldVLA links autoregressive action and visual world modeling into a unified framework for predictive reasoning → read the paper
🌟 Ark simplifies robot learning via a Python-first framework that bridges simulation, policy learning, and real robots → read the paper
That’s all for today. Thank you for reading! Please send this newsletter to your colleagues if it can help them enhance their understanding of AI and stay ahead of the curve.
How was today's FOD? Please give us some constructive feedback.