FOD#126: What is Kosmos AI?

and why Spatial Intelligence matters

This Week in Turing Post:

  • Wednesday / AI 101 series: Kimi K2 Thinking and what’s behind all the hype

  • Friday / Interview: Great conversation with Anneka Gupta, CPO at Rubrik

Multi-agent systems can handle more complex tasks, but are they worth the orchestration overhead, and how can they be made reliable in production? 

Read Galileo’s new 165-page guide for a comprehensive exploration of multi-agent systems – a true treasure trove for learning.

Our news digest is always free. Click on the partner’s link to support us or Upgrade to receive our deep dives in full, directly into your inbox →

Editorial: Today I have two things to share with you.

First, Kosmos AI. This week, many people in research circles are talking about Kosmos AI – a new “AI scientist” built by Edison Scientific in collaboration with many great research institutions and supported by Eric Schmidt (our reader and supporter as well).

Edison Scientific, a spinout from FutureHouse, designed Kosmos as a system for autonomous discovery. In its first phase, Kosmos produced seven findings across neuroscience, materials science, and clinical genetics. Three reproduced unpublished human work; four are new contributions now being validated with academic partners. About 79 percent of its results replicate – roughly the same rate as early-stage human research. Each discovery can be traced back to the exact code and papers that informed it, giving a rare level of auditability to AI-generated science. That alone is remarkable.

But what I found most interesting is that anyone can use it, not just scientists. And here’s an example that probably resonates with everyone – medical advice. It’s a bit personal, but I’m doing a long fast right now (three days coffee + water, then eight days water only). It’s not my first time, and I often use ChatGPT to guide me, but its answers can change a lot from one day to another. That’s fine when you know the subject and can read between the lines, but when it comes to your health, you need something you can actually trust – something that shows what the entire research field says, with a list of sources you can verify and explore if needed.

The platform runs on a credit system: free users get 10 credits, and each conversation with a specialized agent costs one credit. Follow-ups count too.

The only agent that requires a full subscription – about $200 per month – is Kosmos itself, used for serious work like “Mechanism of entorhinal cortex vulnerability in aging” or “Nucleotide metabolism as the dominant pathway altered under hypothermic conditions in brain.” And this agent can really do science.

I feel a little sorry for the scientists who feverishly reject AI and insist they’ll never use it because it’s “always wrong,” because “it steals their job,” or for whatever other reason. Tools like this accelerate a human scientist’s work to an unbelievable degree. Go grab it and make new breakthroughs with it. Seriously, refusing AI is like refusing a microscope and saying you’re fine looking at molecules through your grandma’s magnifying glass →read the Kosmos tech report here and play with it here.

Second, Fei-Fei Li and her new blog. She started it 10 hours ago, and I believe it will be one of the most interesting reads on Spatial Intelligence. We covered the topic before in “What is Spatial Intelligence” (free to read).

She writes that spatial intelligence depends on world models built on three core principles – they must be generative (able to create coherent, physics-consistent simulated worlds), multimodal (able to understand and respond through any combination of inputs like images, text, or actions), and interactive (able to predict how the world changes in response to actions or goals). Together, these define the foundation for truly spatially intelligent AI. She also says: “The scope of this challenge exceeds anything AI has faced before.”

I’m very happy that thinkers and builders like Fei-Fei Li are sharing their insights with the public, helping more people understand where the next frontier of AI is truly heading →read her blog here

Note: There are a few new papers on Spatial Intelligence in the Research Papers section today. Check them out.

Attention Span: Sam Altman published a long clarification about OpenAI’s rumored government backing – denying bailout plans while outlining $1.4 trillion in infrastructure spending through 2033. That’s a big number, considering they are going to make only $30 billion this year. What does he actually mean? Let’s discuss. Watch it here.

Curated Collections – Let’s talk a bit more about precision

Follow us on 🎥 YouTube, Twitter, and Hugging Face 🤗

What are we reading/watching:

News from The Usual Suspects ©

  • Deepnote with a new notebook format

    After seven years of stealthy development, Deepnote has gone open-source. The team’s new .deepnote format replaces the aging .ipynb with something future-proof: human-readable YAML, AI-native design, multi-language support, and a project-based structure fit for real-world teams and AI agents alike. As they say: “It’s a big step toward making data tools fully open and community-driven.”

  • Memories.ai brings memory to your device
    Memories.ai has unveiled LVMM 2.0, a model designed to give machines something humans take for granted: persistent visual memory. In partnership with Qualcomm, this next-gen tech will run on-device across phones, cameras, and wearables by 2026 – bringing sub-second video search, privacy-preserving inference, and real-time visual recall to the edge. Goodbye scrubbing through footage; hello semantic memory for machines. We are going to interview their CEO in December – send us your questions.

  • Webflow uncovers the tug-of-war behind the homepage
    Webflow’s State of the Website 2026 reveals a digital battleground: marketing and engineering are at odds over strategy, governance, and control. 92% see cross-functional friction, 97% feel the weight of technical debt, and developers are increasingly frustrated – some to the point of quitting. Meanwhile, AI is knocking, but half the teams aren’t sure it’s safe to let it in.

  • Google shoots for the stars – literally
    Project Suncatcher is Google Research’s latest moonshot: solar-powered satellite constellations armed with TPUs, linked by high-speed optical comms, and designed to scale AI compute in space. Think cloud infrastructure – just in orbit. With bench-tested bandwidths of 1.6 Tbps and radiation-tolerant TPUs, it's a wild yet plausible vision of off-planet machine learning. If this flies, “cloud computing” may soon be a literal term.

  • Google Cloud sharpens its silicon for the AI era
    Google Cloud has officially launched Ironwood TPUs – its most powerful and efficient custom chips yet – delivering 10x the peak performance of v5p and redefining inference at scale. Alongside, new Arm-based Axion VMs promise serious price-performance gains for general-purpose workloads. Welcome to the AI Hypercomputer age: optimized from chip to cluster, and built to scale like never before.

  • OpenAI warns: we’re not ready
    In its latest dispatch, OpenAI reflects on how AI has quietly passed historic milestones – outthinking humans in elite domains – while most still see it as a fancy chatbot. Their concern is that the tech is racing ahead of public understanding and governance. OpenAI urges new safety norms, government coordination, and a full-blown AI resilience ecosystem before superintelligence arrives. A peculiar bunch, this OpenAI.

This emoji 🦋 means open-source.

Interesting Datasets for Robotics:

  • PHUMA: Physically-Grounded Humanoid Locomotion Dataset – curate large-scale video-based humanoid motion data with physics-constrained retargeting that enforces joint limits and contact fidelity, producing physically reliable motions for robust imitation learning →read the paper

  • TWIST2: Scalable, Portable, and Holistic Humanoid Data Collection System – develop a mocap-free, VR-based teleoperation system for fast, low-cost humanoid data collection and whole-body visuomotor control, enabling dexterous manipulation and dynamic locomotion →read the paper

  • VSI-590K: Spatially-Focused Instruction-Tuning Dataset – build a large-scale dataset centered on spatial reasoning, aggregating diverse sources with fine-grained spatial annotations to improve models’ spatial understanding →read the paper

Models to pay attention to:

  • 🌟🌟 🦋 Kimi K2 Thinking – the model that blew everyone’s mind. We will cover it on Wednesday, including researchers’ opinions about it. In short, it is an open-source long-horizon reasoning agent supporting hundreds of sequential tool calls, optimized for INT4 inference, achieving frontier performance on reasoning, coding, and web-agent benchmarks through deep tool integration →read the paper

  • 🌟🦋 NVIDIA Nemotron Nano V2 VL – advance document and video understanding with a hybrid Mamba-Transformer vision-language architecture using token reduction for efficient long-context reasoning; released in multiple precision formats with open datasets and recipes →read the paper

  • iFlyBot-VLA – integrate language, vision, and action through dual-level supervision that aligns high-level latent actions with low-level control tokens, producing a unified VLA model capable of precise 3D reasoning and real-world manipulation →read the paper

The freshest research papers, categorized for your convenience

We organize research papers by goal-oriented or functional categories to make it easier to explore related developments and compare approaches. As always, papers we particularly recommend are marked with 🌟.

Highlight:

  • 🌟🌟 Cambrian-S: Towards spatial supersensing in video
    Researchers from New York University and Stanford University introduce Cambrian-S, a family of spatially grounded multimodal models trained on a 590K-sample video dataset (VSI-590K) for spatial reasoning. They propose VSI-SUPER, a benchmark with two tasks: spatial recall (VSR) and spatial counting (VSC), using up to 240-minute videos. Cambrian-S achieves 30%+ gains on VSI-Bench but fails on VSI-SUPER, revealing the limits of scaling alone. A predictive sensing prototype using latent frame prediction and surprise-based memory outperforms Gemini-2.5-Flash on VSI-SUPER tasks, showing that prediction aids long-horizon spatial understanding (a toy sketch of the surprise idea follows below) →read the paper
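
To make the “surprise-based memory” idea above concrete, here is a minimal, self-contained sketch: keep only the frames whose latent representation the model failed to predict, using prediction error as the surprise signal. This is an illustration of the general concept, not the Cambrian-S implementation; the copy-last-frame predictor, the threshold, and the toy latents are all stand-ins.

```python
import numpy as np

def surprise_filter(latents: np.ndarray, threshold: float = 1.0) -> list[int]:
    """Return indices of 'surprising' frames: those whose latent differs
    sharply from a naive prediction (here, simply the previous frame's latent)."""
    kept = [0]  # always keep the first frame
    for t in range(1, len(latents)):
        predicted = latents[t - 1]  # stand-in predictor: just copy the previous latent
        surprise = float(np.linalg.norm(latents[t] - predicted))  # prediction error = surprise
        if surprise > threshold:
            kept.append(t)  # memorize only high-surprise frames
    return kept

# Toy usage: 100 frames of 16-dim latents that drift slowly, with an abrupt
# "scene change" injected at frame 40, so only that frame should be memorized.
rng = np.random.default_rng(0)
latents = np.cumsum(rng.normal(scale=0.05, size=(100, 16)), axis=0)
latents[40:] += 2.0
print(surprise_filter(latents, threshold=1.0))  # expected output along the lines of [0, 40]
```

A real system would learn the frame predictor and use the retained frames as a compressed long-horizon memory; the point here is only that “what surprised the model” is a cheap, content-aware criterion for deciding what to keep.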

Continual learning paradigms

  • 🌟 Nested learning: A new ML paradigm for continual learning (by Google) – frame models as nested optimizers with self-modifying “Hope” architecture and continuum memory to sustain long-context reasoning and resist forgetting →read the paper

Agent training, simulation & experience synthesis

  • 🌟 Scaling Agent Learning via Experience Synthesis (by Meta) – distill environment dynamics into a reasoning-based experience model to generate scalable synthetic rollouts, warm-start RL, and match PPO/GRPO with far fewer real interactions →read the paper

  • 🌟🦋 Magentic Marketplace: An open-source simulation environment for studying agentic markets (by Microsoft) – simulate two-sided markets of assistant and service agents to evaluate welfare, bias, prompt-injection risks, and search design under realistic competition →read the paper

  • Simulating Environments with Reasoning Models for Agent Training – synthesize SFT trajectories (Simia-SFT) and RL feedback (Simia-RL) with LLM-simulated environments to train agents without real APIs, surpassing strong baselines on τ²-Bench →read the paper

Spatial cognition, multimodal reasoning & grounding

  • 🌟 Visual Spatial Tuning – construct VST-P (4.1M) and VST-R (135k) and train VLMs with SFT→RL to enhance spatial perception→reasoning without hurting general abilities →read the paper

  • 🌟 Towards Mitigating Hallucinations in Large Vision-Language Models by Refining Textual Embeddings (by University of Maryland, Dolby Laboratories, Hilabs, Capital One) – integrate average-pooled visual features into textual embeddings to rebalance modalities, improving grounding and reducing hallucinations →read the paper

  • Actial: Activate Spatial Reasoning Ability of Multimodal LLMs – build Viewpoint-100K and train via SFT + GRPO to enforce cross-view consistency, improving 3D/spatial reasoning in- and out-of-domain →read the paper

Tabular ICL & retrieval systems

  • 🌟 Orion-MSP: Multi-Scale Sparse Attention for Tabular In-Context Learning (by Lexsi Labs) – combine multi-scale processing, block-sparse attention, and Perceiver-style memory to capture hierarchical feature interactions and scale to wide tables →read the paper

  • 🦋 Trove: A Flexible Toolkit for Dense Retrieval – provide low-code, on-the-fly dataset filtering/combination, unified evaluation and hard-negative mining, and multi-node scaling for customizable retrieval research →read the paper

Efficient reasoning training & decoding behavior

  • 🦋 Shorter but not Worse: Frugal Reasoning via Easy Samples as Length Regularizers in Math RLVR – retain/up-weight moderately easy items in RLVR to regularize length, yielding equally accurate but much shorter solutions without explicit penalties →read the paper

Theory & foundations

  • 🌟 Diffusion Language Models are Super Data Learners (by National University of Singapore) – show DLMs outperform AR models under data scarcity via any-order modeling, denoising compute, and MC augmentation, achieving strong accuracy with repeated data →read the paper

  • The Strong Lottery Ticket Hypothesis for Multi-Head Attention Mechanisms – prove existence conditions for strong lottery tickets in MHA and extend theory to transformers without normalization, with exponential error decay empirics →read the paper

AI Scientists

  • Jr. AI Scientist and Its Risk Report: Autonomous Scientific Exploration from a Baseline Paper – orchestrate an agentic research workflow (analyze→hypothesize→implement→experiment→write) and audit capabilities/risks via AI reviewers and venue evaluations →read the paper

That’s all for today. Thank you for reading! Please send this newsletter to colleagues if it can help them enhance their understanding of AI and stay ahead of the curve.

How did you like it?
