Enterprise AI Middlemen: Who Survives the Agent Era?

Excerpt from today’s editorial: AI may eventually reduce the need for bulky enterprise software middlemen. But before that happens, Snowflake, Microsoft, Databricks, and others are fighting to become the trusted layer between raw AI capability and real company work.

→ This week was heavy with announcements. I was invited to two conferences at once, Snowflake Summit and Microsoft Build in San Francisco, and I went to both, because enterprise AI is where the industry is trying to make AI useful at scale. The sentiment you read in a room full of analysts, while watching leaders push their agenda, defend their choices, and occasionally look uncomfortable, gives you much more than a machine-gun burst of news.

Let’s see what’s going on.

For two years we watched the AI story through models, benchmarks, chatbots, coding assistants, and consumer adoption. But the harder economic question is now moving inside companies. What happens when AI becomes easy enough for people to use directly? What if we don’t need a bulky and heavy middleman?

These were the questions I had in mind when I went to Snowflake Summit. Snowflake is a data-governance giant from the cloud-warehouse era, and now it has to prove it stays essential in the agentic one. The market liked the story – shares jumped more than 33% after earnings, FY2027 product-revenue guidance went up to $5.84B from $5.66B, the company signed a five-year $6B AWS deal, and market cap sits around $90B – but the strategic pressure is obvious. Databricks pushes from one side (some call them “a bully in the room”), Microsoft tries to own the agentic interface from another, hyperscalers sit underneath everyone, and OpenAI and Anthropic, partners and customers on paper, are turning into the real competitors.

So I went looking for survival techniques. Does AI hurt enterprise software by making products replaceable, or does it make that software more valuable, because every agent still needs governed data, permissions, identity, memory, and trusted context? The usual debate stops at that binary, but I think that is the wrong place to stop.

Microsoft Build made the question larger. Satya Nadella’s message from the stage was that agents are everywhere – inside work, development, the operating system, local devices, cloud, enterprise knowledge. But the most useful line of thinking came from a smaller talk by Kevin Scott (Microsoft CTO) at a private event around Build:

- capability is moving faster than deployment
- activity doesn’t convert into value directly
- models can do more than organizations can absorb
- software can accelerate faster than companies can reorganize themselves
- autonomy doesn’t equal trust

Yup, I thought, exactly that. But these are symptoms.

Here is what the lag actually creates. Enterprises are full of legacy systems, fragile pipelines, compliance constraints, security reviews, human habits, and workflows that only make sense because someone has been repairing them by hand for years. Agents get more capable by the month; organizations do not move at model speed. So we get a strange interval, where AI threatens intermediaries in the long run while enterprise complexity keeps them useful in the short one. Snowflake, Microsoft, Databricks, Salesforce, and ServiceNow are all fighting inside it, and from what I heard in the hallways, there is real uncertainty in there, almost panic, and a lot of running in too many directions at once. I also feel they might be brittle because of their own internal complexity. Agents might need a different kind of organization.

The comforting reading of that interval – agents threaten us later, complexity protects us now – treats the moat as a function of time. I think it's a function of cost. As per-token cost falls, the raw work middleware used to charge for, querying and generating and connecting and transforming, drifts toward free. What stays expensive is trust: knowing an agent acted on the right data, with the right permissions, on behalf of the right person, with a record of why. When everything cheap gets cheaper, the scarce thing is governed proximity to intent.

Which means the middleman doesn't disappear in the direct-use world; it changes shape. It stops being an application – capability packaged into rigid software and sold back through layers of interface – and becomes a substrate, the trust and permission layer that sits at the moment a stated intention turns into a permitted action. The agent era doesn't need less governance. It needs more, located somewhere new.

But it’s a different type of governance.

This is the test Snowflake should be judged by. Not whether it governs data – it does, and that was the moat of the last era – but how it governs it, and whether it moves up to the intent layer or stays a managed warehouse with AI bolted on top; whether it reduces the distance between what a person wants and useful work getting done, or adds one more governed layer between the user and the outcome.

And tokenmaxxing might even be dangerous here. Independently of each other, Sridhar Ramaswamy, Snowflake's gentle CEO, and Kevin Scott were making the same point: activity, or tokenmaxxing, is not value, and more agents, more context, more tokens, more integrations, more automation can all produce motion without progress. I think this is the dilemma we will be sorting through this year, both at the behemoth and startup levels.

The middleman doesn't die in this story. It moves – closer to where intention becomes work, and further from the warehouse.

If any of those thoughts resonate with you – share them across your social networks. Let’s keep the conversation going.

Topic 2: Also, NVIDIA made tons of announcements in Taipei (see News from the Usual Suspects below). We played with and covered one of the most interesting: Cosmos 3, its omnimodal world model family.

Twitter Library

20 Advanced RAG Types to Know in 2026

20 cutting-edge RAG approaches in 2026: Agentic RAG, MiA-RAG, HGMem, Graph-O1, Bidirectional RAG, multimodal, multilingual, structured and security RAG systems.

Turing Post • Alyona Vert.

News from the usual suspects ™

Snowflake raised guidance, signed a $6B AWS agreement, expanded its partnership with Anthropic, and introduced the renamed CoCo and Snowflake Intelligence / CoWork. Together, they show Snowflake trying to move from storing data to helping people and agents act on it. My question: do companies need this new layer, or is it another unnecessary AI coding interface?

Microsoft used Build to unveil Project Solara (very interesting but still mostly under development), Scout (basically your chief of staff), and the Surface RTX Spark Dev Box, signaling a future where agents run directly on Windows devices rather than exclusively in the cloud. They also introduced a large family of MAI models, including the first reasoning one.

Anthropic filed confidential IPO paperwork, raised $65B at a $965B valuation, and expanded its security-focused Project Glasswing, which has already helped find more than 10,000 critical vulnerabilities.
OpenAI claimed one of its models independently disproved a longstanding geometry conjecture, rolled out more enterprise agent deployments through Codex, and published a new Frontier Governance Framework as AI regulation starts taking shape.
NVIDIA dominated Computex with a vision that goes far beyond GPUs:
- RTX Spark brings agentic AI to Windows PCs with up to 1 PFLOP of AI performance and 128GB unified memory.
- DGX Station puts a trillion-parameter AI supercomputer on an engineer's desk.
- New humanoid robot blueprint, robotaxi, AI factory, and digital twin announcements reinforced Jensen Huang's central thesis: AI needs infrastructure, not just models.
Meta scaled back plans to collect employee mouse and keyboard activity for AI training after internal backlash, highlighting a growing tension between agent development and workplace privacy.
The U.S. government announced plans to invite frontier AI labs to voluntarily submit advanced models for cybersecurity testing before public release.
Apple remains the biggest question mark ahead of WWDC next week, where expectations are building around the next phase of Apple Intelligence and Siri.

Models

xAI Grok Build 0.1 (released May 20–29, 2026): Fastest coding-focused model from xAI, optimized for agentic workflows, multi-file edits, tool use, and terminal-native development. 256K context, supports text + image inputs. Now in public beta on the xAI API.
Anthropic Claude Opus 4.8 (late May 2026 rollout): Major upgrade to the Opus line focused on coding accuracy, long-running task reliability, objective progress reporting, and reduced defect rates in complex reasoning/workflows. Same pricing tier as 4.7.
MiniMax M3 (announced May 31, 2026): First open-weights model claiming frontier-level coding & agentic performance (59.0% SWE-Bench Pro, 66.0% Terminal Bench 2.1, 34.8% SWE-fficiency, 28.8% KernelBench Hard, 74.2% MCP Atlas). Features MiniMax Sparse Attention for up to 1M token context, native multimodality from day one (text + vision). API live now with 50% off promo for first 7 days (≤512K context). Full weights + tech report expected in ~10 days.
Microsoft MAI family (announced on June 2):
- MAI-Thinking-1: Microsoft’s first reasoning model. 35B active parameters, 128K context. Strong on complex multi-step instructions, long-context reasoning, and code generation. Matches Opus 4.6 on SWE-Bench Pro; preferred over Sonnet 4.6 in blind human tests. Low token cost, high efficiency. Private preview on Foundry.
- MAI-Image-2.5 (and flash variant): Microsoft’s first native text-to-image and image-to-image models. Surpasses Nano Banana Pro on ELO. Rolling out in PowerPoint, OneDrive, and Foundry.
- MAI-Transcribe-1.5: SOTA accuracy across 43 languages (streaming soon).
- MAI-Voice-2 (and flash variant): Expanded to 15+ additional languages with new voice options.
- MAI-Code-1: Ultra-efficient inference coding model, tuned for GitHub; now in Copilot and VS Code.
NVIDIA
- Cosmos 3 – a unique omnimodal world model family, which we covered separately
- Nemotron 3 Ultra (available starting June 4, 2026): NVIDIA’s largest and best open model to date, specifically designed for long-running, complex autonomous agent workloads (deep reasoning, code generation, extensive research). Claims 5× faster inference and up to 30% lower cost for agentic tasks vs. other open models in class.
- Alpamayo 2 Super: Open 32B-parameter reasoning VLA (vision-language-action) model focused on the full driving stack for safer Level 4 autonomous development. Emphasizes reasoning, planning, and acting.
- Gamma-World (γ-World) (announced May 27, 2026): It scales interactive simulation beyond single- or two-player settings. Introduces Simplex Rotary Agent Encoding and Sparse Hub Attention for independently controllable, permutation-symmetric agents. Enables real-time (24 FPS) coherent multi-agent video rollouts with strong zero-shot generalization to more agents.
Qwen3.7-Plus (announced June 1, 2026 by Alibaba): Multimodal agent foundation model that unifies vision + language into a single versatile agent.

Research

Trends we see looking at every paper related to AI and ML published last week:

AI research agents and deep research

AutoResearchClaw – Moves autonomous research from a linear pipeline toward iterative systems with debate, self-healing execution, human intervention, and cross-run learning →read the paper
QUEST – Trains open deep-research agents with fully synthetic tasks, making synthetic task generation look like a serious path for scaling research agents →read the paper
🌟 ScientistOne – Pushes autonomous research toward verifiability by requiring claims to trace back to evidence, which directly addresses hallucinated citations, unreproducible scores, and method-code mismatch →read the paper

Agent operating systems: harnesses, skills, and coordination

🌟 SkillOpt – Treats agent skills as trainable external state, with controlled edits, validation gates, and transfer across models and execution harnesses →read the paper
🌟 MUSE-Autoskill – Builds a full lifecycle for agent skills: creation, memory, management, evaluation, refinement, reuse, and cross-agent transfer →read the paper
Foundation Protocol – Proposes coordination infrastructure for large populations of agents, with identity, provenance, multi-party organization, metering, receipts, settlement, and governance as first-class concerns →read the paper

Memory, context, retrieval, and persistence

ACC – Converts agent trajectories into long-context training data, using tool responses and environment observations as supervision for distant evidence integration →read the paper
WorldKV – Makes persistent world memory cheaper for interactive video world models by retrieving and compressing KV-cache chunks instead of keeping everything in full attention →read the paper
🌟 Do Language Models Need Sleep? – Introduces offline recurrence as a consolidation phase, giving models a way to process long-horizon memory outside the active inference window →read the paper
🌟 OmniRetrieval – Unifies retrieval across text, relational tables, knowledge graphs, and property graphs without flattening every source into one generic representation →read the paper
Personalize-then-Store – Moves agent memory toward user-specific storage policies, asking which interactions are worth saving for each person rather than applying one universal rule →read the paper

Training for reasoning, diversity, and test-time search

DelTA – Improves RLVR by assigning token-level credit more discriminatively, so training can amplify the tokens that actually separate successful from failed reasoning →read the paper
🌟 Vector Policy Optimization – Trains models to produce diverse solution sets for test-time search, instead of collapsing toward similar answers optimized for one scalar reward →read the paper

Architectures, world models, and physical AI

HRM-Text – Tests whether hierarchical recurrent architectures can deliver stronger sample efficiency than standard Transformer scaling recipes →read the paper
🌟 Gamma-World – Extends world models from single-agent or two-player settings toward scalable multi-agent interactive simulation →read the paper
Qwen-VLA – Unifies manipulation, navigation, and trajectory prediction inside one vision-language-action model across tasks, environments, and robot embodiments →read the paper

Verifiable environments for computer-use agents

OpenComputer – Creates verifiable software worlds for desktop agents, using app-specific state verifiers, auditable trajectories, and partial-credit rewards →read the paper
MobileGym – Provides a highly parallel, verifiable mobile-GUI simulation platform, making smartphone-agent training more scalable and measurable →read the paper
CUA-Gym – Scales verifiable RL environments for computer-use agents by generating task instructions, environment states, and reward functions together →read the paper

That’s all for today. Thank you for reading! Please send this newsletter to colleagues if it can help them enhance their understanding of AI and stay ahead of the curve.

FAQ

What is the “middleman interval” in enterprise AI?

The middleman interval is the current phase in which AI is powerful enough to threaten some software intermediaries, but enterprise complexity still makes trusted platforms valuable. Companies need governance, permissions, security, identity, and reliable data access before agents can safely perform real work.
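For readers who think in code, here is a minimal sketch of what such a trusted layer might check before an agent acts: the right person, the right permission, the right data, plus a record of why. Everything in it (`Intent`, `TrustLayer`, `authorize`, the user and dataset names) is hypothetical and invented for illustration; it is not how Snowflake, Microsoft, or any other vendor implements governance.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Intent:
    """A stated intention an agent wants to turn into an action (hypothetical shape)."""
    user: str      # the person the agent acts on behalf of
    action: str    # e.g. "read" or "write"
    dataset: str   # the governed data the action touches
    reason: str    # why the agent wants to do this


@dataclass
class TrustLayer:
    """Toy permission-and-audit gate; an assumption, not any vendor's real API."""
    grants: set[tuple[str, str, str]]  # allowed (user, action, dataset) triples
    audit_log: list[dict] = field(default_factory=list)

    def authorize(self, intent: Intent) -> bool:
        allowed = (intent.user, intent.action, intent.dataset) in self.grants
        # Record every decision, allowed or denied, together with the stated reason.
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "user": intent.user,
            "action": intent.action,
            "dataset": intent.dataset,
            "reason": intent.reason,
            "allowed": allowed,
        })
        return allowed


layer = TrustLayer(grants={("ana", "read", "sales_q2")})
ok = layer.authorize(Intent("ana", "read", "sales_q2", "prepare quarterly summary"))
denied = layer.authorize(Intent("ana", "write", "payroll", "fix a typo"))
print(ok, denied, len(layer.audit_log))
```

The point of the sketch is that the denied request is logged too: the audit trail covers every attempt, so someone can later reconstruct what the agent tried to do and why.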

Will AI hurt SaaS companies?

AI may pressure SaaS companies that mainly package narrow workflows behind rigid interfaces. But it can also strengthen companies that become trusted layers for agentic work: managing data, identity, security, governance, workflow execution, and observability.

What is tokenmaxxing?

Tokenmaxxing is the habit of using more AI, more context, more agents, and more tokens without proving that the extra activity creates proportional value. In enterprise AI, the backlash against tokenmaxxing is really a demand for measurable useful work.
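One hedged way to make "measurable useful work" concrete is to divide tokens spent by outcomes someone actually accepted. The metric, the `AgentRun` record, and the numbers below are illustrative assumptions, not a definition offered by Snowflake, Microsoft, or anyone quoted above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentRun:
    tokens: int           # tokens consumed by the run
    task_completed: bool  # did it produce an outcome someone accepted?


def tokens_per_completed_task(runs: list[AgentRun]) -> Optional[float]:
    """Total tokens spent divided by accepted outcomes; None if nothing was completed."""
    total = sum(r.tokens for r in runs)
    done = sum(1 for r in runs if r.task_completed)
    return total / done if done else None


# Made-up numbers: a small, focused setup vs. a busy one that mostly spins.
lean = [AgentRun(2_000, True), AgentRun(3_000, True)]
busy = [AgentRun(40_000, True), AgentRun(60_000, False), AgentRun(50_000, False)]

lean_cost = tokens_per_completed_task(lean)
busy_cost = tokens_per_completed_task(busy)
print(lean_cost, busy_cost)
```

In this toy comparison the busy setup burns far more tokens per accepted outcome, which is the "motion without progress" pattern in numeric form. What counts as a completed task is the hard part, and it is deliberately left as a plain boolean here.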

FOD#154: Enterprise AI Middlemen: Who Survives the Agent Era?