FOD#53a: Google’s moat against OpenAI + See you at Microsoft Build

Google I/O, OpenAI drama, Microsoft Build + the best curated list of research papers and other reads

Wow. This week is going to be hot! I’m in Seattle right now to cover Microsoft Build and bring you insights from Kevin Scott, Microsoft's CTO.

Since it's all very exciting, and on-the-ground reporting is a completely different thing from pure analysis based on the 150+ newsletters and media outlets we read, we are changing our usual schedule. This week, you will receive two FODs:

  • Today, on Monday, we will cover the news from Google I/O and their moat against OpenAI.

  • Tomorrow, on Tuesday, fresh and hot, right after our conversation with Kevin Scott, we will send you what caught our attention from Microsoft's announcements (they have already announced Surface Pro 11, Surface Laptop 7, and Copilot+ PCs powered by Snapdragon X Elite – but there is more to come!).

Are you also in Seattle? Let me know, maybe we can catch up for a coffee.

If you like Turing Post, consider becoming a paid subscriber. You’ll immediately get full access to all our articles, investigations, and tech series.

Google’s moat against OpenAI 

Last week saw two big tech events: the OpenAI Spring updates and Google I/O. We covered OpenAI’s impressive presentation of GPT-4o and thought OpenAI would “rest on their laurels,” but by the end of the week, a few notable resignations had occurred: Ilya Sutskever, Jan Leike, and Evan Morikawa left the company. When many scientific personnel depart, it often indicates a shift toward product-oriented priorities, which is concerning given OpenAI’s goal of achieving a not fully specified AGI. It’s sad that, while delivering so much, they are also notable for frequent drama and reactive damage control.

This brings us to Google. After their updates last week during Google I/O, some observers noted that Google, which hasn't partnered with any foundation model builders (such as OpenAI, Anthropic, or Mistral), is catching up quickly. Considering the turmoil at OpenAI, it’s safe to say that Google’s moat – initially perceived as a disadvantage – lies in its size and history. Google is a large, established tech company with diversified revenue streams, demonstrating financial stability and consistent growth. It’s basically drama-free. Sundar Pichai is measured and plays a very long-term game. He and Google might seem to move slower at first, but they have tremendous ML talent, developed infrastructure, and business applications for their AI. Google's steady, methodical approach could prove to be more resilient in the long run.

But there are other opinions as well. Stratechery argues that weaknesses emerge in Google's innovation pipeline outside its core competencies. The disappointment with the Google I/O keynote, for instance, stems from what appears to be a series of underdeveloped new products that do not yet match the transformative impact of its existing technologies. Additionally, many of Google's ambitious projects, such as AI Agents and Project Astra, are still at a conceptual stage without immediate practical applications, leading to perceptions of them as vaporware. These initiatives show potential but also reveal a gap between Google's visionary presentations and their current practical implementations. This gap may affect Google's ability to maintain its innovative edge against rapidly evolving competitors in the AI space.

Google I/O 2024 was, of course, a showcase of the company's deepening commitment to AI. Here are the highlights:

Gemini Enhancements & Integrations:

  • Google's Gemini model took center stage. A notable upgrade is the doubling of Gemini 1.5 Pro's context window from 1 million to 2 million tokens, enhancing its ability to understand and respond to complex queries.

  • Google's latest language model is not only getting faster and more capable but is also being integrated across various Google products (such as Gmail, Drive, and Docs).

Generative AI Innovations:

  • Beyond Gemini, Google introduced PaliGemma, a powerful open vision-language model inspired by PaLI-3. PaliGemma combines the SigLIP vision model and Gemma language model for class-leading performance in tasks like image captioning, visual question answering, and object detection.

  • Google also unveiled Gemma 2, a next-generation AI model with 27 billion parameters, offering class-leading performance at less than half the size of comparable models like Llama 3 70B.

  • It also opened a waitlist for Imagen 3, their highest quality text-to-image model.

  • Google also presented Project Astra, an ambitious endeavor to create a multimodal AI assistant that “can process multimodal information, understand the context you're in, and respond naturally in conversation.”

  • Another notable reveal was Veo, a GenAI model capable of producing 1080p videos from text, image, or video prompts, opening new creative possibilities. ElevenLabs immediately gave it a try.

  • Firebase Genkit was introduced to help developers build AI-powered applications more efficiently.

Search & Information Access Improvements

  • Very cool feature: “Ask Photos”, powered by Gemini, enables users to query their photo libraries conversationally.

  • Google Chrome is also getting smarter with the integration of Gemini Nano, facilitating text generation within the browser.

  • Google Search is receiving an AI overhaul with "AI Overviews," which summarize information from the web, and "Circle to Search" is gaining a new capability for solving math problems.

  • Google Lens received a significant upgrade, allowing users to search using video recordings.

  • Finally, Google's SynthID, an AI watermarking tool, is being upgraded to detect AI-generated videos and images.

Hardware

  • Everybody tries to announce something about compute. At Google I/O 2024, Google unveiled Trillium, its sixth-generation TPU, offering a 4.7x increase in compute performance per chip, double the HBM and ICI bandwidth, and 67% greater energy efficiency. Featuring third-generation SparseCore, Trillium supports large-scale AI models like Gemini 1.5 Flash and Imagen 3. These TPUs can scale to hundreds of pods, forming supercomputers, and enhance AI workloads, supporting frameworks like JAX and PyTorch/XLA.

Overall, Google I/O 2024 underscored the company's focus on making AI more accessible, powerful, and integrated into everyday tools and experiences. The event set the stage for a future where AI plays an even more significant role in how we interact with technology and information.

News from The Usual Suspects ©

Microsoft’s turn to shine

  • Microsoft Build kicks off tomorrow and runs May 21-23, but the company has already announced Surface Pro 11, Surface Laptop 7, and Copilot+ PCs powered by Snapdragon X Elite. These processors are expected to boost AI performance, potentially surpassing the M3 MacBook Air. New Copilot features include Recall search for comprehensive file history retrieval, Cocreator for image generation, and AI-generated in-game hints for Xbox Game Pass.

  • Stay tuned for tomorrow!

Hugging Face’s cool launches

  • HF introduced ZeroGPU, shared GPU infrastructure meant to democratize AI technology. Independent and academic AI developers often lack the resources available to big tech companies. ZeroGPU lets users run AI demos on shared GPUs without bearing high compute costs, and HF is committing $10M of free GPU compute to support the initiative. The infrastructure uses Nvidia A100 GPUs and operates more energy-efficiently by dynamically allocating GPU resources, so it can host multiple Spaces simultaneously.

  • HF launched Transformers Agents 2.0, an updated framework for creating agents that solve complex tasks by iterating based on past observations.

BTW, you might also like our episode about ImageNet. It’s free to read.

The freshest research papers, categorized for your convenience

AI Model Innovations and Performance Enhancements

  • DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model: Develops a high-performance, cost-effective Mixture-of-Experts model that showcases significant enhancements in training cost reduction and computational efficiency.

  • Characterizing the Accuracy-Efficiency Trade-off of Low-rank Decomposition in Language Models: Examines Tucker decomposition to optimize the balance between model size reduction and performance retention in language models.

  • SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts: Addresses the memory wall challenge in AI accelerators with a novel architecture that leverages a Composition of Experts for improved performance.

  • Many-Shot In-Context Learning in Multimodal Foundation Models: Explores the enhancement of in-context learning capabilities in multimodal foundation models using many-shot learning, demonstrating improvements in performance and efficiency across diverse datasets.

  • MambaOut: Do We Really Need Mamba for Vision?: Evaluates the necessity of Mamba's state space model for vision tasks, demonstrating that while it may not be essential for image classification, it holds potential benefits for more complex tasks like object detection and segmentation.

Security and Ethical Considerations in AI

  • Can LLMs Deeply Detect Complex Malicious Queries? A Framework for Jailbreaking via Obfuscating Intent: Investigates security weaknesses in LLMs, presenting a method that successfully evades detection mechanisms.

Benchmarks and Evaluations in AI Research

  • Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots: Creates a benchmark for assessing LLMs' ability to interpret and code from scientific plot visuals.

  • MS MARCO Web Search: A Large-scale Information-rich Web Dataset with Millions of Real Click Labels: Introduces a substantial web dataset to support advancements in AI and large-scale information retrieval.

Strategic Frameworks and Theoretical Advances in AI

  • RLHF Workflow: From Reward Modeling to Online RLHF: Presents a comprehensive strategy for implementing online reinforcement learning from human feedback, documenting performance gains over offline methods.

  • Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory: Develops a theoretical approach using associative memory to analyze and explain transformer model behaviors.

  • Position: Leverage Foundational Models for Black-Box Optimization: Advocates integrating LLMs with black-box optimization processes, proposing new ways to leverage AI for complex decision-making.

Research Surveys and Comparative Studies

  • A Survey on RAG Meets LLMs: Towards Retrieval-Augmented Large Language Models: Surveys developments in retrieval-augmented generation techniques, analyzing their impact on enhancing LLM performance and mitigating limitations.

  • Understanding the performance gap between online and offline alignment algorithms: Delves into the disparities between online and offline alignment methods in reinforcement learning, elucidating their strengths and weaknesses.

If you decide to become a Premium subscriber, you can expense this subscription through your company. Please also send this newsletter to your colleagues if it can help them enhance their understanding of AI and stay ahead of the curve.

Thank you for reading! We appreciate you. 🤍 
