FOD#49: Llama 3 + Electric Atlas

Two releases that could blur the lines between software and hardware

Next Week in Turing Post:

  • Wednesday: Computer Vision Series – The Expansion of Theory and Practice: 1980s

  • Friday: Profile of an AI Unicorn

If you like Turing Post, consider becoming a paid subscriber. You’ll immediately get full access to all our articles, investigations, and tech series →

Last week brought two remarkable events: Meta announced Llama 3, the most capable and highly anticipated open large language model (LLM) to date, and Boston Dynamics introduced its new, fully electric Atlas robot platform – a stark departure from its hydraulic predecessors! At first glance, these developments may seem unrelated, but they are in fact deeply interconnected, with the potential to drive each other forward and reshape how we work with and deploy AI.

At the heart of this connection is the transformative power of advanced AI. Breakthroughs in natural language processing and machine learning, exemplified by Llama 3, extend beyond language alone. These techniques, from deep neural networks to reinforcement learning, are also propelling significant advancements in computer vision, motion planning, and robot control. As language models like Llama 3 continue to expand the boundaries of AI's ability to understand and interact with the world, they also lay the groundwork for more intelligent and capable robots.

The implications, of course, extend beyond communication. The same techniques used to train LLMs on vast amounts of text also apply to learning from massive datasets of sensor readings, images, and simulations for robots. Moreover, the open-source nature of models like Llama 3 democratizes access to state-of-the-art AI, enabling a broader spectrum of researchers and companies to integrate these capabilities into their robotic systems – or, at the very least, to start playing on that field.

Every successful (and unsuccessful) attempt by Atlas to manipulate an object, navigate a cluttered factory floor, or assist a human worker becomes a valuable data point for the model, just as every conversation does for Llama 3. Building on the model, Meta launched a standalone assistant at meta.ai and embedded it into WhatsApp, Instagram, and Facebook, collecting vast and highly personalized data. This could lead us toward synthetic social networks or, more likely, toward an extremely personalized, embodied AI experience. As Llama 3 and the electric Atlas converge, they may accelerate each other's development, blurring the lines between software and hardware, bits and atoms. As language models become more proficient at understanding and interacting with the world, robots will become more capable of applying that knowledge.

This doesn’t necessarily mean something sinister. More capable and advanced robots, which can be fine-tuned and improved with your personal data and preferences, could take over mundane tasks at home. Imagine, instead of the hundredth AI-powered text editor, finally having a robot that can load and unload the dishwasher and do the laundry.

Today, launches like Llama 3 are seen mainly as advances in AI’s ability to understand and process language, but in the long term they will stand as milestones on the way to building and deploying machines that are finely attuned and aligned with us, assisting in our daily lives.

Other Impressive Models:

(I didn’t send you the FOD digest last Monday because I was at a conference dedicated to citizen diplomacy, so today we have an extensive list of recently launched models worth checking out, along with other relevant research papers.)

  • Mixtral-8x22B: Mistral AI introduced a scalable sparse mixture-of-experts model that optimizes cost and latency by activating only a subset of its parameters during inference, offering high capacity and efficiency for further training and applications →read the paper

  • Rerank 3: Launched by Cohere, this model enhances enterprise search and Retrieval Augmented Generation systems, improving accuracy in document retrieval across multiple languages and data formats →read the paper

  • Idefics2: Hugging Face's model that excels in integrating text and image data, significantly improving on OCR and visual question answering tasks →read the paper

  • Reka Core, Flash, and Edge: A series of multimodal language models from Reka that process text, images, video, and audio, demonstrating high performance across diverse tasks →read the paper

  • Ferret-UI: Developed by Apple, this model specializes in mobile UI interaction, enhancing user experience by accurately performing tasks tailored to the unique properties of UI screens →read the paper

  • Zamba: Zyphra's compact and efficient SSM Hybrid model designed for performance with reduced training data needs, optimized for consumer hardware →read the paper

  • YaART: Yandex's advanced text-to-image diffusion model that optimizes training efficiency and quality with smaller, high-quality datasets →read the paper

  • RHO-1: A novel approach by Xiamen University focusing on Selective Language Modeling to enhance efficiency by prioritizing useful tokens during training →read the paper

  • RecurrentGemma: Google DeepMind's model that moves past traditional transformers by incorporating recurrences for more efficient long-sequence processing →read the paper

  • JetMoE: An economical LLM from the MIT-IBM Watson AI Lab using a mixture-of-experts architecture to achieve high performance at reduced costs →read the paper

News from The Usual Suspects ©

Google

  • open-sourced Gemini Cookbook 

  • and is merging its Research, DeepMind, and Responsible AI teams to accelerate its “capacity to deliver capable AI”

  • DeepMind published an article about the ethics of advanced AI assistants, stressing the significance of their integration into daily life

Stanford

  • published the 2024 AI Index report, reflecting the escalating impact of AI on society. It offers new estimates of AI training costs, detailed insights into the responsible-AI landscape, and a new chapter on AI’s influence on science and medicine. Highlights include the substantial cost of training state-of-the-art models such as GPT-4 and Gemini Ultra, the dominance of the U.S. in producing top AI models, and significant investment growth in generative AI despite overall funding declines. The report also notes a major increase in AI-related regulation in the U.S. and heightened public awareness of, and nervousness about, AI’s future impact.

The freshest research papers, categorized for your convenience

Our top-3

MEGALODON: Efficient LLM Pretraining and Inference with Unlimited Context Length – Developed by Meta AI and university researchers, MEGALODON improves upon the MEGA model to efficiently manage sequences up to 32,000 tokens. Integrating innovations like Complex Exponential Moving Average (CEMA) and timestep normalization, it outperforms traditional Transformer architectures in efficiency and pretraining stability across various tasks and modalities.
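
For intuition, here is a toy sketch of what a complex exponential moving average computes: a per-channel hidden state that decays by a complex factor each step, so the recurrence can both smooth and oscillate. The function name, shapes, and parameterization below are illustrative assumptions, not MEGALODON’s actual implementation.

```python
import torch

# Toy complex exponential moving average (CEMA) – a hypothetical minimal form,
# not Meta's code. Each channel keeps a complex hidden state that is rotated
# and damped every step, so the filter can capture oscillatory patterns that a
# purely real EMA cannot.
def cema(x, alpha, theta):
    """x: (seq, dim) real inputs; alpha: (dim,) decay magnitude in (0, 1);
    theta: (dim,) rotation angle per step. Returns the real part of the EMA."""
    decay = alpha * torch.exp(1j * theta)              # complex decay factor per channel
    h = torch.zeros(x.shape[-1], dtype=torch.cfloat)   # persistent complex state
    out = []
    for t in range(x.shape[0]):
        h = decay * h + (1.0 - alpha) * x[t]           # h_t = lambda * h_{t-1} + (1 - alpha) * x_t
        out.append(h.real)
    return torch.stack(out)

# Example usage (16 channels, 512 steps):
# y = cema(torch.randn(512, 16), alpha=torch.full((16,), 0.9), theta=torch.rand(16))
```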

Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention – Researchers from Google developed a method for Transformer-based LLMs to handle indefinitely long inputs using a new attention mechanism, Infini-attention, which integrates compressive memory within standard attention structures. This allows efficient handling of both short- and long-term dependencies without significant increases in memory demands. The model shows substantial improvements over existing methods on extensive datasets and long-context tasks, enabling scalable, effective long-sequence processing.
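
To make the idea concrete, here is a minimal sketch of how a compressive memory can sit alongside standard softmax attention: each segment reads from a persistent memory with a linear-attention-style lookup, attends locally with causal softmax, updates the memory with its own key-value pairs, and gates between the two outputs. Function names, shapes, and the simple additive memory update are assumptions for illustration, not Google’s reference implementation.

```python
import torch
import torch.nn.functional as F

def infini_attention_segment(q, k, v, M, z, beta):
    """Illustrative single-head sketch. q, k, v: (seq, dim) for one segment;
    M: (dim, dim) compressive memory; z: (dim,) normalizer; beta: learnable scalar tensor."""
    sigma_q = F.elu(q) + 1.0                           # positive feature map for linear readout
    sigma_k = F.elu(k) + 1.0

    # 1) Read long-range context from the compressive memory.
    mem_out = (sigma_q @ M) / (sigma_q @ z).clamp(min=1e-6).unsqueeze(-1)

    # 2) Standard causal softmax attention within the segment.
    scores = (q @ k.T) / q.shape[-1] ** 0.5
    mask = torch.triu(torch.ones_like(scores, dtype=torch.bool), diagonal=1)
    local_out = F.softmax(scores.masked_fill(mask, float("-inf")), dim=-1) @ v

    # 3) Fold this segment's key-value associations into the memory.
    M = M + sigma_k.T @ v
    z = z + sigma_k.sum(dim=0)

    # 4) Gate between memory (long-term) and local attention outputs.
    g = torch.sigmoid(beta)
    return g * mem_out + (1.0 - g) * local_out, M, z

# Example usage (dim=64, one 128-token segment):
# q = k = v = torch.randn(128, 64)
# out, M, z = infini_attention_segment(q, k, v, torch.zeros(64, 64), torch.zeros(64), torch.zeros(1))
```

Because M and z have fixed size, memory cost stays constant no matter how many segments stream through, which is the core of the "infinite context" claim.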

Best Practices and Lessons Learned on Synthetic Data for Language Models – Researchers from Google DeepMind, Stanford University, and Georgia Institute of Technology explored the benefits and challenges of using synthetic data for language models. They underscored synthetic data as a crucial solution to data scarcity, privacy concerns, and high costs, enabling the creation of robust, unbiased, and factually accurate models. The paper highlighted the importance of responsible synthetic data usage to ensure models are inclusive and trustworthy, emphasizing rigorous testing and fairness assessments.

State Space Models (SSM)

  • The Illusion of State in State-Space Models: Challenges the benefits of SSMs over transformers in tracking sequential states, suggesting similar limitations in expressive power →read the paper

  • State Space Model for New-Generation Network Alternative to Transformers: A Survey: Reviews the SSM as a potential efficient alternative to transformers, detailing its applications and performance benefits →read the paper

  • Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation: Utilizes SSMs in a novel network for robust multi-modal semantic segmentation →read the paper

Retrieval-Augmented Generation (RAG)

  • A Survey on Retrieval-Augmented Text Generation for Large Language Models: Surveys RAG systems enhancing LLMs by dynamically incorporating external information to improve performance and mitigate limitations →read the paper

  • How Faithful are RAG Models? Quantifying the Tug-of-War Between RAG and LLMs’ Internal Prior: Investigates the reliability of RAG models in adhering to accurate versus incorrect retrieved information →read the paper

Benchmarking Advances

  • The Open Medical-LLM Leaderboard: Develops a benchmark for evaluating LLMs on medical tasks to enhance reliability and outcomes in healthcare applications →read the paper

  • Introducing v0.5 of the AI Safety Benchmark: Introduces a benchmark to assess AI safety risks in chat-tuned LLMs across various hazard categories →read the paper

  • BLINK: Multimodal Large Language Models Can See but Not Perceive: Establishes a benchmark for evaluating the visual perception capabilities of multimodal LLMs →read the paper

  • RULER: What’s the Real Context Size of Your Long-Context Language Models?: Creates a benchmark to evaluate the effective context size handled by long-context LLMs →read the paper

  • OSWORLD: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments: Sets up a benchmark for multimodal agents performing computer tasks in real environments, revealing limitations in current models →read the paper

Advances in Learning Techniques

  • Many-Shot In-Context Learning: Examines the expansion of in-context learning from few-shot to many-shot, significantly improving performance on complex reasoning tasks →read the paper

  • LLM In-Context Recall is Prompt Dependent: Investigates how LLMs' ability to recall specific information is influenced by the nature of the prompts used →read the paper

  • SAMMO: A General-Purpose Framework for Prompt Optimization: Develops a framework for optimizing LLM prompts by considering their structural aspects to improve accuracy across various tasks →read the paper

  • Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment: Explores using reward models across different languages to align LLMs without additional language-specific training →read the paper

  • TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding: Introduces a method to accelerate the generation of long sequences by employing hierarchical speculative decoding →read the paper

  • Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing: Proposes a method for LLMs to autonomously improve their reasoning through imaginative and critical thinking strategies →read the paper

  • Scaling Instructable Agents Across Many Simulated Worlds: Focuses on creating AI agents capable of interpreting and executing language instructions in diverse 3D environments →read the paper

  • Stream of Search (SoS): Learning to Search in Language: Develops a new approach to enhancing LLMs' search capabilities by training on optimal and suboptimal search paths →read the paper

Optimization and Efficiency

  • On Speculative Decoding for Multimodal Large Language Models: Examines speculative decoding to improve inference speed in multimodal LLMs without losing accuracy →read the paper

  • TextHawk: Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models: Develops a multimodal LLM optimized for document-oriented tasks, enhancing efficiency and perception →read the paper

  • TransformerFAM: Feedback Attention Is Working Memory: Introduces a new Transformer architecture that includes a feedback loop to act as a working memory →read the paper

  • Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies: Investigates how to adapt the CLIP model for lower computational budgets, focusing on data and architectural efficiencies →read the paper

  • Pre-training Small Base LMs with Fewer Tokens: Describes a method for training smaller LMs using a fraction of the data and resources typically required →read the paper

  • Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences: Introduces a scalable method for enhancing LLMs post-training using general preferences →read the paper

Specialized Applications and Assessments

  • CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues: Aims to train LLMs to maintain focus on relevant dialogue topics using a specialized dataset →read the paper

  • Social Skill Training with Large Language Models: Uses LLMs for enhancing training in social skills through realistic simulations and feedback →read the paper

  • OmniFusion Technical Report: Develops a multimodal architecture that integrates text and visual data, showcasing improved performance across several benchmarks →read the paper

  • LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders: Demonstrates converting decoder-only LLMs into efficient text encoders, achieving top performance benchmarks →read the paper

  • LLoCO: Learning Long Contexts Offline: Enhances LLMs' handling of extended contexts by combining context compression and efficient fine-tuning →read the paper

  • MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies: Shows that small LLMs can match the performance of larger models through optimized training →read the paper

  • CodecLM: Aligning Language Models with Tailored Synthetic Data: Introduces a framework for generating synthetic data to better align LLMs with specific tasks →read the paper

  • SambaLingo: Teaching Large Language Models New Languages: Adapts pre-trained LLMs to new languages, improving performance in multilingual contexts →read the paper

  • Elephants Never Forget: Memorization and Learning of Tabular Data in Large Language Models: Investigates LLMs' ability to memorize and learn from tabular data, assessing the impact on performance →read the paper

  • WILBUR: Adaptive In-Context Learning for Robust and Accurate Web Agents: Develops a web agent that adapts in-context learning for better web interaction, demonstrating advanced performance on benchmarks →read the paper

  • OpenEQA: Embodied Question Answering in the Era of Foundation Models: Introduces a benchmark for embodied question answering, challenging AI agents to navigate and respond in real-world scenarios →read the paper

If you decide to become a Premium subscriber, remember that in most cases you can expense this subscription through your company! Join our community of forward-thinking professionals. Please also share this newsletter with colleagues if it can help them enhance their understanding of AI and stay ahead of the curve. 🤍 Thank you for reading!

How did you like it?
