The Rise of Self-Evolving Language Models

Next Week in Turing Post:

Wednesday, Computer Vision Series: Recognizing smiles and other practical developments in the 1990s and early 2000s.
Friday: profile of an AI Unicorn

If you like Turing Post, consider becoming a paid subscriber. That way you help us keep going. This week we celebrate our 50th FOD! Enjoy 50% off all Premium features (FOD itself is always free) →

Large language models (LLMs) have advanced remarkably, but their improvement has traditionally depended heavily on external datasets and human guidance. A fascinating shift is underway: the emergence of self-evolving LLMs. This concept is now the focus of significant research aimed at pushing LLMs toward greater autonomy and intelligence.

Researchers from Peking University, Alibaba Group, and Nanyang Technological University have proposed a comprehensive framework for understanding this evolution (A Survey on Self-Evolution of Large Language Models). The framework describes a cyclical process of experience acquisition, experience refinement, updating, and evaluation. At its core is the ability of LLMs to learn from their own experiences and improve their capabilities, a mode of learning inspired by the way humans acquire knowledge and skills.
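A toy Python sketch can make the four-stage cycle concrete. It is our own illustration, not code from the survey: the "model" is a hypothetical lookup table that answers integer-addition tasks, experience acquisition is a noisy guess, refinement uses a hypothetical verifier that checks each answer by subtraction, and evaluation compares answers against the true sums.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToyModel:
    """Hypothetical stand-in for an LLM: answers (a, b) -> a + b from memory or by guessing."""
    memory: dict = field(default_factory=dict)
    rng: random.Random = field(default_factory=lambda: random.Random(0))

    def answer(self, task):
        if task in self.memory:
            return self.memory[task]
        a, b = task
        return a + b + self.rng.choice([-1, 0, 1])  # noisy guess, correct 1/3 of the time


def acquire(model, tasks):
    """Experience acquisition: the model attempts the tasks on its own."""
    return [(t, model.answer(t)) for t in tasks]


def refine(experiences, check):
    """Experience refinement: keep only experiences that pass the check."""
    return [(t, y) for t, y in experiences if check(t, y)]


def update(model, experiences):
    """Updating: fold refined experiences back into the model."""
    for t, y in experiences:
        model.memory[t] = y


def evaluate(model, tasks):
    """Evaluation: accuracy against the true answers."""
    return sum(model.answer(t) == sum(t) for t in tasks) / len(tasks)


def self_evolve(model, tasks, check, rounds):
    history = []
    for _ in range(rounds):
        update(model, refine(acquire(model, tasks), check))
        history.append(evaluate(model, tasks))
    return history


tasks = [(a, 2 * a) for a in range(10)]
model = ToyModel()
# Hypothetical verifier: accepts a proposed sum only if subtraction recovers the first operand.
history = self_evolve(model, tasks, check=lambda t, y: y - t[1] == t[0], rounds=40)
print(history[-1])
```

Because only verified experiences are kept, accuracy tends to climb toward 1.0 over the rounds. In a real LLM, "updating" would mean fine-tuning or editing an external memory rather than filling a dictionary.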

Techniques for Self-Improvement

Several recently published techniques are driving this self-evolutionary trend:

Imagination, Search, and Criticism: LLMs can improve their own reasoning by imagining new training prompts, searching for better solutions, and criticizing the results (Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing). The proposed approach combines the LLM with Monte Carlo Tree Search and critic models that provide feedback, so the model can keep improving without additional annotations.
Self-Play and Reinforcement Learning: Researchers have designed an adversarial language game in which an LLM plays both the attacker and the defender (Self-playing Adversarial Language Game Enhances LLM Reasoning). Through reinforcement learning on the game outcomes, the LLM refines its reasoning, with improvements reported across a range of reasoning benchmarks.
Optimizing Inference and Decoding: The LayerSkip framework makes LLM inference computationally lighter (LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding). It trains models so that earlier layers can already produce usable predictions, then uses these early exits to draft tokens that the remaining layers verify (self-speculative decoding). This speeds up decoding while maintaining accuracy, and because drafting and verification share one model, it needs less memory than pairing the LLM with a separate draft model.
Reasoning about Code Execution: The NExT method teaches LLMs to inspect program execution traces (such as the variable states of executed lines) and reason about runtime behavior (NExT: Teaching Large Language Models to Reason about Code Execution). NExT uses self-training to bootstrap a synthetic dataset of execution-aware rationales, improving the program repair fix rate by 26.1% (absolute) on MBPP and 14.3% on HumanEval; the model can also reason about code when traces are not available at test time.
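The draft-then-verify idea behind LayerSkip's self-speculative decoding can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not LayerSkip's implementation: `full_model` and `early_exit_model` are hypothetical deterministic functions standing in for a full forward pass and an early-exit pass, and verification happens token by token rather than in one batched pass. The property it demonstrates is that greedy output is unchanged; the draft only affects how much work is needed.

```python
# Hypothetical toy "models": each maps a token prefix to its greedy next token.
def full_model(ctx: list[int]) -> int:
    """Stand-in for a forward pass through all layers."""
    return (sum(ctx) * 7 + 3) % 10


def early_exit_model(ctx: list[int]) -> int:
    """Stand-in for exiting after the first few layers: cheaper but sometimes wrong."""
    guess = full_model(ctx)
    return (guess + 1) % 10 if len(ctx) % 4 == 0 else guess


def greedy_decode(prompt, n_tokens, model):
    out = list(prompt)
    for _ in range(n_tokens):
        out.append(model(out))
    return out[len(prompt):]


def self_speculative_decode(prompt, n_tokens, draft, verify, k=3):
    out = list(prompt)
    target = len(prompt) + n_tokens
    while len(out) < target:
        # Draft up to k tokens with the cheap early-exit pass.
        drafted = []
        for _ in range(min(k, target - len(out))):
            drafted.append(draft(out + drafted))
        # Verify with the full pass: keep the agreeing prefix and replace the
        # first disagreeing token with the full model's choice, then stop.
        # (A real implementation checks all drafted positions in one batched pass.)
        accepted = []
        for tok in drafted:
            expected = verify(out + accepted)
            accepted.append(expected)
            if tok != expected:
                break
        out.extend(accepted)
    return out[len(prompt):]


prompt = [1, 2]
fast = self_speculative_decode(prompt, 12, early_exit_model, full_model)
slow = greedy_decode(prompt, 12, full_model)
print(fast == slow)  # prints True
```

In a real model, the savings come from the draft being cheap (only the first layers run) and from verifying several drafted tokens in a single pass through the remaining layers.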

Exploring LLM Values and Ethical Alignment

It is also remarkable that LLMs appear to exhibit value systems of their own. The ValueLex framework, developed by researchers from Tsinghua University and Microsoft Research Asia, aims to uncover these values, which are distinct from human norms. Applying methods adapted from psychology to a range of LLMs, the researchers identified value dimensions such as competence, character, and integrity. This line of research is crucial for understanding how model design influences value formation, and it can guide ethical considerations in AI development.

The Future of Self-Evolving Systems

The prospect of self-evolving LLMs is both exciting and filled with questions. As these models gain autonomy, their continued alignment with human goals and values will become crucial. Continuous research, interdisciplinary collaboration, and rigorous evaluation will be essential to unlocking the full potential of self-evolving LLMs and ensuring their safe and beneficial integration into our world.

It might also be a good time to reread John von Neumann’s Theory of Self-Reproducing Automata…

Twitter Library

A Comprehensive List of Resources to Understand Multimodal Models

An Overview of Current Surveys, Models, and Tools in Multimodal AI Research

www.turingpost.com/p/multimodal-resources

Last Week’s Models from the US

(Every week now brings new, powerful models. Last week was especially fruitful. Here is our list of models with additional reading recommendations.)

Phi-3 Mini - Developed by Microsoft
Phi-3 Mini, a 3.8-billion-parameter model from Microsoft, rivals much larger models such as Mixtral 8x7B and GPT-3.5 on benchmarks (69% on MMLU) while being small enough to run locally on a phone. It was trained on a highly curated mix of filtered web data and synthetic data →read the paper
- Additional reading: Compare Llama-3 and Phi-3 using RAG (lightning.ai)
OpenELM - Developed by Apple
Apple's OpenELM uses a layer-wise scaling strategy to allocate parameters efficiently across its transformer layers. With a budget of about one billion parameters, it improves accuracy by 2.36% over OLMo while requiring half as many pre-training tokens. The accompanying open-source framework supports transparent, reproducible research in natural language processing →read the paper
Snowflake Arctic - Developed by Snowflake AI Research
Snowflake Arctic is tailored for enterprise applications, utilizing a Dense-MoE Hybrid transformer architecture to dramatically cut costs and compute resources. It excels in tasks like SQL generation and coding, and is fully open-source, available on multiple platforms →read the paper
- Additional reading: Snowflake's Mission: Demolishing Data Limitations in the Era of Enterprise AI
Pegasus-1 - Developed by Twelve Labs
Pegasus-1 is a multimodal LLM designed for video understanding, interpreting spatiotemporal information across various video types. It excels in tasks such as video conversation and summarization, and the report describes its architecture and capabilities →read the paper
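Snowflake describes Arctic as a dense transformer combined with a residual Mixture-of-Experts MLP that uses top-2 gating. The plain-Python sketch below is our own toy reading of that hybrid design, not Snowflake's code: the router weights, the scaling "experts", and the dense MLP are hypothetical, and vectors are short lists of floats.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dense_moe_hybrid(x, dense_mlp, experts, router, top_k=2):
    """Toy residual block: a dense MLP path plus a top-k routed MoE path."""
    # Router: one logit per expert (dot product of x with that expert's router row).
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router]
    # Top-k gating: only the k highest-scoring experts are evaluated.
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    gates = softmax([logits[i] for i in top])
    moe_out = [0.0] * len(x)
    for g, i in zip(gates, top):
        y = experts[i](x)
        moe_out = [m + g * yi for m, yi in zip(moe_out, y)]
    dense_out = dense_mlp(x)
    # Both paths are added to the residual stream.
    return [xi + d + m for xi, d, m in zip(x, dense_out, moe_out)]


# Hypothetical experts that simply scale their input by 1, 2, 3 or 4.
experts = [lambda v, s=s: [s * t for t in v] for s in (1.0, 2.0, 3.0, 4.0)]
router = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]
out = dense_moe_hybrid([1.0, 0.5], lambda v: [0.1 * t for t in v], experts, router)
print(out)
```

Only the selected experts run for a given token, which is why an MoE model's active parameter count can be far smaller than its total parameter count.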

Models from China:

SenseNova 5.0 - Developed by SenseTime
SenseNova 5.0, unveiled on April 24, 2024, in Shanghai, is a major update to SenseTime's large model series. It improves linguistic, creative, and scientific capabilities, adds multimodal interaction, was trained on over 10TB of token data, and supports a 200K-token context window, with gains in knowledge, math, reasoning, and coding. But SenseTime's headline claim is that SenseNova 5.0 matches or exceeds models like GPT-4 Turbo across various benchmarks →more details
Tele-FLM - Developed by Beijing Academy of AI and Institute of AI of China Telecom Corp Ltd
Tele-FLM, a 52-billion-parameter multilingual LLM, is optimized for factual judgment and a low carbon footprint. Its report provides detailed insights into model design and training dynamics, and the model achieves competitive performance →read the paper
InternVL 1.5 - Developed by Shanghai AI Laboratory
InternVL 1.5 aims to close the gap with commercial multimodal models by combining a strong vision encoder, dynamic high-resolution image input, and a high-quality bilingual dataset. It shows competitive results in OCR and Chinese-related tasks, advancing the open-source sector →read the paper

News from The Usual Suspects ©

Hugging Face’s FineWeb:

Meta: Meta’s executives were left out of the newly formed Artificial Intelligence Safety and Security Board of the US Department of Homeland Security.

Cohere: Cohere has launched a toolkit designed to simplify AI application development across various platforms, emphasizing ease of use and customization.

Meta and Cohere (along with several other notable institutions) also participated in creating the PRISM dataset, which offers new insights into how diverse participants around the world interact with large language models (LLMs). PRISM links detailed survey responses with conversation transcripts to analyze user demographics, preferences, and feedback on AI interactions. The dataset highlights how personal and cultural diversity shapes AI systems and user experiences, showing the nuanced interplay between AI and its human users →read the paper and →check the dataset

OpenAI's Memory Upgrade: OpenAI has introduced a memory feature for ChatGPT that lets it retain information across conversations, potentially making interactions richer and more useful.

The freshest research papers, categorized for your convenience

Enhancements to Large Language Models (LLMs)

The Instruction Hierarchy: Proposes training LLMs to prioritize instructions according to their source (system messages over user messages over third-party content such as tool outputs), making them more robust to prompt injections and jailbreaks →read the paper
Multi-Head Mixture-of-Experts: Refines the Mixture-of-Experts architecture by splitting each token into sub-tokens that are routed to different experts and then merged, activating more experts and improving performance across varied tasks →read the paper
AdvPrompter: Develops a fast method for generating adversarial prompts to test and improve the robustness of LLMs against potential misuse →read the paper
SnapKV: Compresses the key-value (KV) cache by keeping, for each attention head, the prompt positions that an observation window at the end of the prompt attends to most, improving memory and time efficiency on long inputs →read the paper
XC-CACHE: Makes LLM inference more efficient by conditioning generation on cached context through cross-attention instead of placing the context in the prompt, substantially reducing memory requirements →read the paper
Make Your LLM Fully Utilize the Context: Introduces a training method that significantly improves LLMs' ability to utilize long context effectively →read the paper
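SnapKV's selection step can be illustrated with a small sketch written from our reading of the paper rather than from its code. It assumes a single attention head, a prompt attention matrix given as nested lists, and max pooling as the clustering step; the function name `snapkv_select`, its parameters, and the toy keys are hypothetical.

```python
def snapkv_select(attn, window, capacity, kernel=3):
    """attn[q][p]: attention weight from prompt query q to key position p.
    Returns the sorted positions whose KV entries are kept."""
    n = len(attn)
    prefix_len = n - window
    # 1) Vote: total attention each prefix position receives from the observation window.
    votes = [sum(attn[q][p] for q in range(prefix_len, n)) for p in range(prefix_len)]
    # 2) Pool neighbouring votes so kept positions come in clusters, not isolated tokens.
    half = kernel // 2
    pooled = [max(votes[max(0, p - half): p + half + 1]) for p in range(prefix_len)]
    # 3) Keep the top-`capacity` prefix positions plus the whole observation window.
    top = sorted(range(prefix_len), key=lambda p: pooled[p], reverse=True)[:capacity]
    return sorted(top) + list(range(prefix_len, n))


# Toy 6-token prompt; the last 2 queries form the observation window.
attn = [[0.0] * 6 for _ in range(4)]  # earlier rows are not used by the vote
attn += [
    [0.05, 0.70, 0.05, 0.05, 0.15, 0.00],
    [0.05, 0.60, 0.05, 0.05, 0.10, 0.15],
]
positions = snapkv_select(attn, window=2, capacity=3)
keys = [f"k{i}" for i in range(6)]  # hypothetical cached keys
kept = [keys[i] for i in positions]
print(positions, kept)
```

The window strongly attends to position 1, so pooling keeps that position together with its neighbours and drops position 3. The real method repeats this per head and per layer on tensors.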

Multimodal and Visual Model Advancements

Graph Machine Learning in the Era of Large Language Models: Discusses the integration of LLMs with Graph Neural Networks to enhance both technologies →read the paper
A Multimodal Automated Interpretability Agent: Develops an agent that integrates a vision-language model with tools for automated experiments, enhancing interpretability of neural models →read the paper
List Items One by One: Proposes a new training paradigm for multimodal LLMs that improves their visual reasoning by training them to enumerate visual tags →read the paper

Generative and Rendering Technologies

NeRF-XL: Advances Neural Radiance Fields by scaling up the model to operate across multiple GPUs, enhancing rendering quality and efficiency →read the paper

Benchmarks and Evaluation

Revisiting Text-to-Image Evaluation with Gecko: Evaluates text-to-image models by introducing a new benchmark that assesses model performance across detailed and varied prompts →read the paper
SEED-Bench-2-Plus: Introduces a benchmark for evaluating MLLMs on visual comprehension tasks that involve text-rich images, aiming to guide future enhancements in MLLM capabilities →read the paper

In other newsletters:

We love history, so here is one on A History of the Chinese Computer

If you decide to become a Premium subscriber, remember that in most cases you can expense this subscription through your company! Join our community of forward-thinking professionals. Please also send this newsletter to your colleagues if it can help them enhance their understanding of AI and stay ahead of the curve. 🤍 Thank you for reading