Let’s refresh some fundamentals today to stay fluent in what we all already work with. Here are some of the most popular model types that shape the vast world of AI (with examples in brackets):

Language Models

  1. LLM – Large Language Model (GPT, LLaMA)

    Trained on massive text datasets to understand and generate human language. LLMs are mostly built on the Transformer architecture and trained to predict the next token. They scale by increasing the overall parameter count across all components (layers, attention heads, MLPs, etc.). → Large Language Models: A Survey, plus → The history of LLM
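To make "predicting the next token" concrete, here is a minimal sketch of greedy autoregressive generation. The vocabulary and the `BIGRAM_LOGITS` table are hypothetical stand-ins for a real model, which would compute its scores with a Transformer over the whole context rather than look them up from the previous token alone.

```python
import math

# Hypothetical toy vocabulary and scores; a real LLM computes logits
# with a Transformer over the full context, not a lookup table.
VOCAB = ["the", "cat", "sat", "<eos>"]
BIGRAM_LOGITS = {
    "the": [0.0, 2.0, 0.5, 0.1],
    "cat": [0.1, 0.0, 2.5, 0.3],
    "sat": [0.2, 0.1, 0.0, 3.0],
}

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def generate(prompt, max_new_tokens=5):
    """Greedy decoding: repeatedly append the most probable next token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = softmax(BIGRAM_LOGITS[tokens[-1]])
        next_id = max(range(len(VOCAB)), key=probs.__getitem__)
        if VOCAB[next_id] == "<eos>":
            break
        tokens.append(VOCAB[next_id])
    return tokens

generated = generate(["the"])
print(generated)
```

Real systems usually sample from the distribution (with temperature, top-k, or top-p) instead of always taking the most probable token.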

  2. SLM – Small Language Model (TinyLLaMA, Phi models, SmolLM)

    Lightweight LM optimized for efficiency, low memory use, fast inference, and edge use. SLMs work on the same principles as LLMs. → A Survey of Small Language Models

  3. RLM – Reasoning Language Model (DeepSeek-R1, OpenAI's o3)

    Advanced AI systems specifically optimized for multi-step logical reasoning, complex problem-solving, and structured thinking. RLMs incorporate test-time scaling, Chain-of-Thought reasoning, tool use, external memory, strong math and code capabilities, and more modular design for reliable decision-making. → Large Language Models: A Survey

Multimodal & Action Models

  4. VLM – Vision-Language Model (CLIP, Flamingo)

    Processes and understands both images and text. VLMs either map images and text into a shared embedding space (as CLIP does) or generate captions and descriptions conditioned on both (as Flamingo does). → An Introduction to Vision-Language Modeling
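The shared embedding space idea can be sketched as follows: once an image and several captions live in the same vector space, matching them reduces to a similarity comparison. The three-dimensional vectors below are made-up placeholders; in a real VLM they would come from trained image and text encoders.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings; real ones are produced by learned encoders.
image_embedding = [0.9, 0.1, 0.2]
text_embeddings = {
    "a photo of a dog": [0.8, 0.2, 0.1],
    "a photo of a car": [0.1, 0.9, 0.3],
}

# Pick the caption whose embedding is closest to the image embedding.
best_caption = max(
    text_embeddings,
    key=lambda caption: cosine(image_embedding, text_embeddings[caption]),
)
print(best_caption)
```

This is the mechanism behind zero-shot classification with contrastive models: class names become captions, and the closest caption wins.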

  5. MLLM – Multimodal Large Language Model (Gemini)

    A large-scale model that can understand and process multiple types of data (modalities), usually text plus other formats such as images, video, audio, structured data, and 3D or spatial inputs. MLLMs can be LLMs extended with modality adapters or models trained jointly across vision, text, audio, etc. → A Survey on Multimodal Large Language Models

  6. VLA – Vision-Language-Action Model (Gemini Robotics, Rho-alpha, SmolVLA)

    Models that connect perception and language directly to actions. VLAs take visual and textual inputs and output action commands, often for embodied agents like robots. They are commonly used in robotics and embodied AI to ground perception into real-world actions. → Vision-Language-Action (VLA) Models: Concepts, Progress, Applications and Challenges

  7. LAM – Large Action Model (InstructDiffusion, RT-2)

    Action-centric models trained to plan and generate sequences of actions rather than just text. Actions can be physical (robot control) or digital (tool calls, UI actions, API usage). LAMs emphasize scalable decision-making, long-horizon planning, and generalization across tasks, and may or may not include vision as an input. → Large Action Models: From Inception to Implementation

    So here is the difference between VLAs and LAMs: VLAs focus on turning vision and language into physical actions, while LAMs focus more broadly on planning and executing action sequences, often in digital or tool-based environments.

Architecture Types

  8. MoE – Mixture of Experts (e.g. Mixtral)

    Uses many sub-networks called experts but activates only a few per input, enabling massive scaling with sparse computation. → Mixture of Experts: How It Works
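Here is a rough sketch of the "activate only a few experts per input" idea using top-k routing. The gate weights and the scaling "experts" are hypothetical toys; production MoE layers use learned routers, feed-forward experts, and load-balancing terms that are left out here.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, gate_weights, experts, k=2):
    """Route x to the top-k experts and mix their outputs by gate weight."""
    # One gate score per expert: dot product of that expert's gate row with x.
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    top = sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)[:k]
    weights = softmax([scores[e] for e in top])  # renormalize over chosen experts
    out = [0.0] * len(x)
    for w, e in zip(weights, top):
        y = experts[e](x)  # only the selected experts are ever evaluated
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, top

# Hypothetical experts: each one just scales its input by a constant.
experts = [lambda x, s=s: [s * xi for xi in x] for s in (1.0, 2.0, 3.0, 4.0)]
gate_weights = [[0.1, 0.0], [2.0, 0.0], [0.5, 0.0], [1.0, 0.0]]

output, chosen = moe_forward([1.0, 0.0], gate_weights, experts, k=2)
print(chosen, output)
```

With four experts and k=2, only half of the expert parameters are used for this input, which is why total parameter count can grow much faster than per-token compute.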

  9. SSM – State Space Model (Mamba, RetNet)

    A neural network that models a sequence as a continuous dynamical system, describing how hidden state vectors change in response to inputs over time. SSMs are parallelizable and efficient for long contexts. → Efficiently Modeling Long Sequences with Structured State Spaces

    Plus our overview: → What is Mamba?
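As a minimal sketch of the state space idea, the code below discretizes a continuous system dh/dt = a·h + b·x with a zero-order hold and then runs it as a linear recurrence. It assumes a diagonal (per-channel scalar) state and fixed parameters, so it illustrates the general formulation rather than Mamba's input-dependent selective scan.

```python
import math

def discretize(a, b, dt):
    """Zero-order-hold discretization of dh/dt = a*h + b*x (scalar, a != 0)."""
    a_bar = math.exp(dt * a)
    b_bar = (a_bar - 1.0) / a * b
    return a_bar, b_bar

def ssm_scan(xs, a_bar, b_bar, c):
    """Diagonal SSM: h_t = a_bar * h_{t-1} + b_bar * x_t, y_t = c . h_t."""
    h = [0.0] * len(a_bar)
    ys = []
    for x in xs:
        h = [ai * hi + bi * x for ai, hi, bi in zip(a_bar, h, b_bar)]
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys

# One stable state channel (a < 0) with a hypothetical step size of 0.1.
a_bar, b_bar = discretize(a=-1.0, b=1.0, dt=0.1)

# Feed a single impulse and watch the state decay over time.
impulse_response = ssm_scan([1.0, 0.0, 0.0, 0.0], [a_bar], [b_bar], [1.0])
print(impulse_response)
```

Because the recurrence is linear, the same output can also be computed as a convolution or with a parallel scan, which is where the training-time parallelism comes from.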

  10. RNN – Recurrent Neural Network (advanced variants: LSTM, GRU)

    Processes sequences one step at a time, passing information through a hidden state that acts as memory. RNNs were widely used in early NLP and time-series tasks but struggle with long-range dependencies compared to newer architectures. → Recurrent Neural Networks (RNNs): A gentle Introduction and Overview, plus → our detailed article about LSTM. For a modern take on time series forecasting, see TimeGPT, the first foundation model built specifically for this domain.
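The step-by-step hidden state update can be sketched with a single-unit vanilla RNN. The weights below are arbitrary illustrative values, not trained parameters, and LSTM or GRU gating is omitted.

```python
import math

def rnn_forward(xs, w_x=0.8, w_h=0.5, b=0.0):
    """Vanilla RNN with one hidden unit: h_t = tanh(w_x*x_t + w_h*h_{t-1} + b)."""
    h = 0.0
    hidden_states = []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h + b)
        hidden_states.append(h)
    return hidden_states

# A single input at step 0 followed by silence: the memory of it fades.
states = rnn_forward([1.0, 0.0, 0.0])
print(states)
```

The trace of that first input shrinks at every step, which gives a small-scale picture of why plain RNNs have trouble carrying information across long sequences.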

  11. CNN – Convolutional Neural Network (MobileNet, EfficientNet)

    Automatically learns patterns from visual data, using convolutional layers to detect features like edges, textures, or shapes. CNNs are less dominant than they once were, but they are still used in edge applications and visual processing. → An Introduction to Convolutional Neural Networks
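To illustrate how a convolutional layer detects a feature such as an edge, here is a small sketch of a 2D convolution (implemented as cross-correlation, as deep learning libraries typically do) with a hand-written kernel. In a trained CNN the kernel values are learned rather than chosen by hand.

```python
def conv2d(image, kernel):
    """'Valid' 2D cross-correlation of a 2D list with a smaller 2D kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# A tiny image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-picked kernel that responds to a dark-to-bright horizontal transition.
vertical_edge_kernel = [[-1, 1]]

feature_map = conv2d(image, vertical_edge_kernel)
print(feature_map)
```

The feature map is nonzero only at the column where the brightness changes, which is exactly where the vertical edge sits.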

  12. SAM – Segment Anything Model (developed by Meta AI)

    A foundation model trained on over 1 billion segmentation masks. Given a prompt (like a point or box), it segments the relevant object. → Segment Anything

  13. LNN – Liquid Neural Network (LFMs - Liquid Foundation Models by Liquid AI)

    LNNs model neuronal dynamics with differential equations, which lets them adapt their behavior in real time. They continuously update their internal state, which is great for time-series data, robotics, and real-world decision making. → Liquid Time-constant Networks

    More about LFMs in our AI 101 episode Can Liquid Models Beat Transformers?
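As a rough sketch of the differential-equation view, the code below integrates a single liquid time-constant neuron of the form dx/dt = -(1/τ + f)·x + f·A, where f is a sigmoid gate driven by the input and the current state. It uses a plain explicit Euler step for readability, and all parameter values are illustrative assumptions rather than settings from the paper or from Liquid AI's models.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def ltc_step(x, inp, dt=0.05, tau=1.0, A=1.0, w_in=1.0, w_rec=0.5, b=0.0):
    """One explicit Euler step of dx/dt = -(1/tau + f) * x + f * A."""
    f = sigmoid(w_in * inp + w_rec * x + b)  # input- and state-dependent gate
    dx = -(1.0 / tau + f) * x + f * A
    return x + dt * dx

def run(inputs, x0=0.0):
    """Integrate the neuron over a sequence of inputs and return the final state."""
    x = x0
    for inp in inputs:
        x = ltc_step(x, inp)
    return x

# The gate f changes the effective time constant, tau / (1 + tau * f),
# so the neuron reacts differently depending on what it is fed.
x_quiet = run([0.0] * 200)
x_driven = run([2.0] * 200)
print(x_quiet, x_driven)
```

The "liquid" part is that the decay rate itself depends on the input, so the same neuron settles at different levels and at different speeds under different signals.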

