13 Foundational Types of AI Models

Let’s refresh some fundamentals today to stay fluent in what we all already work with. Here are some of the most popular model types that shape the vast world of AI (with examples in brackets):

  1. LLM – Large Language Model (GPT, LLaMA)

    LLMs are trained on massive text datasets to understand and generate human language. They are mostly built on the Transformer architecture and work by predicting the next token. LLMs scale by increasing overall parameter count across all components (layers, attention heads, MLPs, etc.) → Read more

    plus → The history of LLM
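
    To make "predicting the next token" concrete, here is a minimal sketch using Hugging Face transformers (GPT-2 is just an illustrative checkpoint):

    ```python
    # Minimal next-token prediction: the model scores every vocabulary token
    # as a candidate continuation, and we pick the most likely one.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    inputs = tokenizer("The capital of France is", return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits        # (batch, seq_len, vocab_size)

    next_id = logits[0, -1].argmax().item()    # greedy choice of the next token
    print(tokenizer.decode(next_id))           # e.g. " Paris"
    ```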

  2. SLM – Small Language Model (TinyLLaMA, Phi models, SmolLM)

    A lightweight language model optimized for efficiency, low memory use, fast inference, and edge deployment. SLMs work on the same principles as LLMs → Read more

  3. VLM – Vision-Language Model (CLIP, Flamingo)

    Processes and understands both images and text. VLMs map images and text into a shared embedding space, or generate captions and descriptions conditioned on both → Read more
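
    As a minimal sketch of the shared-embedding-space idea, CLIP (via transformers) can score how well each caption matches an image; the image path is a placeholder:

    ```python
    # CLIP maps the image and each caption into one embedding space,
    # then compares them by similarity.
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    image = Image.open("cat.jpg")              # placeholder image path
    captions = ["a photo of a cat", "a photo of a dog"]

    inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
    probs = model(**inputs).logits_per_image.softmax(dim=-1)
    print(dict(zip(captions, probs[0].tolist())))   # e.g. cat >> dog
    ```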

  4. MLLM – Multimodal Large Language Model (Gemini)

    A large-scale model that can understand and process multiple types of data (modalities), usually text plus other formats like images, videos, audio, structured data, 3D or spatial inputs. MLLMs can be LLMs extended with modality adapters or trained jointly across vision, text, audio, etc. → Read more
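
    A toy sketch of the modality-adapter idea, loosely in the spirit of LLaVA-style designs (all dimensions below are hypothetical): a small projection maps vision features into the LLM's embedding space, so image patches become extra "tokens".

    ```python
    # Toy modality adapter: vision features -> LLM embedding space.
    import torch
    import torch.nn as nn

    d_vision, d_model = 768, 4096                   # hypothetical sizes

    adapter = nn.Linear(d_vision, d_model)          # the trainable bridge

    vision_feats = torch.randn(1, 256, d_vision)    # 256 patch features from a ViT
    text_embeds = torch.randn(1, 32, d_model)       # embedded text prompt

    visual_tokens = adapter(vision_feats)                        # (1, 256, 4096)
    llm_input = torch.cat([visual_tokens, text_embeds], dim=1)   # fed to the LLM
    print(llm_input.shape)                          # torch.Size([1, 288, 4096])
    ```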

  5. VLA – Vision-Language-Action Model (Gemini Robotics, Rho-alpha, SmolVLA)

    Models that connect perception and language directly to actions. VLAs take visual and textual inputs and output action commands, often for embodied agents like robots. They are commonly used in robotics and embodied AI to ground perception into real-world actions. → Read more

    Our recent AI 101 episode covers the VLA landscape
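
    In toy form, a VLA ends in an action head: fused vision-language features go in, a continuous control command comes out (the 7-DoF arm command below is a hypothetical example, not any specific model's interface):

    ```python
    # Toy VLA action head: fused image + instruction features -> robot command.
    import torch
    import torch.nn as nn

    d_model, action_dim = 512, 7        # 7-DoF: position delta, rotation, gripper (hypothetical)

    action_head = nn.Sequential(
        nn.Linear(d_model, 256),
        nn.ReLU(),
        nn.Linear(256, action_dim),
    )

    fused = torch.randn(1, d_model)     # stand-in for encoded camera frame + instruction
    action = action_head(fused)         # e.g. end-effector displacement to execute
    print(action.shape)                 # torch.Size([1, 7])
    ```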

  6. LAM – Large Action Model (InstructDiffusion, RT-2)

    Action-centric models trained to plan and generate sequences of actions rather than just text. Actions can be physical (robot control) or digital (tool calls, UI actions, API usage). LAMs emphasize scalable decision-making, long-horizon planning, and generalization across tasks, and may or may not include vision as an input. → Read more

    So here is the difference between VLAs and LAMs: VLAs focus on turning vision and language into physical actions, while LAMs focus more broadly on planning and executing action sequences, often in digital or tool-based environments.
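
    For the digital-action case, here is a minimal sketch of a tool-calling loop; pick_action and the tool registry are hypothetical stand-ins for a real action model:

    ```python
    # Toy LAM loop: the "model" picks a tool and its arguments until done.
    def search(query: str) -> str:          # hypothetical tool
        return f"results for {query!r}"

    def calculator(expr: str) -> str:       # hypothetical tool
        return str(eval(expr))              # demo only; never eval untrusted input

    TOOLS = {"search": search, "calculator": calculator}

    def pick_action(task: str, history: list) -> dict:
        # Stand-in for the action model: one hard-coded step, then finish.
        if not history:
            return {"tool": "calculator", "args": {"expr": "21 * 2"}}
        return {"tool": None}               # task complete

    history = []
    while True:
        step = pick_action("What is 21 * 2?", history)
        if step["tool"] is None:
            break
        history.append((step, TOOLS[step["tool"]](**step["args"])))

    print(history)   # [({'tool': 'calculator', ...}, '42')]
    ```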

  7. RLM – Reasoning Language Model (DeepSeek-R1, OpenAI's o3)

    Advanced AI systems specifically optimized for multi-step logical reasoning, complex problem-solving, and structured thinking. RLMs (also called Large Reasoning Models, LRMs) incorporate test-time scaling, Chain-of-Thought reasoning, tool use, external memory, strong math and code capabilities, and a more modular design for reliable decision-making. → Read more

    We’ve also covered them here.
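
    One concrete test-time scaling trick is self-consistency: sample several reasoning chains and majority-vote their final answers. A toy sketch, with sample_chain standing in for a real reasoning model:

    ```python
    # Self-consistency: more sampled chains at inference -> more reliable answer.
    import random
    from collections import Counter

    def sample_chain(question: str) -> str:
        # Stand-in for a reasoning model sampled at temperature > 0:
        # most chains reach the right answer, some go astray.
        return random.choice(["42", "42", "42", "41"])

    answers = [sample_chain("What is 21 * 2?") for _ in range(16)]
    final = Counter(answers).most_common(1)[0][0]    # majority vote
    print(final)    # "42" with high probability
    ```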

  8. MoE – Mixture of Experts (Mixtral)

    Uses many sub-networks called experts, but activates only a few per input, enabling massive scaling with sparse computation → Read more
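
    A minimal sketch of sparse routing: a gate scores all experts per token, but only the top-2 actually run (sizes are arbitrary):

    ```python
    # Toy MoE layer: each token is processed by only 2 of 8 experts.
    import torch
    import torch.nn as nn

    d_model, n_experts, top_k = 64, 8, 2

    experts = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_experts))
    gate = nn.Linear(d_model, n_experts)

    x = torch.randn(10, d_model)                  # 10 tokens
    weights, idx = gate(x).topk(top_k, dim=-1)    # top-2 experts per token
    weights = weights.softmax(dim=-1)

    out = torch.zeros_like(x)
    for k in range(top_k):                        # only the chosen experts compute
        for e in range(n_experts):
            mask = idx[:, k] == e
            if mask.any():
                out[mask] += weights[mask, k : k + 1] * experts[e](x[mask])
    print(out.shape)    # torch.Size([10, 64])
    ```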

  9. SSM – State Space Model (Mamba, RetNet)

    A neural network that models a sequence as a continuous dynamical system, describing how hidden state vectors change in response to inputs over time. SSMs are parallelizable and efficient for long contexts → Read more

    plus → Our overview of SSMs and Mamba
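
    In its discretized form, the core recurrence is x_t = A·x_{t-1} + B·u_t with output y_t = C·x_t; a tiny NumPy sketch (the matrices are random toy values):

    ```python
    # Discrete linear state space model: hidden state x evolves with each input.
    import numpy as np

    state_dim, T = 4, 100
    rng = np.random.default_rng(0)

    A = 0.9 * np.eye(state_dim)          # stable state transition (toy choice)
    B = rng.normal(size=(state_dim, 1))
    C = rng.normal(size=(1, state_dim))

    u = rng.normal(size=(T, 1))          # input sequence
    x = np.zeros(state_dim)
    ys = []
    for t in range(T):                   # recurrent view; an equivalent
        x = A @ x + B @ u[t]             # convolutional view enables parallel training
        ys.append(C @ x)

    print(np.array(ys).shape)            # (100, 1)
    ```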

  10. RNN – Recurrent Neural Network (advanced variants: LSTM, GRU)

    Processes sequences one step at a time, passing information through a hidden state that acts as memory. RNNs were widely used in early NLP and time-series tasks but struggle with long-range dependencies compared to newer architectures → Read more

    Our detailed article about LSTM
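
    The "one step at a time" loop, as a toy vanilla RNN cell in NumPy:

    ```python
    # Vanilla RNN cell: hidden state h is the memory carried across steps.
    import numpy as np

    d_in, d_hidden, T = 3, 5, 10
    rng = np.random.default_rng(0)

    W_xh = rng.normal(scale=0.1, size=(d_hidden, d_in))
    W_hh = rng.normal(scale=0.1, size=(d_hidden, d_hidden))
    b = np.zeros(d_hidden)

    xs = rng.normal(size=(T, d_in))   # an input sequence
    h = np.zeros(d_hidden)            # initial memory
    for x in xs:                      # strictly sequential, unlike a Transformer
        h = np.tanh(W_xh @ x + W_hh @ h + b)

    print(h)   # final hidden state summarizes the whole sequence
    ```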

  11. CNN – Convolutional Neural Network (MobileNet, EfficientNet)

    Automatically learns patterns from visual data, using convolutional layers to detect features like edges, textures, or shapes. CNNs are less dominant now, but still widely used in edge applications and visual processing. → Read more
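
    A minimal convolutional classifier sketch in PyTorch (layer sizes are arbitrary):

    ```python
    # Tiny CNN: stacked convolutions turn raw pixels into abstract features.
    import torch
    import torch.nn as nn

    cnn = nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, padding=1),   # edges, simple textures
        nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Conv2d(16, 32, kernel_size=3, padding=1),  # more complex shapes
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Linear(32, 10),                            # e.g. a 10-class classifier
    )

    image = torch.randn(1, 3, 224, 224)               # one RGB image
    print(cnn(image).shape)                           # torch.Size([1, 10])
    ```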

  12. SAM – Segment Anything Model (developed by Meta AI)

    A foundation model trained on over 1 billion segmentation masks. Given a prompt (like a point or box), it segments the relevant object. → Read more
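
    SAM is also available through transformers; a sketch of point-prompted segmentation (the checkpoint name, image path, and click coordinates are illustrative):

    ```python
    # Point-prompted segmentation with SAM: one click -> masks for that object.
    import torch
    from PIL import Image
    from transformers import SamModel, SamProcessor

    model = SamModel.from_pretrained("facebook/sam-vit-base")
    processor = SamProcessor.from_pretrained("facebook/sam-vit-base")

    image = Image.open("photo.jpg").convert("RGB")    # placeholder image
    input_points = [[[450, 600]]]                     # one (x, y) click on the object

    inputs = processor(image, input_points=input_points, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    masks = processor.image_processor.post_process_masks(
        outputs.pred_masks.cpu(),
        inputs["original_sizes"].cpu(),
        inputs["reshaped_input_sizes"].cpu(),
    )
    print(masks[0].shape)    # masks for the prompted object at full resolution
    ```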

  13. LNN – Liquid Neural Network (LFMs - Liquid Foundation Models by Liquid AI)

    LNNs use differential equations to model neuronal dynamics, letting them adapt their behavior in real time. They continuously update their internal state, which makes them a great fit for time-series data, robotics, and real-world decision-making. → Read more

    More about LFMs in our AI 101 episode
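
    A toy "liquid" neuron in that spirit: the state follows a differential equation and is integrated step by step; the Euler scheme and constants below are a simplified sketch, not Liquid AI's actual formulation:

    ```python
    # Toy liquid-style neuron: dx/dt = -x/tau + tanh(w_in*u + w_rec*x),
    # integrated with small Euler steps so the state adapts continuously.
    import numpy as np

    tau, dt, w_in, w_rec, T = 1.0, 0.05, 1.5, 0.5, 200

    rng = np.random.default_rng(0)
    u = np.sin(np.linspace(0, 8 * np.pi, T)) + 0.1 * rng.normal(size=T)  # input signal

    x, xs = 0.0, []
    for t in range(T):
        dx = -x / tau + np.tanh(w_in * u[t] + w_rec * x)
        x += dt * dx                   # Euler integration step
        xs.append(x)

    print(round(xs[-1], 3))            # final neuron state after the whole signal
    ```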

Also, subscribe to our X, Threads and BlueSky to get unique content on every platform.
