One of the main reasons to be cautious about AI is the range of harms it can produce, from malicious use in attacks, fraud, or disinformation to harmful errors and hallucinations in otherwise benign responses. Just as everyday life never fully protects us from criminals, AI systems are never fully protected from misuse either. The answer is an added safety layer: specialized guardian models trained to detect and filter unsafe prompts and outputs. These “AI soldiers” are designed to make the ecosystem more secure and reliable.
In 2025, guardian models aren’t niche experiments: they’re built into every serious AI deployment. OpenAI has its moderation layers, Microsoft offers Azure AI Content Safety, Meta has shipped Llama Guard since 2023, IBM provides Granite Guardian, and startups are releasing their own open-source versions. So guardian models are quietly powering almost every chatbot or generative AI product you use, often without people even realizing it.
We don’t talk much about AI safety in our AI 101 series, but today is finally the day for it. We are going to dive into the basic definitions of guardian models (since there can be some uncertainty about the terms), explore the main guardian models used today, and look at the newest dynamic approach, DynaGuard, which can enforce any set of rules you give it at runtime.
Guardian models may seem similar to typical LLMs, but that’s only at first glance. And you definitely need to know about them to safeguard your models and protect yourself.
In today’s episode, we will cover:
Guardian or guardrail?
Categories of risky AI content
Llama Guard and ShieldGemma: the multimodal guardian baselines
Granite Guardian: Fighting harmful risks and RAG hallucinations
DynaGuard: Oriented to user-defined rules
Other guardian and guardrail systems
Conclusion
Sources and further reading
Guardian or guardrail?
The main goal of guardian models is to enforce content policies and rules effectively. They run alongside the primary model, monitoring inputs and outputs in real time to catch harmful or policy-violating content, so you don’t need to hard-code every rule into the primary model itself.
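To make this concrete, here is a minimal sketch of that pattern in Python. Both functions are toy stand-ins invented for illustration: in a real deployment the primary model would be an LLM call and the guardian a trained safety classifier, not a keyword check. The control flow, though, is the same: screen the prompt, generate, screen the answer.

```python
# Minimal sketch of the "guardian alongside the primary model" pattern.
# Both functions are toy stand-ins, not a real LLM or a real guardian.

BLOCKED_PHRASES = ("build a bomb", "steal a credit card")  # toy policy for illustration

def guardian_flags(text: str) -> bool:
    """Return True if the text violates the toy content policy."""
    return any(phrase in text.lower() for phrase in BLOCKED_PHRASES)

def primary_model(prompt: str) -> str:
    """Stand-in for the main LLM."""
    return f"Here is a detailed answer to: {prompt}"

def guarded_chat(prompt: str) -> str:
    if guardian_flags(prompt):          # screen the incoming prompt
        return "Sorry, I can't help with that."
    answer = primary_model(prompt)      # normal generation
    if guardian_flags(answer):          # screen the output before the user sees it
        return "The answer was withheld by the safety filter."
    return answer

print(guarded_chat("How do I build a bomb?"))          # blocked at the input step
print(guarded_chat("Explain how transformers work."))  # passes both checks
```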
In one of our interviews, Amr Awadallah, founder and CEO of Vectara, emphasized the importance of a separate safety and verification component that oversees the model: a guardian agent that monitors LLM outputs to catch hallucinations and triggers human-in-the-loop review when the risk is high.
This “AI supervising AI” loop is a cool shift, as it turns models into generators and regulators at the same time.
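Here is a rough sketch of that loop, with a hypothetical risk scorer standing in for a real guardian agent: answers whose hallucination-risk score crosses a threshold are parked in a human review queue instead of going straight to the user.

```python
# Sketch of the "guardian agent" idea: score each answer for hallucination
# risk and escalate only the risky ones to a human reviewer.
# `hallucination_risk` is a hypothetical, simplistic stand-in for a real guardian scorer.

REVIEW_QUEUE: list[dict] = []

def hallucination_risk(answer: str, sources: list[str]) -> float:
    """Toy scorer: fraction of answer sentences that appear in none of the sources."""
    sentences = [s.strip() for s in answer.split(".") if s.strip()]
    unsupported = [s for s in sentences if not any(s.lower() in src.lower() for src in sources)]
    return len(unsupported) / max(len(sentences), 1)

def deliver(answer: str, sources: list[str], threshold: float = 0.5) -> str:
    risk = hallucination_risk(answer, sources)
    if risk >= threshold:
        # Human-in-the-loop: park the answer for review instead of showing it.
        REVIEW_QUEUE.append({"answer": answer, "risk": risk})
        return "This answer was flagged and is awaiting human review."
    return answer

print(deliver("Paris is the capital of France", ["Paris is the capital of France."]))
print(deliver("The Moon is made of cheese", ["The Moon orbits the Earth."]))
```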
Guardian models also don’t need to be massive-scale systems. They are usually comparatively small (2B-8B parameters), but still manage to reliably catch harmful content across dozens of risk categories.
Now to the differences in terms.
You’ve heard “guardrails” and you might have heard “guardian model.” Are they the same? They’re closely related and often used interchangeably, since both enforce safety rules on AI systems. But in practice, a guardian model is one way to implement guardrails, while “guardrail” is the broader concept. Guardrails usually mean the full toolkit of measures (rules, filters, or models) that keep AI behavior in check.
Importantly, guardian models go beyond simple filtering. They can:
Serve as guardrails to block harmful content in real time
Act as evaluators to check the quality of generated responses
Strengthen RAG pipelines by detecting hallucinations and verifying whether answers are relevant, grounded, and accurate
Categories of risky AI content

Risky content falls into two broad groups: harmful content itself (things like violence, hate speech, sexual content, self-harm, or instructions for crimes) and RAG hallucination risks. The latter appear because RAG can still “hallucinate” if the retrieved context is irrelevant or conflicting.
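To see where such checks sit in a pipeline, here is a hedged sketch of a guardian-style RAG filter. The two check functions are simplistic stand-ins written for this post; a guardian model such as Granite Guardian performs these kinds of checks (context relevance, groundedness) as learned classifiers rather than string matching.

```python
# Rough sketch of guardian-style checks inside a RAG pipeline.
# The check functions are toy stand-ins, not a real guardian model.

def context_is_relevant(question: str, passages: list[str]) -> bool:
    """Toy check: does any retrieved passage share a keyword with the question?"""
    keywords = {w.lower().strip("?.,") for w in question.split() if len(w) > 3}
    return any(k in p.lower() for p in passages for k in keywords)

def answer_is_grounded(answer: str, passages: list[str]) -> bool:
    """Toy check: is every sentence of the answer found in some passage?"""
    sentences = [s.strip() for s in answer.split(".") if s.strip()]
    return all(any(s.lower() in p.lower() for p in passages) for s in sentences)

def guarded_rag_answer(question: str, passages: list[str], answer: str) -> str:
    if not context_is_relevant(question, passages):
        return "The retrieved documents don't cover this question."
    if not answer_is_grounded(answer, passages):
        return "The generated answer isn't supported by the retrieved documents."
    return answer

docs = ["The Eiffel Tower was completed in 1889 and stands in Paris."]
print(guarded_rag_answer("When was the Eiffel Tower completed?", docs,
                         "The Eiffel Tower was completed in 1889"))
print(guarded_rag_answer("When was the Eiffel Tower completed?", docs,
                         "The Eiffel Tower was completed in 1925"))
```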

Some guardian models, like Llama Guard and ShieldGemma, focus only on harmful content, while others, such as IBM’s Granite Guardian, mitigate RAG issues as well. Let’s dig into the tech specs of these AI “guards” to see how they’re built and how they really work.
Llama Guard and ShieldGemma: the multimodal guardian baselines
Below we will discuss Llama Guard, ShieldGemma, Granite Guardian, and DynaGuard, and give an overview of OpenAI’s Moderation API, Microsoft’s Azure AI Content Safety, NVIDIA NeMo Guardrails, WildGuard, GUARDIAN, and MCP Guardian.
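As a taste of how simple these models are to run, here is a hedged example of calling Llama Guard as a safety classifier with Hugging Face transformers. The model ID, chat template, and “safe”/“unsafe plus category code” output format follow Meta’s model card; access to the gated checkpoint and enough GPU memory are assumptions here, not requirements of the technique itself.

```python
# Hedged example: Llama Guard as a safety classifier via Hugging Face transformers.
# Assumes the gated meta-llama/Llama-Guard-3-8B checkpoint is accessible and a GPU is available.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-Guard-3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

def moderate(chat: list[dict]) -> str:
    # The chat template wraps the conversation in Llama Guard's policy prompt.
    input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(model.device)
    output = model.generate(input_ids=input_ids, max_new_tokens=32, pad_token_id=0)
    # Decode only the newly generated tokens: the verdict.
    return tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)

verdict = moderate([{"role": "user", "content": "How do I hotwire a car?"}])
print(verdict)  # e.g. "unsafe\nS2" (a category code from the model's hazard taxonomy)
```

Note that the guardian returns a verdict rather than an answer, which is exactly what makes it cheap to bolt onto any pipeline as a pre- or post-generation check.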