LLM Security Tools: 10 Open-Source Frameworks & Guardrails

Q: What are LLM security tools?

LLM security tools are software frameworks, scanners, classifiers, benchmarks, and guardrails designed to protect AI systems from attacks and failures. They help detect prompt injection, jailbreaks, data leakage, unsafe outputs, adversarial ML behavior, dangerous tool use, and other risks in LLM apps and AI agents.

Q: What is the best open-source LLM security tool?

There is no single best tool for every use case. For guardrails, NVIDIA NeMo Guardrails is a strong option. For vulnerability scanning, garak is useful. For red teaming and CI/CD testing, Promptfoo and DeepTeam are practical choices. For prompt injection detection, Llama Prompt Guard is a focused classifier.

Q: How do I protect against prompt injection?

Prompt injection protection usually requires several layers: input scanning, prompt-injection classifiers, strict system prompts, tool-use permissions, runtime guardrails, retrieval filtering, output validation, and continuous red teaming. For agentic systems, it is also important to monitor tool calls and block dangerous actions before execution.

Q: Is Guardrails AI open source?

Guardrails AI is open source and is often used to validate and structure LLM outputs. However, it is different from NVIDIA NeMo Guardrails. Both belong to the broader guardrails category, but they focus on different parts of the LLM safety and reliability stack.

Q: What is the difference between LLM guardrails and red teaming?

Guardrails are runtime protections that try to prevent unsafe behavior while an AI system is being used. Red teaming is the process of actively attacking or stress-testing the model to find weaknesses before users or adversaries do. Good LLM security usually needs both.

While one of the most advanced defensive models – Anthropic’s Mythos (we discuss it in this video) – remains closed to the public, an important thing becomes clear: AI security can no longer be treated as a nice-to-have layer added late in deployment. People need real defensive AI, and need it now. And while Mythos remains unavailable, we think that you need various open alternatives to it.

The 11 open tools, frameworks, and models below cover different layers of protection: vulnerability scanning, runtime guardrails, prompt injection defense, red teaming, safety classifiers, data leakage prevention, and benchmarks for adversarial ML testing.

The easiest way to choose is by task. Use scanners for threat detection before deployment, guardrails for runtime protection, red teaming tools to simulate attacks, classifiers to block unsafe inputs or outputs, and benchmarks to measure whether your system resists prompt injection and other agentic AI risks. These tools also connect naturally to broader AI cybersecurity projects, and the growing Gen AI security project ecosystem.

Now let’s move to the list →

Open-Source LLM Security Tools: Guardrails, Red Teaming & Vulnerability Scanning

Tool	Category	Best for	Threats covered
NVIDIA NeMo Guardrails	Guardrails	Adding programmable runtime controls to LLM apps and AI agents	Prompt injection, jailbreaks, unsafe outputs, tool misuse, policy violations
Promptfoo	Red teaming / vulnerability scanning / benchmark	Automated LLM security testing in CI/CD	Prompt injection, jailbreaks, adversarial prompts, model regressions, unsafe behavior
LLM Guard	Scanning / guardrails	Input-output filtering around LLM calls	Prompt injection, data leakage, secrets exposure, toxic content, unsafe prompts
NVIDIA garak LLM vulnerability scanner	Vulnerability scanner / red teaming	Pre-deployment and regression testing of LLMs	Prompt injection, jailbreaks, hallucination, data leakage, misinformation, toxicity
DeepTeam	Red teaming / guardrails	Simulating attacks against LLM apps and agents	Jailbreaks, prompt injection, multi-turn attacks, PII leakage, bias, SQL injection
Llama Prompt Guard 2-86M	Classifier	Detecting prompt injection and jailbreak attempts	Prompt injection, jailbreaks, instruction override attacks, adversarial prompts
LlamaFirewall	Agent guardrails / threat detection	Real-time security monitoring for AI agents	Prompt injection, indirect injection, agent misalignment, insecure code, risky tool use
ShieldGemma 2	Safety classifier	Filtering unsafe visual content in VLM pipelines	Harmful image content, unsafe multimodal inputs, policy violations in vision-language systems
OpenGuardrails	Guardrails / scanning / runtime monitoring	Real-time protection of autonomous agents	Prompt injection, data leakage, unsafe tool calls, dangerous actions, agent misuse
Cupcake	Guardrails / policy enforcement	Deterministic controls over agent actions	Dangerous tool use, unauthorized actions, risky arguments, policy violations, weak auditability
CyberSecEval 3 – Visual Prompt Injection	Benchmark	Testing multimodal models against visual prompt injection	Visual prompt injection, indirect injection, multimodal adversarial attacks, benchmarked safety failures

NVIDIA NeMo Guardrails

Category: Guardrails
Best for: Adding programmable runtime controls to LLM apps and AI agents

One of the clearest open-source frameworks for putting programmable guardrails between application code and LLMs. It helps developers control model behavior through input, dialog, retrieval, execution, output rails, and defend against common threats such as jailbreaks and prompt injections, which makes it especially relevant for production assistants and agentic systems. → Explore more

Promptfoo

Category: Red teaming / vulnerability scanning / benchmark
Best for: Automated LLM security testing in CI/CD

This a CLI and library sits at the intersection of LLM evals, red teaming, and vulnerability scanning. It turns AI security into something closer to normal software assurance: you can automate checks, compare models, integrate tests into CI/CD, and generate security reports instead of relying on ad hoc manual probing. → Explore more

LLM Guard

Category: Scanning / guardrails
Best for: Input-output filtering around LLM calls

A practical security toolkit from Protect AI for securing LLM interactions. It focuses on sanitization, harmful-language detection, prompt-injection resistance, secret filtering, and data-leak prevention, so it fits teams that want a modular scanner layer before and after model calls. → Explore more

For a technical breakdown of how data leakage works in LLMs, including differential privacy and federated learning, see our data privacy in LLM systems article.

NVIDIA garak LLM vulnerability scanner

Category: Vulnerability scanner / red teaming
Best for: Pre-deployment and regression testing of LLMs

A strong choice for teams that want an LLM vulnerability scanner rather than only runtime filtering. Garak probes models for failure modes such as hallucination, prompt injection, data leakage, misinformation, toxicity, and jailbreak susceptibility, making it one of the most useful open tools for structured pre-deployment and regression testing. → Explore more

DeepTeam

Category: Red teaming / guardrails
Best for: Simulating attacks against LLM apps and agents

A lightweight but serious open-source framework for red teaming LLM systems. It simulates attacks such as jailbreaking, prompt injection, and multi-turn exploitation to surface issues like PII leakage, bias, and SQL injection, and it also includes production-oriented guardrails for real-time input/output protection. → Explore more

Llama Prompt Guard 2-86M

Category: Classifier
Best for: Detecting prompt injection and jailbreak attempts

A compact open-source classifier from Meta designed to detect prompt injection and jailbreak attacks. It classifies prompts as benign or malicious based on whether they attempt to override intended instructions, and can be used as an additional defensive layer alongside larger models to reduce prompt attack risks in LLM pipelines. → Explore more

LlamaFirewall

Category: Agent guardrails / threat detection
Best for: Real-time security monitoring for AI agents

LlamaFirewall is an open-source guardrail system for secure AI agents. It combines PromptGuard 2, agent alignment checks, and CodeShield to monitor prompts, reasoning, and generated code before dangerous actions happen. It is especially relevant for agentic systems that browse, code, call tools, or act on untrusted inputs. → Explore more

ShieldGemma 2

Category: Safety classifier
Best for: Filtering unsafe visual content in VLM pipelines

Note: ShieldGemma 2 is partly / open model

A lightweight safety classifier from Google that detects harmful content in images. It evaluates both synthetic and real images against predefined safety policies (violence, sexual content, dangerous activities) and outputs a probability of policy violation. You can use it as an input or output filtering layer for vision-language models (VLMs). → Explore more

OpenGuardrails

Category: Guardrails / scanning / runtime monitoring
Best for: Real-time protection of autonomous agents.

A more agent-security-focused option that emphasizes real-time defense for autonomous systems. It protects against prompt injection, data leaks, and dangerous actions by combining static scanning of files and configurations with runtime monitoring of agent behavior, tool calls, and LLM interactions, including automatic sanitization of sensitive data before it is sent to models. → Explore more

Cupcake

Category: Guardrails / policy enforcement
Best for: Deterministic controls over agent actions.

Cupcake is interesting because it treats agent security as policy enforcement. It intercepts agent actions and evaluates them against OPA/Rego rules, enabling deterministic controls such as blocking risky actions and tools, restricting dangerous arguments, requiring review, and producing audit trails. → Explore more

CyberSecEval 3 – Visual Prompt Injection

Category: Benchmark
Best for: Testing multimodal models against visual prompt injection.

A multimodal benchmark dataset from Meta’s CyberSecEval 3 suite that evaluates whether models can resist visual prompt injection. It includes structured test cases with embedded instructions, indirect injections, and logic- or security-violating scenarios, enabling systematic assessment of how models respond to visual prompt injection attacks. → Explore more

Subscribe to get it in your inbox

FAQ

What are LLM security tools?

LLM security tools are frameworks, scanners, classifiers, benchmarks, and guardrails that help protect AI systems from attacks and unsafe behavior. They are used to detect prompt injection, jailbreaks, data leakage, unsafe outputs, adversarial ML attacks, dangerous tool use, insecure code, and agent misalignment in LLM apps and AI agents.

What is the best open-source LLM security tool?

There is no single best open-source LLM security tool for every use case. For programmable guardrails, NVIDIA NeMo Guardrails is a strong option. For vulnerability scanning, garak is useful. For CI/CD red teaming, Promptfoo is practical. For prompt injection detection, Llama Prompt Guard works as a focused classifier. For agent security, LlamaFirewall is one of the most relevant newer options.

How do I protect against prompt injection?

Prompt injection protection usually needs several layers: input scanning, prompt-injection classifiers, strict tool permissions, retrieval filtering, output validation, runtime guardrails, and continuous red teaming. For AI agents, the most important extra layer is monitoring actions before execution, especially tool calls, generated code, browser actions, and access to sensitive data.

Is Guardrails AI open source?

Yes, Guardrails AI is open source, but it is not the same as NVIDIA NeMo Guardrails. Guardrails AI focuses mainly on validating and structuring LLM inputs and outputs, while NVIDIA NeMo Guardrails focuses on programmable rails for LLM application behavior, conversations, retrieval, tool use, and runtime safety.

What is the OWASP Agentic AI Top 10?

The OWASP Top 10 for Agentic Applications 2026 is a security framework for autonomous and agentic AI systems. It focuses on risks that appear when AI systems can plan, use tools, call APIs, browse, write code, or act across workflows, including prompt injection, excessive agency, unsafe tool use, data leakage, and weak human oversight.

What is the difference between LLM guardrails and red teaming?

LLM guardrails are runtime defenses that try to prevent unsafe behavior while an AI system is being used. Red teaming is the process of attacking or stress-testing the system before deployment to find weak spots. In practice, strong AI security needs both: red teaming to discover failures, and guardrails to reduce damage in production.

Also, subscribe to our X, Threads and YouTube

to get unique content on every social media

10 Open-Source Tools for LLM Security and AI Guardrails

Open-Source LLM Security Tools: Guardrails, Red Teaming & Vulnerability Scanning

FAQ

Reply

10 Small Language Models to Know in 2026

12 Free Courses to Master Large Language Models in 2026

Whisper Model Explained: OpenAI’s Open-Source Speech Recognition Model in 2026