While one of the most advanced defensive models Anthropic’s Mythos (we discuss it in this video) remains closed to the public, an important thing becomes clear: AI security can no longer be treated as a nice-to-have layer added late in deployment. People need real defensive AI, and need it now. Modern models can find and exploit complex zero-days at scale, so defenders can have various systems: automated guardrails, continuous red teaming, prompt-injection defenses, and faster response loops. And while Mythos remains unavailable, we think that you need various open alternatives to it.

Here are open tools, frameworks, and models to help keep your AI systems safe:

  1. NVIDIA NeMo Guardrails
    One of the clearest open-source frameworks for putting programmable guardrails between application code and LLMs. It helps developers control model behavior through input, dialog, retrieval, execution, output rails, and defend against common threats such as jailbreaks and prompt injections, which makes it especially relevant for production assistants and agentic systems. → Explore more

  2. Promptfoo
    This a CLI and library sits at the intersection of LLM evals, red teaming, and vulnerability scanning. It turns AI security into something closer to normal software assurance: you can automate checks, compare models, integrate tests into CI/CD, and generate security reports instead of relying on ad hoc manual probing. → Explore more

  3. LLM Guard
    A practical security toolkit from Protect AI for securing LLM interactions. It focuses on sanitization, harmful-language detection, prompt-injection resistance, secret filtering, and data-leak prevention, so it fits teams that want a modular scanner layer before and after model calls. → Explore more

  4. NVIDIA garak LLM vulnerability scanner

    A strong choice for teams that want an LLM vulnerability scanner rather than only runtime filtering. Garak probes models for failure modes such as hallucination, prompt injection, data leakage, misinformation, toxicity, and jailbreak susceptibility, making it one of the most useful open tools for structured pre-deployment and regression testing. → Explore more

  5. DeepTeam
    A lightweight but serious open-source framework for red teaming LLM systems. It simulates attacks such as jailbreaking, prompt injection, and multi-turn exploitation to surface issues like PII leakage, bias, and SQL injection, and it also includes production-oriented guardrails for real-time input/output protection. → Explore more

  6. Llama Prompt Guard 2-86M
    A compact open-source classifier from Meta designed to detect prompt injection and jailbreak attacks. It classifies prompts as benign or malicious based on whether they attempt to override intended instructions, and can be used as an additional defensive layer alongside larger models to reduce prompt attack risks in LLM pipelines. → Explore more

  7. ShieldGemma 2
    A lightweight safety classifier from Google that detects harmful content in images. It evaluates both synthetic and real images against predefined safety policies (violence, sexual content, dangerous activities) and outputs a probability of policy violation. You can use it as an input or output filtering layer for vision-language models (VLMs). → Explore more

  8. OpenGuardrails
    A more agent-security-focused option that emphasizes real-time defense for autonomous systems. It protects against prompt injection, data leaks, and dangerous actions by combining static scanning of files and configurations with runtime monitoring of agent behavior, tool calls, and LLM interactions, including automatic sanitization of sensitive data before it is sent to models. → Explore more

  9. Cupcake
    Cupcake is interesting because it treats agent security as policy enforcement. It intercepts agent actions and evaluates them against OPA/Rego rules, enabling deterministic controls such as blocking risky actions and tools, restricting dangerous arguments, requiring review, and producing audit trails. → Explore more

  10. CyberSecEval 3 – Visual Prompt Injection
    A multimodal benchmark dataset from Meta’s CyberSecEval 3 suite that evaluates whether models can resist visual prompt injection. It includes structured test cases with embedded instructions, indirect injections, and logic- or security-violating scenarios, enabling systematic assessment of how models respond to visual prompt injection attacks. → Explore more

Also, subscribe to our X, Threads and YouTube

to get unique content on every social media

Reply

Avatar

or to participate

Keep Reading