🤗 Models on Hugging Face | Blog | Website | CyberSec Eval Paper | Llama Guard Paper
Purple Llama is an umbrella project that over time will bring together tools and evals to help the community build responsibly with open generative AI models. The initial release will include tools and evals for Cyber Security and Input/Output safeguards but we plan to contribute more in the near future.
Borrowing a concept from the cybersecurity world, we believe that truly mitigating the challenges generative AI presents requires both attack (red team) and defensive (blue team) postures. Purple teaming, which combines red and blue team responsibilities, is a collaborative approach to evaluating and mitigating potential risks. The same ethos applies to generative AI, and hence our investment in Purple Llama will be comprehensive.
Components within the Purple Llama project are licensed permissively, enabling both research and commercial usage. We believe this is a major step towards enabling community collaboration and standardizing the development and usage of trust and safety tools for generative AI development. More concretely, evals and benchmarks are licensed under the MIT license, while models are released under their corresponding Llama Community license. See the table below:
| Component Type | Components | License |
|---|---|---|
| Evals/Benchmarks | Cyber Security Eval (others to come) | MIT |
| Models | Llama Guard | Llama 2 Community License |
| Models | Llama Guard 2 | Llama 3 Community License |
| Safeguard | Code Shield | MIT |
CyberSec Eval v1 was, to our knowledge, the first industry-wide set of cybersecurity safety evaluations for LLMs. These benchmarks are based on industry guidance and standards (e.g., CWE and MITRE ATT&CK) and were built in collaboration with our security subject matter experts. We aim to provide tools that help address some of the risks outlined in the White House commitments on developing responsible AI, including:
- Metrics for quantifying LLM cybersecurity risks.
- Tools to evaluate the frequency of insecure code suggestions.
- Tools to evaluate an LLM's propensity to generate malicious code or aid in carrying out cyberattacks.
We believe these tools will reduce the frequency with which LLMs suggest insecure code and reduce their helpfulness to cyber adversaries. Our initial results show that there are meaningful cybersecurity risks for LLMs, both in recommending insecure code and in complying with malicious requests. See our CyberSec Eval paper for more details.
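To make the "frequency of insecure code suggestions" metric concrete, here is a minimal sketch of how such a rate could be computed. The detection patterns and sample completions below are purely illustrative assumptions; the actual CyberSec Eval benchmark uses a much richer, CWE-based static analysis pipeline.

```python
import re

# Hypothetical insecure-code signatures (illustrative only; the real
# benchmark covers many CWEs across multiple languages).
INSECURE_PATTERNS = [
    re.compile(r"\bstrcpy\s*\("),           # CWE-120: unbounded buffer copy
    re.compile(r"\bmd5\b", re.IGNORECASE),  # CWE-327: weak hash algorithm
    re.compile(r"\beval\s*\("),             # CWE-95: code injection via eval
]

def is_insecure(code: str) -> bool:
    """Flag a completion if it matches any known-insecure signature."""
    return any(p.search(code) for p in INSECURE_PATTERNS)

def insecure_suggestion_rate(completions: list[str]) -> float:
    """Fraction of model completions that trip an insecure-code signature."""
    if not completions:
        return 0.0
    return sum(is_insecure(c) for c in completions) / len(completions)

# Hypothetical sampled completions: two of three use risky constructs.
samples = [
    "strcpy(dst, src);",
    "hashlib.sha256(data).hexdigest()",
    "result = eval(user_input)",
]
print(insecure_suggestion_rate(samples))  # 2 of 3 completions flagged
```

A lower rate across a prompt suite indicates a model less prone to suggesting insecure code, which is the shape of signal the benchmark reports.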
CyberSec Eval 2 expands on its predecessor by measuring an LLM’s propensity to abuse a code interpreter, offensive cybersecurity capabilities, and susceptibility to prompt injection. You can read the paper here.
You can also check out the 🤗 leaderboard here.
As we outlined in Llama 3’s Responsible Use Guide, we recommend that all inputs and outputs to the LLM be checked and filtered in accordance with content guidelines appropriate to the application.
To support this, and empower the community, we released Llama Guard, an openly-available model that performs competitively on common open benchmarks and provides developers with a pretrained model to help defend against generating potentially risky outputs. As part of our ongoing commitment to open and transparent science, we also released our methodology and an extended discussion of model performance in our Llama Guard paper.
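The input/output filtering pattern described above can be sketched as follows: classify the user prompt before generation, and classify the model's reply before returning it. In this sketch, `classify` is a stand-in for a real Llama Guard call; the parsing helper assumes Llama Guard's verdict format (a first line of `safe` or `unsafe`, with violated category codes on a second line), and `generate`, `guarded_reply`, and the stub functions are hypothetical names introduced for illustration.

```python
def parse_guard_output(raw: str) -> tuple[bool, list[str]]:
    """Parse a Llama Guard-style verdict: first line is 'safe' or 'unsafe';
    an unsafe verdict carries comma-separated category codes on line two."""
    lines = raw.strip().splitlines()
    safe = lines[0].strip().lower() == "safe"
    categories = [] if safe or len(lines) < 2 else [c.strip() for c in lines[1].split(",")]
    return safe, categories

def guarded_reply(user_msg, generate, classify, refusal="I can't help with that."):
    """Input/output filtering: screen the prompt, generate only if it is
    safe, then screen the model's response before returning it."""
    safe, _ = parse_guard_output(classify([{"role": "user", "content": user_msg}]))
    if not safe:
        return refusal
    reply = generate(user_msg)
    safe, _ = parse_guard_output(
        classify([{"role": "user", "content": user_msg},
                  {"role": "assistant", "content": reply}]))
    return reply if safe else refusal

# Stub classifier for illustration: flags any turn mentioning "bomb".
def fake_classify(chat):
    text = " ".join(m["content"] for m in chat)
    return "unsafe\nO3" if "bomb" in text else "safe"

print(guarded_reply("hi there", lambda p: "hello!", fake_classify))  # hello!
```

Running both checks means an unsafe prompt never reaches the model and an unsafe completion never reaches the user, which is the deployment pattern the Responsible Use Guide recommends.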
We are happy to share an updated version, Meta Llama Guard 2. Llama Guard 2 was optimized to support the newly announced policy published by MLCommons, expanding its coverage to a more comprehensive set of safety categories out of the box. It also delivers better classification performance than Llama Guard 1 and improved zero-shot and few-shot adaptability.
Ultimately, our vision is to enable developers to customize this model to support relevant use cases and to make it easier to adopt best practices and improve the open ecosystem.
Code Shield adds support for inference-time filtering of insecure code produced by LLMs. It mitigates the risk of insecure code suggestions, helps prevent code interpreter abuse, and supports secure command execution. See the CodeShield Example Notebook.
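Conceptually, inference-time filtering sits between the model and the user: each generated completion is scanned before it is returned or executed. The sketch below illustrates that control flow with toy regex rules; the rule set, function names, and blocking behavior are assumptions for illustration, not the actual Code Shield scanner, which performs far more comprehensive static analysis across languages.

```python
import re

# Toy detection rules (illustrative only; not Code Shield's real analyzers).
RULES = {
    "os-command-injection": re.compile(r"os\.system\s*\("),
    "unsafe-deserialization": re.compile(r"pickle\.loads\s*\("),
}

def scan_code(code: str) -> list[str]:
    """Return the names of any rules the generated code trips."""
    return [name for name, pattern in RULES.items() if pattern.search(code)]

def shielded_completion(generate, prompt: str) -> str:
    """Inference-time filter: scan the completion and block it on findings
    instead of passing insecure code through to the user or an interpreter."""
    completion = generate(prompt)
    findings = scan_code(completion)
    if findings:
        return "[blocked: insecure pattern(s) detected: " + ", ".join(findings) + "]"
    return completion

# Stub generator for illustration: emits a completion that shells out.
risky_generate = lambda prompt: "import os\nos.system('ls')"
print(shielded_completion(risky_generate, "list the files"))
```

Because the check runs at inference time rather than training time, the same filter can sit in front of any code-generating model without retraining it.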
To get started and learn how to use Purple Llama components with Llama models, see the getting started guide here. The guide provides information and resources to help you set up Llama, including how to access the model, hosting options, and integration guides. You will also find supplemental materials to further assist you while building responsibly with Llama. The guide will be updated as more Purple Llama components are released.
For a running list of frequently asked questions, for not only Purple Llama components but also generally for Llama models, see the FAQ here.
See the CONTRIBUTING file for how to help out.