Activating AI Safety Level 3 protections
Blog
Credibility Rating
High quality. Established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: Anthropic
This is a landmark policy milestone: the first time Anthropic has moved beyond its baseline ASL-2 protections, making it a key reference for understanding how Responsible Scaling Policies translate into concrete operational decisions as frontier models grow more capable.
Metadata
Summary
Anthropic announces the precautionary activation of ASL-3 deployment and security standards for Claude Opus 4 under its Responsible Scaling Policy. Anthropic has not definitively concluded that Claude Opus 4 meets the ASL-3 capability threshold, but it determined that ASL-3-level CBRN risks could no longer be ruled out, prompting proactive implementation of enhanced security measures and narrowly targeted deployment restrictions.
Key Points
- ASL-3 is activated provisionally: Anthropic cannot rule out that Claude Opus 4 poses ASL-3-level CBRN risks, though it has ruled out ASL-4 risks.
- The ASL-3 Deployment Standard adds narrowly targeted restrictions to reduce CBRN weapons misuse risk, without broadly increasing refusals.
- The ASL-3 Security Standard strengthens protections against model weight theft, targeting sophisticated non-state attackers.
- Claude Sonnet 4 was assessed and ruled out for ASL-3, demonstrating the per-model evaluation process under the RSP.
- This marks the first real-world activation of an ASL-3 standard, transitioning the RSP from theoretical framework to operational policy.
Cited by 6 pages
| Page | Type | Quality |
|---|---|---|
| Anthropic | Organization | 74.0 |
| AI Alignment | Approach | 91.0 |
| AI Lab Safety Culture | Approach | 62.0 |
| Third-Party Model Auditing | Approach | 64.0 |
| Sandboxing / Containment | Approach | 91.0 |
| Optimistic Alignment Worldview | Concept | 91.0 |
2 FactBase facts citing this source
| Entity | Property | Value | As Of |
|---|---|---|---|
| Anthropic | AI Safety Level | ASL-3 (Opus 4+), ASL-2 (Sonnet/Haiku) | May 2025 |
| Anthropic | AI Safety Level | ASL-3 (Opus 4, Opus 4.5, Opus 4.6), ASL-2 (Sonnet, Haiku) | Feb 2026 |
Cached Content Preview
Policy
# Activating AI Safety Level 3 protections
May 22, 2025

_We have activated the AI Safety Level 3 (ASL-3) Deployment and Security Standards described in Anthropic’s Responsible Scaling Policy (RSP) in conjunction with launching Claude Opus 4. The ASL-3 Security Standard involves increased internal security measures that make it harder to steal model weights, while the corresponding Deployment Standard covers a narrowly targeted set of deployment measures designed to limit the risk of Claude being misused specifically for the development or acquisition of chemical, biological, radiological, and nuclear (CBRN) weapons. These measures should not lead Claude to refuse queries except on a very narrow set of topics._
_We are deploying Claude Opus 4 with our ASL-3 measures as a precautionary and provisional action. To be clear, we have not yet determined whether Claude Opus 4 has definitively passed the Capabilities Threshold that requires ASL-3 protections. Rather, due to continued improvements in CBRN-related knowledge and capabilities, we have determined that clearly ruling out ASL-3 risks is not possible for Claude Opus 4 in the way it was for every previous model, and more detailed study is required to conclusively assess the model’s level of risk. (We have ruled out that Claude Opus 4 needs the ASL-4 Standard, as required by our RSP, and, similarly, we have ruled out that Claude Sonnet 4 needs the ASL-3 Standard.)_
_Dangerous capability evaluations of AI models are inherently [challenging](https://www.anthropic.com/news/strategic-warning-for-ai-risk-progress-and-insights-from-our-frontier-red-team), and as models approach our thresholds of concern, it takes longer to determine their status. Proactively enabling a higher standard of safety and security simplifies model releases while allowing us to learn from experience by iteratively improving our defenses and reducing their impact on users._
_This post and the accompanying [report](http://anthropic.com/activating-asl3-report) discuss the new measures and the rationale behind them._
## Background
Increasingly capable AI models warrant increasingly strong deployment and security protections. This principle is core to Anthropic’s [Responsible Scaling Policy](https://www-cdn.anthropic.com/872c653b2d0501d6ab44cf87f43e1dc4853e4d37.pdf) (RSP).
- **Deployment measures** target specific categories of misuse; in particular, our RSP focuses on reducing the risk that models could be misused for attacks with the most dangerous categories of weapons: CBRN.
- **Security controls** aim to prevent the theft of model weights, the essence of the AI’s intelligence and capability.
Anthropic’s RSP includes _Capability Thresholds_ for models: if models reach those thresholds (or if we have not yet determined that they are sufficiently far below the
... (truncated, 13 KB total)