Activating AI Safety Level 3 protections
Blog
Credibility Rating
High quality. Established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: Anthropic
This is a landmark policy milestone: the first time Anthropic has moved beyond its baseline ASL-2 protections, making it a key reference for understanding how Responsible Scaling Policies translate into concrete operational decisions as frontier models grow more capable.
Metadata
Summary
Anthropic announces the precautionary activation of ASL-3 deployment and security standards for Claude Opus 4 under its Responsible Scaling Policy. Anthropic has not definitively concluded that Claude Opus 4 meets the ASL-3 capability threshold, but it determined that ASL-3-level CBRN risks could no longer be ruled out, prompting proactive implementation of enhanced security measures and narrowly targeted deployment restrictions.
Key Points
- ASL-3 is activated provisionally: Anthropic cannot rule out that Claude Opus 4 poses ASL-3-level CBRN risks, though it has ruled out ASL-4 risks.
- The ASL-3 Deployment Standard adds narrowly targeted restrictions to reduce CBRN weapons misuse risk, without broadly increasing refusals.
- The ASL-3 Security Standard strengthens protections against model weight theft, targeting sophisticated non-state attackers.
- Claude Sonnet 4 was assessed and ruled out for ASL-3, demonstrating the per-model evaluation process under the RSP.
- This marks the first real-world activation of an ASL-3 standard, transitioning the RSP from theoretical framework to operational policy.
Cited by 6 pages
| Page | Type | Quality |
|---|---|---|
| Anthropic | Organization | 74.0 |
| AI Alignment | Approach | 91.0 |
| AI Lab Safety Culture | Approach | 62.0 |
| Third-Party Model Auditing | Approach | 64.0 |
| Sandboxing / Containment | Approach | 91.0 |
| Optimistic Alignment Worldview | Concept | 91.0 |
2 FactBase facts citing this source
| Entity | Property | Value | As Of |
|---|---|---|---|
| Anthropic | AI Safety Level | ASL-3 (Opus 4+), ASL-2 (Sonnet/Haiku) | May 2025 |
| Anthropic | AI Safety Level | ASL-3 (Opus 4, Opus 4.5, Opus 4.6), ASL-2 (Sonnet, Haiku) | Feb 2026 |
Cached Content Preview
Policy
# Activating AI Safety Level 3 protections
May 22, 2025

_We have activated the AI Safety Level 3 (ASL-3) Deployment and Security Standards described in Anthropic’s Responsible Scaling Policy (RSP) in conjunction with launching Claude Opus 4. The ASL-3 Security Standard involves increased internal security measures that make it harder to steal model weights, while the corresponding Deployment Standard covers a narrowly targeted set of deployment measures designed to limit the risk of Claude being misused specifically for the development or acquisition of chemical, biological, radiological, and nuclear (CBRN) weapons. These measures should not lead Claude to refuse queries except on a very narrow set of topics._
_We are deploying Claude Opus 4 with our ASL-3 measures as a precautionary and provisional action. To be clear, we have not yet determined whether Claude Opus 4 has definitively passed the Capabilities Threshold that requires ASL-3 protections. Rather, due to continued improvements in CBRN-related knowledge and capabilities, we have determined that clearly ruling out ASL-3 risks is not possible for Claude Opus 4 in the way it was for every previous model, and more detailed study is required to conclusively assess the model’s level of risk. (We have ruled out that Claude Opus 4 needs the ASL-4 Standard, as required by our RSP, and, similarly, we have ruled out that Claude Sonnet 4 needs the ASL-3 Standard.)_
_Dangerous capability evaluations of AI models are inherently [challenging](https://www.anthropic.com/news/strategic-warning-for-ai-risk-progress-and-insights-from-our-frontier-red-team), and as models approach our thresholds of concern, it takes longer to determine their status. Proactively enabling a higher standard of safety and security simplifies model releases while allowing us to learn from experience by iteratively improving our defenses and reducing their impact on users._
_This post and the accompanying [report](http://anthropic.com/activating-asl3-report) discuss the new measures and the rationale behind them._
## Background
Increasingly capable AI models warrant increasingly strong deployment and security protections. This principle is core to Anthropic’s [Responsible Scaling Policy](https://www-cdn.anthropic.com/872c653b2d0501d6ab44cf87f43e1dc4853e4d37.pdf) (RSP).
- **Deployment measures** target specific categories of misuse; in particular, our RSP focuses on reducing the risk that models could be misused for attacks with the most dangerous categories of weapons: CBRN.
- **Security controls** aim to prevent the theft of model weights, the essence of the AI’s intelligence and capability.
Anthropic’s RSP includes _Capability Thresholds_ for models: if models reach those thresholds (or if we have not yet determined that they are sufficiently far below the
... (truncated, 13 KB total)