Anthropic Responsible Scaling Policy (Version 2.2)
Credibility Rating
High quality. Established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: Anthropic
Anthropic's RSP is one of the most detailed public industry safety commitments and a key reference for AI governance discussions; first released in September 2023, it followed the voluntary commitments announced at the White House in July 2023 and informed subsequent frontier lab safety frameworks.
Summary
Anthropic's Responsible Scaling Policy (RSP) is a formal commitment outlining how the company will evaluate AI systems for dangerous capabilities and what safety measures must be in place before developing or deploying more powerful models. It establishes AI Safety Levels (ASLs) analogous to biosafety levels, with specific thresholds and required countermeasures for each level. Version 2.2 represents an iterative update to this framework as Anthropic's models advance.
Key Points
- Defines AI Safety Levels (ASL-1 through ASL-4+) with specific capability thresholds that trigger mandatory safety protocols before further development or deployment.
- Requires regular 'frontier model evaluations' to assess whether models cross capability thresholds for CBRN weapons uplift, autonomous replication, or undermining oversight.
- Establishes binding commitments: if a model reaches an ASL threshold without required safeguards in place, Anthropic must pause deployment or development.
- Covers both deployment restrictions (who can access models and under what conditions) and development restrictions (compute, training runs, security requirements).
- Serves as a public accountability mechanism, intended to be updated as understanding of AI risks evolves and as models become more capable.
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Anthropic Core Views | Safety Agenda | 62.0 |
Cached Content Preview
ANTHROPIC
# Responsible Scaling Policy
Version 2.2
Effective May 14, 2025
# Executive Summary
In September 2023, we released our Responsible Scaling Policy (RSP), a public commitment not to train or deploy models capable of causing catastrophic harm unless we have implemented safety and security measures that will keep risks below acceptable levels. We are now updating our RSP to account for the lessons we’ve learned over the last year. This updated policy reflects our view that risk governance in this rapidly evolving domain should be proportional, iterative, and exportable.
Background. AI Safety Level Standards (ASL Standards) are a set of technical and operational measures for safely training and deploying frontier AI models. These currently fall into two categories: Deployment Standards and Security Standards. As model capabilities increase, so will the need for stronger safeguards, which are captured in successively higher ASL Standards. At present, all of our models must meet the ASL-2 Deployment and Security Standards. To determine when a model has become sufficiently advanced such that its deployment and security measures should be strengthened, we use the concepts of Capability Thresholds and Required Safeguards. A Capability Threshold tells us when we need to upgrade our protections, and the corresponding Required Safeguards tell us what standard should apply.
Capability Thresholds and Required Safeguards. The Required Safeguards for each Capability Threshold are intended to mitigate risk to acceptable levels. This update to our RSP provides specifications for Capability Thresholds related to Chemical, Biological, Radiological, and Nuclear (CBRN) weapons and Autonomous AI Research and Development (AI R&D) and identifies the corresponding Required Safeguards.
Capability assessment. We will routinely test models to determine whether their capabilities fall sufficiently far below the Capability Thresholds such that the ASL-2 Standard remains appropriate. We will first conduct preliminary assessments to determine whether a more comprehensive evaluation is needed. For models requiring comprehensive testing, we will assess whether the model is unlikely to reach any relevant Capability Thresholds absent surprising advances in widely accessible post-training enhancements. If, after the comprehensive testing, we determine that the model is sufficiently below the relevant Capability Thresholds, then we will continue to apply the ASL-2 Standard. If, however, we are unable to make the required showing, we will act as though the model has surpassed the Capability Threshold. This means that we will both upgrade to the ASL-3 Required Safeguards and conduct a follow-up capability assessment to confirm that the ASL-4 Standard is not necessary.
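The capability assessment flow described above can be read as a small decision procedure. The following is only an illustrative sketch, not part of the policy: the function name `decide_standard`, the `Decision` type, and the boolean inputs are hypothetical, and the branch for models that do not need comprehensive testing (staying at ASL-2) is an assumption inferred from the text rather than stated in it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Decision:
    """Hypothetical result type: which standard applies and whether a
    follow-up assessment against the ASL-4 Standard is required."""
    standard: str
    follow_up_asl4_assessment: bool


def decide_standard(needs_comprehensive: bool,
                    shown_sufficiently_below: Optional[bool]) -> Decision:
    """Sketch of the capability assessment flow.

    needs_comprehensive: outcome of the preliminary assessment.
    shown_sufficiently_below: outcome of comprehensive testing, or None
        if comprehensive testing was not run.
    """
    if not needs_comprehensive:
        # Assumption: if the preliminary assessment finds no need for
        # comprehensive testing, the ASL-2 Standard continues to apply.
        return Decision("ASL-2", False)
    if shown_sufficiently_below is True:
        # The required showing was made: keep the ASL-2 Standard.
        return Decision("ASL-2", False)
    # The required showing could not be made (False or None): act as
    # though the threshold was surpassed, upgrade to ASL-3, and check
    # that ASL-4 is not needed.
    return Decision("ASL-3", True)


if __name__ == "__main__":
    print(decide_standard(True, False))
```

The sketch makes the default explicit: the burden is on showing that the model is sufficiently below the threshold, so a missing or failed showing is treated the same as a surpassed threshold.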
Safeguards assessment. To determine whether the measures we have adopted satisfy the ASL-3 Required Safeguards, we will conduct a safeguards a
... (truncated, 60 KB total)
7ccf80f6837a972a | Stable ID: YjQzY2NkNG