Anthropic Responsible Scaling Policy (Version 2.2)

web

Anthropic·www-cdn.anthropic.com/872c653b2d0501d6ab44cf87f43e1dc4853...

Credibility Rating

4/5

High (4)

High quality. Established institution or organization with editorial oversight and accountability.

Rating inherited from publication venue: Anthropic

Anthropic's RSP is one of the most detailed public industry safety commitments and a key reference for AI governance discussions; it has informed subsequent frontier lab safety frameworks.

Metadata

Importance: 85/100 · organizational report · primary source

Summary

Anthropic's Responsible Scaling Policy (RSP) is a formal commitment outlining how the company will evaluate AI systems for dangerous capabilities and what safety measures must be in place before developing or deploying more powerful models. It establishes AI Safety Levels (ASLs) analogous to biosafety levels, with specific thresholds and required countermeasures for each level. Version 2.2 represents an iterative update to this framework as Anthropic's models advance.

Key Points

• Defines AI Safety Levels (ASL-1 through ASL-4+) with specific Capability Thresholds that trigger stronger Required Safeguards before further development or deployment.
• Requires routine capability assessments (preliminary assessments, followed by comprehensive testing where needed) to determine whether models approach Capability Thresholds for CBRN weapons or Autonomous AI Research and Development (AI R&D).
• Establishes binding commitments: if a model reaches a Capability Threshold and the corresponding Required Safeguards are not in place, Anthropic must pause deployment or development. A model that cannot be shown to fall sufficiently below a threshold is treated as having crossed it.
• Covers both deployment restrictions (who can access models and under what conditions) and development restrictions (compute, training runs, security requirements).
• Serves as a public accountability mechanism, intended to be updated as understanding of AI risks evolves and as models become more capable.

Cited by 1 page

| Page | Type | Quality |
|---|---|---|
| Anthropic Core Views | Safety Agenda | 62.0 |

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 60 KB

ANTHROPIC

# Responsible Scaling Policy

Version 2.2

Effective May 14, 2025

# Executive Summary

In September 2023, we released our Responsible Scaling Policy (RSP), a public commitment not to train or deploy models capable of causing catastrophic harm unless we have implemented safety and security measures that will keep risks below acceptable levels. We are now updating our RSP to account for the lessons we've learned over the last year. This updated policy reflects our view that risk governance in this rapidly evolving domain should be proportional, iterative, and exportable.

**Background.** AI Safety Level Standards (ASL Standards) are a set of technical and operational measures for safely training and deploying frontier AI models. These currently fall into two categories: Deployment Standards and Security Standards. As model capabilities increase, so will the need for stronger safeguards, which are captured in successively higher ASL Standards. At present, all of our models must meet the ASL-2 Deployment and Security Standards. To determine when a model has become sufficiently advanced such that its deployment and security measures should be strengthened, we use the concepts of Capability Thresholds and Required Safeguards. A Capability Threshold tells us *when* we need to upgrade our protections, and the corresponding Required Safeguards tell us *what standard* should apply.

**Capability Thresholds and Required Safeguards.** The Required Safeguards for each Capability Threshold are intended to mitigate risk to acceptable levels. This update to our RSP provides specifications for Capability Thresholds related to Chemical, Biological, Radiological, and Nuclear (CBRN) weapons and Autonomous AI Research and Development (AI R&D) and identifies the corresponding Required Safeguards.
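The relationship described here, in which a Capability Threshold signals *when* protections must be upgraded and the Required Safeguards name *what standard* applies, can be pictured as a small lookup. This is a minimal sketch under stated assumptions, not anything the policy specifies: the threshold names (`cbrn_hypothetical`, `ai_rnd_hypothetical`) and the levels they map to are hypothetical placeholders, and taking the highest level demanded by any crossed threshold is an assumption of the sketch. Deployment and Security Standards are tracked separately because the policy describes them as distinct categories, with ASL-2 as the baseline for both.

```python
from dataclasses import dataclass

# ASL Standards are represented by their level number (2 = ASL-2, 3 = ASL-3, ...).
BASELINE_LEVEL = 2  # per the policy, all current models must meet ASL-2


@dataclass(frozen=True)
class RequiredSafeguards:
    """Deployment and Security Standard levels (values below are hypothetical)."""
    deployment: int  # required ASL Deployment Standard level
    security: int    # required ASL Security Standard level


# HYPOTHETICAL mapping: threshold names and levels are placeholders for illustration only.
REQUIRED_SAFEGUARDS = {
    "cbrn_hypothetical": RequiredSafeguards(deployment=3, security=3),
    "ai_rnd_hypothetical": RequiredSafeguards(deployment=2, security=3),
}


def required_standards(crossed: set[str]) -> RequiredSafeguards:
    """Return the strongest levels demanded by any crossed threshold (max is an assumption)."""
    deployment, security = BASELINE_LEVEL, BASELINE_LEVEL
    for name in crossed:
        req = REQUIRED_SAFEGUARDS[name]
        deployment = max(deployment, req.deployment)
        security = max(security, req.security)
    return RequiredSafeguards(deployment, security)


def safeguards_sufficient(implemented: RequiredSafeguards, crossed: set[str]) -> bool:
    """True if the implemented levels meet or exceed what the crossed thresholds require."""
    required = required_standards(crossed)
    return (implemented.deployment >= required.deployment
            and implemented.security >= required.security)
```

Under these placeholder values, a model that has crossed no thresholds needs only ASL-2 in both categories, while crossing `cbrn_hypothetical` makes ASL-2 safeguards insufficient.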

**Capability assessment.** We will routinely test models to determine whether their capabilities fall sufficiently far below the Capability Thresholds such that the ASL-2 Standard remains appropriate. We will first conduct preliminary assessments to determine whether a more comprehensive evaluation is needed. For models requiring comprehensive testing, we will assess whether the model is unlikely to reach any relevant Capability Thresholds absent surprising advances in widely accessible post-training enhancements. If, after the comprehensive testing, we determine that the model is sufficiently below the relevant Capability Thresholds, then we will continue to apply the ASL-2 Standard. If, however, we are unable to make the required showing, we will act as though the model has surpassed the Capability Threshold. This means that we will both upgrade to the ASL-3 Required Safeguards and conduct a follow-up capability assessment to confirm that the ASL-4 Standard is not necessary.
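The capability assessment is a sequence of judgments rather than a program, but its branching can be sketched in code. In this minimal sketch the function and field names are hypothetical, and the two boolean inputs stand in for the outcomes of the preliminary assessment and of comprehensive testing. The design point it captures is the conservative default: failing to show that a model is sufficiently below the relevant Capability Thresholds is handled exactly as if the threshold had been crossed.

```python
from dataclasses import dataclass
from enum import Enum


class Standard(Enum):
    ASL_2 = "ASL-2"
    ASL_3 = "ASL-3"


@dataclass(frozen=True)
class AssessmentOutcome:
    standard: Standard         # Required Safeguards to apply
    follow_up_asl4_check: bool  # whether a follow-up assessment must confirm ASL-4 is unnecessary


def capability_assessment(needs_comprehensive: bool,
                          shown_sufficiently_below: bool) -> AssessmentOutcome:
    """Hypothetical sketch of the assessment branching.

    needs_comprehensive: outcome of the preliminary assessment.
    shown_sufficiently_below: whether comprehensive testing established that the model
        is sufficiently below the relevant Capability Thresholds (ignored if not tested).
    """
    # Preliminary assessment finds no need for comprehensive testing: ASL-2 remains appropriate.
    if not needs_comprehensive:
        return AssessmentOutcome(Standard.ASL_2, follow_up_asl4_check=False)
    # Comprehensive testing makes the required showing: continue to apply ASL-2.
    if shown_sufficiently_below:
        return AssessmentOutcome(Standard.ASL_2, follow_up_asl4_check=False)
    # Required showing not made: act as though the threshold was surpassed.
    return AssessmentOutcome(Standard.ASL_3, follow_up_asl4_check=True)
```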

**Safeguards assessment.** To determine whether the measures we have adopted satisfy the ASL-3 Required Safeguards, we will conduct a safeguards a

... (truncated, 60 KB total)

Resource ID: 7ccf80f6837a972a | Stable ID: sid_yY9qu9AnHU