Anthropic's Responsible Scaling Policy
Blog
Credibility Rating
High quality. Established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: Anthropic
A foundational industry policy document from Anthropic establishing concrete, capability-gated safety commitments; widely cited as a leading example of a responsible scaling framework, it has influenced similar policies at other frontier AI labs.
Metadata
Summary
Anthropic's Responsible Scaling Policy (RSP) establishes a framework for safely developing increasingly capable AI systems by tying deployment and training decisions to AI Safety Levels (ASLs). It commits Anthropic to pausing development if safety and security measures cannot keep pace with capability advances, and outlines specific protocols for evaluating dangerous capabilities thresholds.
Key Points
- Introduces AI Safety Levels (ASL-1 through ASL-4+) analogous to biosafety levels, defining risk tiers and corresponding safety requirements.
- Commits Anthropic to halting training or deployment if models approach capability thresholds without adequate safeguards in place.
- Establishes mandatory evaluations for dangerous capabilities (e.g., CBRN uplift, autonomous replication) before and during model deployment (sketched in code below).
- Creates accountability mechanisms including required safety cases and board-level oversight tied to capability milestones.
- Serves as a model for responsible scaling commitments in the AI industry, influencing similar policies at other labs.
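The gating logic in the points above can be made concrete. The following is a minimal Python sketch, not Anthropic's actual implementation: the eval names, the thresholds, and the `may_deploy` check are all illustrative assumptions about how capability evaluations could gate deployment.

```python
# Hypothetical sketch of an evals-based deployment gate in the spirit of
# the RSP. All names and thresholds here are illustrative assumptions.
from dataclasses import dataclass
from enum import IntEnum

class ASL(IntEnum):
    ASL_1 = 1  # no meaningful catastrophic risk
    ASL_2 = 2  # early signs of dangerous capabilities
    ASL_3 = 3  # substantial uplift over non-AI baselines, or low-level autonomy

@dataclass
class EvalResult:
    cbrn_uplift: float     # assumed score: misuse uplift vs. search-engine baseline
    autonomy_score: float  # assumed score: autonomous-replication evals

def classify_asl(result: EvalResult) -> ASL:
    """Map dangerous-capability eval results to an ASL tier (illustrative thresholds)."""
    if result.cbrn_uplift >= 0.5 or result.autonomy_score >= 0.5:
        return ASL.ASL_3
    if result.cbrn_uplift > 0.0 or result.autonomy_score > 0.0:
        return ASL.ASL_2
    return ASL.ASL_1

def may_deploy(result: EvalResult, safeguards_ready_for: ASL) -> bool:
    """Core RSP-style commitment: pause if measured capability outruns safeguards."""
    return classify_asl(result) <= safeguards_ready_for

if __name__ == "__main__":
    r = EvalResult(cbrn_uplift=0.6, autonomy_score=0.1)
    if not may_deploy(r, safeguards_ready_for=ASL.ASL_2):
        print("Pause: model reaches ASL-3 but only ASL-2 safeguards are in place.")
```

The key design point the sketch captures is that the pause condition is mechanical: deployment is permitted only when the demonstrated safeguard level meets or exceeds the capability tier the evaluations indicate.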
Cited by 3 pages
| Page | Type | Quality |
|---|---|---|
| Anthropic | Organization | 74.0 |
| Evals-Based Deployment Gates | Approach | 66.0 |
| AI Whistleblower Protections | Policy | 63.0 |
Cached Content Preview
Announcements
# Anthropic's Responsible Scaling Policy
Sep 19, 2023

Today, we’re publishing our [Responsible Scaling Policy (RSP)](https://anthropic.com/responsible-scaling-policy) – a series of technical and organizational protocols that we’re adopting to help us manage the risks of developing increasingly capable AI systems.
As AI models become more capable, we believe that they will create major economic and social value, but will also present increasingly severe risks. Our RSP focuses on catastrophic risks – those where an AI model directly causes large-scale devastation. Such risks can come from deliberate misuse of models (for example, use by terrorists or state actors to create bioweapons) or from models that cause destruction by acting autonomously in ways contrary to the intent of their designers.

Our RSP defines a framework called AI Safety Levels (ASL) for addressing catastrophic risks, modeled loosely after the US government’s biosafety level (BSL) standards for the handling of dangerous biological materials. The basic idea is to require safety, security, and operational standards appropriate to a model’s potential for catastrophic risk, with higher ASL levels requiring increasingly strict demonstrations of safety.
A very abbreviated summary of the ASL system is as follows (a schematic code sketch appears after the list):
- ASL-1 refers to systems which pose no meaningful catastrophic risk, for example a 2018 LLM or an AI system that only plays chess.
- ASL-2 refers to systems that show early signs of dangerous capabilities – for example, the ability to give instructions on how to build bioweapons – but where the information is not yet useful, either because it is insufficiently reliable or because it provides nothing that, say, a search engine couldn’t. Current LLMs, including Claude, appear to be ASL-2.
- ASL-3 refers to systems that substantially increase the risk of catastrophic misuse compared to non-AI baselines (e.g. search engines or textbooks) OR that show low-level autonomous capabilities.
- ASL-4 and higher (ASL-5+) are not yet defined, as they are too far from present systems, but will likely involve qualitative escalations in catastrophic misuse potential and autonomy.
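One way to read the tiering is as a lookup from capability level to required safeguards, mirroring the BSL analogy above. The Python sketch below is illustrative only: the measure names and the `missing_measures` helper are assumptions, not the policy's actual checklist.

```python
# Illustrative mapping from ASL tier to required safety/security measures.
# Measure names are assumed for the example, not taken from the policy.
REQUIRED_MEASURES: dict[int, set[str]] = {
    1: set(),  # ASL-1: no meaningful catastrophic risk, no special measures
    2: {"model card", "red-teaming", "baseline security controls"},
    3: {"model card", "red-teaming", "baseline security controls",
        "misuse-resistant deployment", "hardened weight security"},
}

def missing_measures(asl: int, implemented: set[str]) -> set[str]:
    """Return the measures still needed before operating at the given ASL."""
    return REQUIRED_MEASURES[asl] - implemented

# Example: a lab with only ASL-2 measures in place considering an ASL-3 model.
print(missing_measures(3, {"model card", "red-teaming", "baseline security controls"}))
# -> {'misuse-resistant deployment', 'hardened weight security'}
```

Each higher tier strictly contains the lower tier's requirements, which is what makes "safeguards must keep pace with capabilities" checkable as a simple set difference.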
The definition, criteria, and safety measures for each ASL level are described in detail in the main document, but at a high level, ASL-2 measures represent our current safety and security standards and overlap significantly with our recent [White House commitments](https://bidenwhitehouse.archives.gov/briefing-room/statements-releases/2023/07/21/fact-sheet-biden-harris-administration-secures-voluntary-commitments-from-leading-artificial-intelligence-companies-to-manage
... (truncated, 7 KB total)