Longterm Wiki

Credibility Rating

4/5
High (4)

High quality. Established institution or organization with editorial oversight and accountability.

Rating inherited from publication venue: Anthropic

This is Anthropic's foundational policy document establishing how it gates deployment of increasingly capable models; a key reference for understanding industry-led AI governance frameworks and voluntary safety commitments.

Metadata

Importance: 85/100 | policy brief | primary source

Summary

Anthropic introduces its Responsible Scaling Policy (RSP), a set of technical and organizational protocols for managing catastrophic risks as AI systems become more capable. The policy defines AI Safety Levels (ASL), modeled loosely after the US government's biosafety level (BSL) standards, which require increasingly strict safety, security, and operational measures as a model's potential for catastrophic risk grows. At the time of publication (September 2023), LLMs including Claude appeared to be ASL-2; ASL-3 and above trigger stricter deployment and security requirements, and ASL-4 and higher were not yet defined.

Key Points

  • Defines the AI Safety Levels (ASL) framework: ASL-1 (no meaningful catastrophic risk) through ASL-4 and higher (ASL-5+, not yet defined), with each level triggering stricter safety and security requirements.
  • ASL-2 covers LLMs current at publication, including Claude; ASL-3 applies when models substantially increase catastrophic misuse risk compared to non-AI baselines or show low-level autonomous capabilities.
  • Commits to temporarily pausing training of more powerful models if safety measures for the next ASL level are not yet in place.
  • ASL-3 requires passing adversarial red-teaming by world-class teams before deployment, not just performing red-teaming.
  • ASL-4 measures may require unsolved research like interpretability-based mechanistic assurances of safe behavior.

Cited by 12 pages

Cached Content Preview

HTTP 200 | Fetched Mar 20, 2026 | 7 KB
Announcements

# Anthropic's Responsible Scaling Policy

Sep 19, 2023

![](https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2F6276c7f8e14b693c66836810242243bd8dfd03ce-2880x1620.png&w=3840&q=75)

Today, we’re publishing our [Responsible Scaling Policy (RSP)](https://anthropic.com/responsible-scaling-policy) – a series of technical and organizational protocols that we’re adopting to help us manage the risks of developing increasingly capable AI systems.

As AI models become more capable, we believe that they will create major economic and social value, but will also present increasingly severe risks. Our RSP focuses on catastrophic risks – those where an AI model directly causes large scale devastation. Such risks can come from deliberate misuse of models (for example use by terrorists or state actors to create bioweapons) or from models that cause destruction by acting autonomously in ways contrary to the intent of their designers.

![](https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2Fc9812176a54de4258c4969b24bf55dd4dfc1d928-5760x3240.png&w=3840&q=75)

Our RSP defines a framework called AI Safety Levels (ASL) for addressing catastrophic risks, modeled loosely after the US government’s biosafety level (BSL) standards for handling of dangerous biological materials. The basic idea is to require safety, security, and operational standards appropriate to a model’s potential for catastrophic risk, with higher ASL levels requiring increasingly strict demonstrations of safety.

A very abbreviated summary of the ASL system is as follows:

- ASL-1 refers to systems which pose no meaningful catastrophic risk, for example a 2018 LLM or an AI system that only plays chess.
- ASL-2 refers to systems that show early signs of dangerous capabilities – for example ability to give instructions on how to build bioweapons – but where the information is not yet useful due to insufficient reliability or not providing information that e.g. a search engine couldn’t. Current LLMs, including Claude, appear to be ASL-2.
- ASL-3 refers to systems that substantially increase the risk of catastrophic misuse compared to non-AI baselines (e.g. search engines or textbooks) OR that show low-level autonomous capabilities.
- ASL-4 and higher (ASL-5+) is not yet defined as it is too far from present systems, but will likely involve qualitative escalations in catastrophic misuse potential and autonomy.
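
The abbreviated ASL summary above reads like a small decision rule, so a minimal Python sketch may help make the ordering of the levels concrete. This is only an illustration of the wording in the list, not Anthropic's evaluation procedure: the names `ASL`, `CapabilityFindings`, and `classify` are hypothetical, the boolean flags stand in for judgments that the main document defines in detail, and ASL-4 and higher are omitted because they are not yet defined.

```python
from dataclasses import dataclass
from enum import IntEnum


class ASL(IntEnum):
    """Levels from the abbreviated summary; ASL-4+ is undefined and omitted."""
    ASL_1 = 1
    ASL_2 = 2
    ASL_3 = 3


@dataclass(frozen=True)
class CapabilityFindings:
    """Hypothetical evaluation outcomes, one flag per criterion in the summary."""
    early_signs_of_dangerous_capabilities: bool
    substantial_misuse_uplift_over_baselines: bool
    low_level_autonomy: bool


def classify(findings: CapabilityFindings) -> ASL:
    """Return the highest level whose summary criterion is met."""
    # ASL-3: substantially increased misuse risk vs. non-AI baselines
    # OR low-level autonomous capabilities.
    if (findings.substantial_misuse_uplift_over_baselines
            or findings.low_level_autonomy):
        return ASL.ASL_3
    # ASL-2: early signs of dangerous capabilities that are not yet useful.
    if findings.early_signs_of_dangerous_capabilities:
        return ASL.ASL_2
    # ASL-1: no meaningful catastrophic risk.
    return ASL.ASL_1
```

Checking the ASL-3 criteria first reflects the "OR" in the summary: either condition alone is enough to move a system to the higher level, regardless of the other flags.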

The definition, criteria, and safety measures for each ASL level are described in detail in the main document, but at a high level, ASL-2 measures represent our current safety and security standards and overlap significantly with our recent [White House commitments](https://bidenwhitehouse.archives.gov/briefing-room/statements-releases/2023/07/21/fact-sheet-biden-harris-administration-secures-voluntary-commitments-from-leading-artificial-intelligence-companies-to-manage

... (truncated, 7 KB total)
Resource ID: 394ea6d17701b621 | Stable ID: NGU0MjEyN2