Anthropic's Responsible Scaling Policy: Version 3.0
web
Credibility Rating
High quality. Established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: Anthropic
Anthropic's RSP v3.0 is a major voluntary safety framework update defining conditional commitments and AI Safety Levels (ASLs) to mitigate catastrophic risks as AI capabilities scale, directly shaping industry norms and policy discussions.
Metadata
Summary
Anthropic releases the third version of its Responsible Scaling Policy (RSP), a voluntary framework using conditional 'if-then' commitments tied to AI Safety Levels (ASLs) to manage catastrophic risks from increasingly capable AI systems. The update reinforces what has worked, addresses shortcomings identified over two years of operation, and introduces new transparency and accountability measures. The RSP aims to serve as an internal forcing function, encourage industry-wide 'race to the top' dynamics, and build consensus around capability thresholds requiring multilateral action.
Key Points
- RSP v3.0 uses conditional commitments: if a model exceeds certain capability thresholds, a stricter set of safeguards (defined by an AI Safety Level, or ASL) is required before training or deployment.
- AI Safety Levels (ASL-2, ASL-3, ASL-4+) define progressively more stringent safeguard requirements tied to model capability levels.
- The policy aims to act as an internal forcing function within Anthropic, making safeguards mandatory requirements for launching (and training) new models.
- Anthropic hopes the RSP encourages a 'race to the top' across the AI industry, potentially informing voluntary standards or legislation.
- Capability thresholds (e.g., bioweapon-relevant capabilities) are intended as trigger points for both unilateral and multilateral safety action.
Cited by 2 pages
| Page | Type | Quality |
|---|---|---|
| Frontier AI Labs (Overview) | -- | 85.0 |
| AI Governance & Policy (Overview) | -- | 84.0 |
Cached Content Preview
Anthropic’s Responsible Scaling Policy: Version 3.0
Feb 24, 2026
We’re releasing the third version of our Responsible Scaling Policy (RSP), the voluntary framework we use to mitigate catastrophic risks from AI systems.
Anthropic has now had an RSP for more than two years, and we’ve learned a great deal about its benefits and its shortcomings. We’re therefore updating the policy to reinforce what has worked well to date, improve the policy where necessary, and implement new measures to increase the transparency and accountability of our decision-making.
You can read the new RSP in full here. In this post, we’ll discuss some of the thinking behind the changes.
The original RSP and our theory of change
The RSP is our attempt to solve the problem of how to address AI risks that are not present at the time the policy is written, but which could emerge rapidly as a result of an exponentially advancing technology. When we wrote the original RSP in September 2023, large language models were essentially chat interfaces. Today they can browse the web, write and run code, use computers, and take autonomous, multi-step actions. As each of these new capabilities has emerged, so have new risks. We expect this pattern to continue.
We focused the RSP on the principle of conditional, or if-then, commitments. If a model exceeded certain capability levels (for example, biological science capabilities that could assist in the creation of dangerous weapons), then the policy stated that we should introduce a new and stricter set of safeguards (for example, against model misuse and the theft of model weights).
Each set of safeguards corresponded to an “AI Safety Level” (ASL): for example, ASL-2 referred to one set of required safeguards, whereas ASL-3 referred to a more stringent set of safeguards needed for more capable AI models.
Early ASLs (ASL-2 and ASL-3) were defined in significant detail, but it was more difficult to specify the correct safeguards for models that were still several generations away. We therefore intentionally left the later ASLs (ASL-4 and beyond) largely undefined, and hoped to develop them in more detail once we had a better picture of what higher AI capability levels would entail.
The following is a rough description of our “theory of change”—that is, the mechanisms whereby we hoped to affect the ecosystem with the RSP:
An internal forcing function. Within Anthropic, we hoped the RSP would compel us to treat important safeguards as requirements for launching (and training) new models. This made the importance of these safeguards clear to the large and growing organization, spurring us on to make faster progress.
A race to the top. We hoped that announcing our RSP would encourage other AI companies to introduce similar policies. This is the idea of a “race to the top” (the converse of a “race to the bottom”), in which different industry players are incentivized t
... (truncated, 15 KB total)
0a9c389fb3e8f4ae | Stable ID: sid_1ra7R72bzw