Longterm Wiki

Credibility Rating

3/5
Good (3)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: EA Forum

This is Anthropic's updated voluntary safety commitment framework (v3.1), part of an ongoing series; earlier versions (v1, v2) introduced the RSP concept and ASL framework, which have influenced frontier AI governance discussions and similar policies at other labs such as DeepMind and OpenAI.

Forum Post Details

Karma
134
Comments
8
Forum
eaforum
Forum Tags
AI safety | AI evaluations and standards | AI governance | AI race | Risk assessment

Metadata

Importance: 72/100 | primary source

Summary

Anthropic's third iteration of its Responsible Scaling Policy (RSP) outlines commitments to evaluate AI models for dangerous capabilities and adjust deployment and development practices based on defined AI Safety Levels (ASLs). The policy establishes thresholds at which Anthropic would pause or constrain scaling based on capability evaluations, particularly around CBRN risks and autonomy. It represents a voluntary industry framework for conditional scaling commitments tied to safety evaluations.

Key Points

  • Defines AI Safety Levels (ASL-1 through ASL-4+) with specific capability thresholds that trigger enhanced safety requirements or development pauses.
  • Requires pre-deployment evaluations for dangerous capabilities including CBRN uplift, cyberoffense, and autonomous replication/self-exfiltration.
  • Establishes commitments to develop and validate safety measures before crossing capability thresholds, rather than after deployment.
  • Introduces strengthened accountability mechanisms compared to v2, including board-level oversight and external auditing provisions.
  • Serves as a model for voluntary safety commitments in the AI industry, influencing similar policies at other frontier labs.

Cached Content Preview

HTTP 200 | Fetched Apr 7, 2026 | 66 KB
# Responsible Scaling Policy v3
By Holden Karnofsky
Published: 2026-02-24
*All views are my own, not Anthropic’s. This post assumes Anthropic’s announcement of RSP v3.0 as background.*

Today, Anthropic released its Responsible Scaling Policy 3.0. The official [announcement](https://anthropic.com/news/responsible-scaling-policy-v3) discusses the high-level thinking behind it. This is a more detailed post giving my own takes on the update.

First, the big picture:

*   I expect some people will be upset about the move away from a “hard commitments”/“binding ourselves to the mast” vibe. (Anthropic has always had the ability to revise the RSP, and we’ve always had language in there specifically flagging that we might revise away key commitments in a situation where other AI developers aren’t adhering to similar commitments. But it’s been easy to get the impression that the RSP is “binding ourselves to the mast” and committing to unilaterally pause AI development and deployment under some conditions, and Anthropic is responsible for that.)
*   I take significant responsibility for this change. I have been pushing for this change for about a year now, and have led the way in developing the new RSP. I am in favor of nearly everything about the changes we’re making. I am excited about the Roadmap, the Risk Reports, the move toward external review, and unwinding of some of the old requirements that I felt were distorting our safety efforts (more on all of this below).
*   I think these changes are the right thing for reducing AI risk, both from Anthropic and from other companies if they make similar changes (as I hope they do).
*   In my mind, this revision isn’t being prompted by “catastrophic risk from today’s AI systems is now high” (I don’t think it is), or by “We’ve just realized that sufficient regulation isn’t looking likely” (I think this is not a very recent update). First and foremost, in my mind, it is about learning from design flaws and making improvements.
*   I always thought of the original RSP as a “v1” that would be iterated on, and have been frustrated to see the extent to which it’s been interpreted as a “sacred cow” or “binding oneself to the mast” such that revisions that go in a “less self-binding” direction are seen by many as inherently dishonorable. I would’ve pushed for a very different (and far less ambitious) initial design if I’d thought this way about future changes.
    *   I generally think it’s bad to create an environment that encourages people to be afraid of making mistakes, afraid of admitting mistakes and reticent to change things that aren’t working. I think that dynamic currently applies somewhat to RSP-like policies, and I hope that changes.
    *   I’m not saying that I wish people shrugged off every revision with “Hey, it’s your policy, do what you want.” I wish people simply evaluated whether the changes seem good on the merits, without starting from a strong presumption that the mere fact of changes is either

... (truncated, 66 KB total)
Resource ID: 0ec6752232c23a32 | Stable ID: sid_Oes1m1LGwe