Longterm Wiki

I read every major AI lab’s safety plan so you don’t have to

web

Author

sarahhw

Credibility Rating

3/5 (Good)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: EA Forum

Written by an AI Safety Fundamentals governance course participant; provides an accessible comparative summary of the three most prominent frontier lab safety frameworks as of 2023-2024, useful as a starting point for governance researchers.

Forum Post Details

Karma
68
Comments
2
Forum
EA Forum
Forum Tags
AI safety, Existential risk, AI evaluations and standards, AI governance, Anthropic, DeepMind, OpenAI, Public communication on AI safety

Metadata

Importance: 62/100 | blog post | analysis

Summary

A comparative analysis of safety frameworks from OpenAI, Anthropic, and Google DeepMind, breaking down how each defines risk thresholds, capability evaluations, mitigations, and deployment standards. The post critically examines whether these frameworks constitute genuine safety plans or largely voluntary commitments, and whether they contain sufficient enforcement mechanisms to prevent deployment of dangerous systems.

Key Points

  • Compares OpenAI's Preparedness Framework, Anthropic's Responsible Scaling Policy, and Google DeepMind's Frontier Safety Framework side-by-side on key dimensions.
  • Highlights how each lab defines 'acceptable risk' and capability thresholds that would trigger pauses or additional mitigations before deployment.
  • Raises concern that frameworks are largely self-imposed and voluntary, lacking external enforcement mechanisms to ensure compliance.
  • Questions whether current evaluation methodologies are robust enough to reliably detect dangerous capabilities before deployment.
  • Useful accessible overview for governance researchers and practitioners wanting a quick comparative understanding of leading lab safety policies.

Cited by 1 page

Page: Intervention Timing Windows | Type: Analysis | Quality: 72.0

Cached Content Preview

HTTP 200 | Fetched Mar 15, 2026 | 24 KB
I read every major AI lab’s safety plan so you don’t have to — EA Forum 
 

by sarahhw · Dec 16, 2024 · 14 min read

Contents: What are they? · Thresholds & triggers · Capability thresholds · Risk categories · Evaluations · Mitigations · Security standards · Deployment standards · Development standards · My thoughts & open questions · These are not ‘plans’ · What are acceptable levels of risk? · The get-out-of-RSP-free card

This is a linkpost for https://longerramblings.substack.com/p/i-read-every-major-ai-labs-safety

I recently completed the AI Safety Fundamentals governance course. For my project, which won runner-up in the Technical Governance Explainer category, I summarised the safety frameworks published by OpenAI, Anthropic and Deepmind, and offered some of my high-level thoughts. Posting it here in case it can be of any use to people!

 A handful of tech companies are competing to build advanced, general-purpose AI systems that radically outsmart all of humanity. Each acknowledges that this will be a highly – perhaps existentially – dangerous undertaking. How do they plan to mitigate these risks?

Three industry leaders have released safety frameworks outlining how they intend to avoid catastrophic outcomes. They are OpenAI’s Preparedness Framework, Anthropic’s Responsible Scaling Policy and Google DeepMind’s Frontier Safety Framework.

 Despite having been an avid follower of AI safety issues for almost two years now, and having heard plenty about these safety frameworks and how promising (or disappointing) others believe them to be, I had never actually read them in full. I decided to do that – and to create a simple summary that might be useful for others.

 I tried to write this assuming no prior knowledge. It is aimed at a reader who has heard that AI companies are doing something dangerous, and would like to know how they plan to address that. In the first section, I give a high-level summary of what each framework actually says. In the second, I offer some of my own opinions.

Note I haven’t covered every aspect of the three frameworks here. I’ve focused on risk thresholds, capability evaluations and mitigations. There are some other sections, which mainly cover each lab’s governance and transparency policies. I also want to throw in the obvious disclaimer that I have not been comprehensive here and have probably missed some nuances despite my best efforts to capture all the important bits!

 What are they?

 First, let’s take a look at how each lab defines their

... (truncated, 24 KB total)
Resource ID: d564401cd5e38340 | Stable ID: ZDZhYTk5Nj