Longterm Wiki

I read every major AI lab’s safety plan so you don’t have to

web

Author

sarahhw

Credibility Rating

3/5 (Good)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: EA Forum

Written by an AI Safety Fundamentals governance course participant; provides an accessible comparative summary of the three most prominent frontier lab safety frameworks as of 2023-2024, useful as a starting point for governance researchers.

Forum Post Details

Karma
68
Comments
2
Forum
EA Forum
Forum Tags
AI safety, Existential risk, AI evaluations and standards, AI governance, Anthropic, DeepMind, OpenAI, Public communication on AI safety

Metadata

Importance: 62/100 | blog post | analysis

Summary

A comparative analysis of safety frameworks from OpenAI, Anthropic, and Google DeepMind, breaking down how each defines risk thresholds, capability evaluations, mitigations, and deployment standards. The post critically examines whether these frameworks constitute genuine safety plans or largely voluntary commitments, and whether they contain sufficient enforcement mechanisms to prevent deployment of dangerous systems.

Key Points

  • Compares OpenAI's Preparedness Framework, Anthropic's Responsible Scaling Policy, and Google DeepMind's Frontier Safety Framework side-by-side on key dimensions.
  • Highlights how each lab defines 'acceptable risk' and capability thresholds that would trigger pauses or additional mitigations before deployment.
  • Raises concern that frameworks are largely self-imposed and voluntary, lacking external enforcement mechanisms to ensure compliance.
  • Questions whether current evaluation methodologies are robust enough to reliably detect dangerous capabilities before deployment.
  • Useful accessible overview for governance researchers and practitioners wanting a quick comparative understanding of leading lab safety policies.

Cited by 1 page

Page: Intervention Timing Windows | Type: Analysis | Quality: 72.0

Cached Content Preview

HTTP 200 | Fetched Mar 15, 2026 | 24 KB
I read every major AI lab’s safety plan so you don’t have to — EA Forum 
 

by sarahhw · Dec 16, 2024 · 14 min read

Contents: What are they? · Thresholds & triggers · Capability thresholds · Risk categories · Evaluations · Mitigations · Security standards · Deployment standards · Development standards · My thoughts & open questions · These are not ‘plans’ · What are acceptable levels of risk? · The get-out-of-RSP-free card

This is a linkpost for https://longerramblings.substack.com/p/i-read-every-major-ai-labs-safety

I recently completed the AI Safety Fundamentals governance course. For my project, which won runner-up in the Technical Governance Explainer category, I summarised the safety frameworks published by OpenAI, Anthropic and Deepmind, and offered some of my high-level thoughts. Posting it here in case it can be of any use to people!

 A handful of tech companies are competing to build advanced, general-purpose AI systems that radically outsmart all of humanity. Each acknowledges that this will be a highly – perhaps existentially – dangerous undertaking. How do they plan to mitigate these risks?

Three industry leaders have released safety frameworks outlining how they intend to avoid catastrophic outcomes. They are OpenAI’s Preparedness Framework, Anthropic’s Responsible Scaling Policy and Google DeepMind’s Frontier Safety Framework.

 Despite having been an avid follower of AI safety issues for almost two years now, and having heard plenty about these safety frameworks and how promising (or disappointing) others believe them to be, I had never actually read them in full. I decided to do that – and to create a simple summary that might be useful for others.

 I tried to write this assuming no prior knowledge. It is aimed at a reader who has heard that AI companies are doing something dangerous, and would like to know how they plan to address that. In the first section, I give a high-level summary of what each framework actually says. In the second, I offer some of my own opinions.

Note I haven’t covered every aspect of the three frameworks here. I’ve focused on risk thresholds, capability evaluations and mitigations. There are some other sections, which mainly cover each lab’s governance and transparency policies. I also want to throw in the obvious disclaimer that I have not been comprehensive here and have probably missed some nuances despite my best efforts to capture all the important bits!

 What are they?

 First, let’s take a look at how each lab defines their

... (truncated, 24 KB total)
Resource ID: d564401cd5e38340 | Stable ID: ZDZhYTk5Nj