Can Preparedness Frameworks Pull Their Weight?

web

fas.org·fas.org/publication/scaling-ai-safety/

Published by the Federation of American Scientists, this policy-oriented analysis is relevant for those tracking how AI labs' internal safety frameworks (e.g., OpenAI's Preparedness Framework) are being scrutinized and whether external governance mechanisms are needed to ensure accountability at scale.

Metadata

Importance: 62/100organizational reportanalysis

Summary

This Federation of American Scientists publication examines whether current AI preparedness frameworks—such as those adopted by major AI labs—are adequate for managing risks as AI systems scale. It analyzes the strengths and limitations of existing evaluation and red-teaming approaches and offers policy recommendations for more robust safety infrastructure.

Key Points

•Evaluates whether voluntary preparedness frameworks from AI labs are sufficient to address risks from increasingly capable AI systems.
•Identifies gaps in current capability assessments and red-teaming methodologies that may leave critical risks undetected.
•Argues that scaling AI capabilities requires commensurately scaling safety evaluation rigor and institutional oversight.
•Offers policy recommendations for government and industry to strengthen pre-deployment safety requirements.
•Highlights coordination challenges between labs, regulators, and policymakers in establishing enforceable safety standards.

Cited by 2 pages

Page	Type	Quality
AI Evaluations	Research Area	72.0
AI Governance & Policy (Overview)	--	84.0

Cached Content Preview

HTTP 200Fetched Apr 9, 202638 KB

Can Preparedness Frameworks Pull Their Weight? 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

 
 

 

 
 
 
 
 
 
 
 
 

 
 

 

 
 

 

 
 A new class of risk mitigation policies has recently come into vogue for frontier AI developers. Known alternately as Responsible Scaling Policies or Preparedness Frameworks, these policies outline commitments to risk mitigations that developers of the most advanced AI models will implement as their models display increasingly risky capabilities. While the idea for these policies is less than a year old, already two of the most advanced AI developers, Anthropic and OpenAI, have published initial versions of these policies. The U.K. AI Safety Institute asked frontier AI developers about their “Responsible Capability Scaling” policies ahead of the November 2023 UK AI Safety Summit . It seems that these policies are here to stay.

 The National Institute of Standards & Technology (NIST) recently sought public input on its assignments regarding generative AI risk management, AI evaluation, and red-teaming. The Federation of American Scientists was happy to provide input; this is the full text of our response . NIST’s request for information (RFI) highlighted several potential risks and impacts of potentially dual-use foundation models, including: “Negative effects of system interaction and tool use…chemical, biological, radiological, and nuclear (CBRN) risks…[e]nhancing or otherwise affecting malign cyber actors’ capabilities…[and i]mpacts to individuals and society.” This RFI presented a good opportunity for us to discuss the benefits and drawbacks of these new risk mitigation policies.

 This report will provide some background on this class of risk mitigation policies (we use the term Preparedness Framework, for reasons to be described below). We outline suggested criteria for robust Preparedness Frameworks (PFs) and evaluate two key documents, Anthropic’s Responsible Scaling Policy and OpenAI’s Preparedness Framework , against these criteria. We claim that these policies are net-positive and should be encouraged. At the same time, we identify shortcomings of current PFs, chiefly that they are underspecified, insufficiently conservative, and address structural risks poorly. Improvement in the state of the art of risk evaluation for frontier AI models is a prerequisite for a meaningfully binding PF. Most importantly, PFs, as unilateral commitments by private actors, cannot replace public policy.

 Motivation for Preparedness Frameworks

 As AI labs develop potentially dual-use foundation models (as defined by Executive Order No. 14110 , the “AI EO”) with capability, compute, and efficiency improvements, novel risks may emerge , some of them potentially catastrophic. Today’s foundation models can already cause harm and pose some risks, especially as they are more broadly used. Advanced large language models at times display unpredictable behaviors . 

 To this point, these harms have not risen to the level of posing ca

... (truncated, 38 KB total)

Resource ID: bf534eeba9c14113 | Stable ID: sid_62gYDFM1pU