Carnegie Endowment: If-Then Commitments for AI Risk Reduction
Credibility Rating
High quality. Established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: Carnegie Endowment
Published by Carnegie Endowment in September 2024, this policy analysis is relevant for understanding conditional governance mechanisms that could operationalize AI safety commitments made by frontier labs and governments, complementing frameworks like the Bletchley Declaration and voluntary safety pledges.
Metadata
Summary
This Carnegie Endowment report proposes 'if-then' conditional commitment frameworks for AI governance, where AI developers and governments pre-commit to specific risk-mitigation actions triggered by defined capability thresholds or adverse events. The approach aims to reduce AI risks by creating credible, enforceable pledges that activate before harms materialize. It bridges voluntary industry commitments and binding regulation.
Key Points
- Proposes conditional 'if-then' commitments where specific AI risk triggers automatically require predefined mitigation responses from developers or governments.
- Addresses the gap between voluntary AI safety pledges (often vague) and hard regulation (slow to enact), offering a middle-ground mechanism.
- Commitments could cover scenarios like capability thresholds, dangerous evaluation results, or incident reports triggering deployment pauses or third-party audits.
- Framework emphasizes credibility and verifiability, requiring commitments to be specific, measurable, and subject to oversight.
- Relevant to ongoing debates around frontier AI governance, safety frameworks like those from major labs, and international coordination efforts.
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| AI Policy Effectiveness | Analysis | 64.0 |
Cached Content Preview

## If-Then Commitments for AI Risk Reduction
If-then commitments are an emerging framework for preparing for risks from AI without unnecessarily slowing the development of new technology. The more attention and interest there is in these commitments, the faster a mature framework can progress.
By [Holden Karnofsky](https://carnegieendowment.org/people/holden-karnofsky)
Published on Sep 13, 2024
### Introduction
Artificial intelligence (AI) could pose a variety of catastrophic risks to international security in several domains, including the proliferation and acceleration of cyberoffense capabilities, and of the ability to develop chemical or biological weapons of mass destruction. Even the most powerful AI models today are not yet capable enough to pose such risks,1 but the coming years could see fast and hard-to-predict changes in AI capabilities. Both companies and governments have shown significant interest in finding ways to prepare for such risks without unnecessarily slowing the development of new technology.
This piece is a primer on an emerging framework for handling this challenge: if-then commitments. These are commitments of the form: _If an AI model has capability X, risk mitigations Y must be in place. And, if needed, we will delay AI deployment and/or development to ensure the mitigations can be present in time._ A specific example: _If an AI model has the ability to walk a novice through constructing a weapon of mass destruction, we must ensure that there are no easy ways for consumers to elicit behavior in this category from the AI model._
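The conditional structure lends itself to a mechanical reading: evaluations produce capability measurements, and each commitment maps a trigger condition to a set of required mitigations. Below is a minimal Python sketch of that trigger-to-mitigation logic, assuming a simple score-based evaluation; all names, metrics, and thresholds (`wmd_uplift`, `jailbreak_hardening`, the 0.5 cutoff) are hypothetical illustrations, not anything specified in the paper.

```python
from dataclasses import dataclass

# Hypothetical shape of evaluation output: metric name -> measured score.
EvalResults = dict[str, float]

@dataclass
class IfThenCommitment:
    """One illustrative 'if capability X, then mitigations Y' rule."""
    capability: str                  # a dangerous-capability eval metric
    threshold: float                 # score at which the commitment triggers
    required_mitigations: list[str]  # mitigations that must then be in place

def missing_mitigations(results: EvalResults,
                        commitments: list[IfThenCommitment],
                        in_place: set[str]) -> list[str]:
    """Return triggered mitigations that are not yet in place.

    Under an if-then commitment, a non-empty result means deployment
    (and possibly development) pauses until the list is empty.
    """
    missing: list[str] = []
    for c in commitments:
        if results.get(c.capability, 0.0) >= c.threshold:
            missing += [m for m in c.required_mitigations if m not in in_place]
    return missing

# Made-up usage: an eval crosses the trigger and no mitigation exists yet.
commitments = [IfThenCommitment("wmd_uplift", 0.5, ["jailbreak_hardening"])]
print(missing_mitigations({"wmd_uplift": 0.7}, commitments, in_place=set()))
# -> ['jailbreak_hardening']: pause until it is in place, then proceed.
```

The sketch only captures the shape of the mechanism: the commitment is specific and checkable before any threshold is crossed, which is what distinguishes it from a vague safety pledge.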
If-then commitments can be voluntarily adopted by AI developers; they also, potentially, can be enforced by regulators. Adoption of if-then commitments could help reduce risks from AI in two key ways: (a) prototyping, battle-testing, and building consensus around a potential framework for regulation; and (b) helping AI developers and others build roadmaps of what risk mitigations need to be in place by when. Such adoption does not require agreement on whether major AI risks are imminent—a polarized topic—only that certain situations _would_ require certain risk mitigations _if_ they came to pass.
Three industry leaders—[Google DeepMind](https://deepmind.google/discover/blog/introducing-the-frontier-safety-framework/), [OpenAI](https://openai.com/preparedness/), and [Anthropic](https://www.anthropic.com/index/anthropics-responsible-scaling-policy)—have published relatively detailed frameworks along these lines. Sixteen companies have [announced](https://www.gov.uk/government/publications/frontier-ai-safety-commitments-ai-seoul-summit-2024/frontier-ai-safety-commitments-ai-seoul-summit-2024) their intention to establish frameworks in a similar spirit by the time of the upcoming 2025 AI Action Summit
... (truncated, 84 KB total)