Is Power-Seeking AI an Existential Risk?
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: arXiv
Examines the core argument that power-seeking behavior in advanced AI systems poses existential risks, analyzing how misaligned superintelligent agents would have instrumental incentives to gain control over humans.
Paper Details
Metadata
Abstract
This report examines what I see as the core argument for concern about existential risk from misaligned artificial intelligence. I proceed in two stages. First, I lay out a backdrop picture that informs such concern. On this picture, intelligent agency is an extremely powerful force, and creating agents much more intelligent than us is playing with fire -- especially given that if their objectives are problematic, such agents would plausibly have instrumental incentives to seek power over humans. Second, I formulate and evaluate a more specific six-premise argument that creating agents of this kind will lead to existential catastrophe by 2070.
Summary
This report examines the core argument for existential risk from misaligned AI in two stages. First, it lays out a backdrop picture on which intelligent agency is an extremely powerful force and creating superintelligent agents is correspondingly dangerous, particularly because misaligned agents would plausibly have instrumental incentives to seek power over humans. Second, it formulates and evaluates a six-premise argument that creating such agents will lead to existential catastrophe by 2070. The result is a structured analysis of why power-seeking behavior in advanced AI systems is a central existential concern.
Cited by 5 pages
| Page | Type | Quality |
|---|---|---|
| AI Acceleration Tradeoff Model | Analysis | 50.0 |
| Carlsmith's Six-Premise Argument | Analysis | 65.0 |
| Instrumental Convergence | Risk | 64.0 |
| Power-Seeking AI | Risk | 67.0 |
| AI Doomer Worldview | Concept | 38.0 |
Cached Content Preview
# Is Power-Seeking AI an Existential Risk?
Joseph Carlsmith
Open Philanthropy
April 2021
[Video presentation](https://forum.effectivealtruism.org/posts/ChuABPEXmRumcJY57/video-and-transcript-of-presentation-on-existential-risk) | [Slides](https://docs.google.com/presentation/d/1UE_cAsogrK5i9wvF3YMIZX-iO9qzjevnrYfTxlKL7ns/) | [Audio version](https://open.spotify.com/episode/0tEsrtllG2hJvOByQW2ydH?si=1a81d15eacb64105)
###### Abstract
This report examines what I see as the core argument for concern about existential risk from misaligned artificial intelligence. I proceed in two stages. First, I lay out a backdrop picture that informs such concern. On this picture, intelligent agency is an extremely powerful force, and creating agents much more intelligent than us is playing with fire – especially given that if their objectives are problematic, such agents would plausibly have instrumental incentives to seek power over humans. Second, I formulate and evaluate a more specific six-premise argument that creating agents of this kind will lead to existential catastrophe by 2070. On this argument, by 2070: (1) it will become possible and financially feasible to build relevantly powerful and agentic AI systems; (2) there will be strong incentives to do so; (3) it will be much harder to build aligned (and relevantly powerful/agentic) AI systems than to build misaligned (and relevantly powerful/agentic) AI systems that are still superficially attractive to deploy; (4) some such misaligned systems will seek power over humans in high-impact ways; (5) this problem will scale to the full disempowerment of humanity; and (6) such disempowerment will constitute an existential catastrophe. I assign rough subjective credences to the premises in this argument, and I end up with an overall estimate of ~5% that an existential catastrophe of this kind will occur by 2070. _(May 2022 update: since making this report public in April 2021, my estimate here has gone up, and is now at >10%.)_
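The headline figure invites a simple reconstruction: each premise is read as conditional on the previous ones holding, so the overall estimate is the product of six conditional credences. Below is a minimal sketch of that arithmetic in Python; the credence values are illustrative placeholders chosen to reproduce the ~5% product, not figures quoted from this preview.

```python
# Minimal sketch of the six-premise credence multiplication described in the
# abstract. Each probability is conditional on the preceding premises holding,
# so the overall estimate is a simple product. NOTE: these credence values are
# illustrative placeholders, not figures quoted from this page.
from math import prod

credences = {
    "1. Powerful, agentic AI possible and financially feasible by 2070": 0.65,
    "2. Strong incentives to build such systems": 0.80,
    "3. Aligned systems much harder to build than deployable misaligned ones": 0.40,
    "4. Some misaligned systems seek power in high-impact ways": 0.65,
    "5. Power-seeking scales to full human disempowerment": 0.40,
    "6. Full disempowerment constitutes an existential catastrophe": 0.95,
}

p_catastrophe = prod(credences.values())
print(f"P(existential catastrophe by 2070) ~= {p_catastrophe:.1%}")  # -> 5.1%
```

Because each factor conditions on the premises before it, multiplication is the correct combination rule, and the sensitivity of the overall estimate to any single premise is easy to probe by varying one entry at a time.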
## 1 Introduction
Some worry that the development of advanced artificial intelligence will result in existential catastrophe—that is, the destruction of humanity’s long-term potential.[^1]

[^1]: See e.g. [Yudkowsky (2008)](https://intelligence.org/files/AIPosNegFactor.pdf), [Bostrom (2014)](https://www.amazon.com/Superintelligence-Dangers-Strategies-Nick-Bostrom/dp/1501227742), [Hawking (2014)](https://www.bbc.com/news/technology-30290540), [Tegmark (2017)](https://www.amazon.com/Life-3-0-Being-Artificial-Intelligence/dp/1101946598), [Christiano (2019)](https://www.alignmentforum.org/posts/HBxe6wdjxK239zajf/what-failure-looks-like), [Russell (2019)](https://www.amazon.com/Human-Compatible-Artificial-Intelligence-Problem/dp/0525558616/ref=tmm_hrd_swatch_0?_encoding=UTF8&qid=1619197644&sr=1-1), [Ord (2020)](https://www.amazon.com/Precipice-Existential-Risk-Future-Humanity/dp/031648492X/ref=sr_1_2?crid=2ZWCCI74ZFX55&dchild=1&keywords=precipice+existential+risk+and+
... (truncated, 98 KB total)