Turner et al. formal results
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: arXiv
Formal theoretical analysis of power-seeking tendencies in optimal reinforcement learning policies, providing mathematical foundations for understanding whether intelligent RL agents would naturally pursue resources and power as instrumental goals.
Paper Details
Metadata
Abstract
Some researchers speculate that intelligent reinforcement learning (RL) agents would be incentivized to seek resources and power in pursuit of their objectives. Other researchers point out that RL agents need not have human-like power-seeking instincts. To clarify this discussion, we develop the first formal theory of the statistical tendencies of optimal policies. In the context of Markov decision processes, we prove that certain environmental symmetries are sufficient for optimal policies to tend to seek power over the environment. These symmetries exist in many environments in which the agent can be shut down or destroyed. We prove that in these environments, most reward functions make it optimal to seek power by keeping a range of options available and, when maximizing average reward, by navigating towards larger sets of potential terminal states.
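The abstract's informal notion of "seeking power" can be made quantitative. A minimal sketch of one such formalization, in the spirit of the paper's framing of power as average optimal value (the notation below is supplied for this note, not quoted from the paper):

```latex
% Illustrative sketch: power as normalized average optimal value.
% Notation assumed for this note (not quoted from the truncated preview):
%   s                -- a state of the MDP
%   \gamma           -- discount rate in (0, 1)
%   \mathcal{D}      -- a distribution over reward functions R
%   V^*_R(s, \gamma) -- optimal value of s under reward function R
\[
  \mathrm{POWER}_{\mathcal{D}}(s, \gamma)
    = \frac{1 - \gamma}{\gamma}\,
      \mathbb{E}_{R \sim \mathcal{D}}\!\bigl[ V^{*}_{R}(s, \gamma) - R(s) \bigr]
\]
```

Under this reading, a state has high power when, averaged over reward functions drawn from the distribution, many high-value futures remain reachable from it; shutdown and destruction states score low because they foreclose those futures.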
Summary
This paper develops the first formal theory of power-seeking behavior in optimal reinforcement learning policies. The authors prove that certain environmental symmetries—particularly those where agents can be shut down or destroyed—are sufficient for optimal policies to tend to seek power by keeping options available and navigating toward larger sets of potential terminal states. The work formalizes the intuition that intelligent RL agents would be incentivized to seek resources and power, showing this tendency emerges mathematically from the structure of many realistic environments rather than from human-like instincts.
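To see the statistical flavor of the result, consider a toy deterministic MDP, a hypothetical example constructed for this note rather than one taken from the paper: from a start state, action `left` reaches a single terminal state, while action `right` reaches a junction from which three terminal states are available. With terminal rewards drawn i.i.d. uniformly, the option-rich branch is average-reward optimal for roughly three quarters of reward functions, since the maximum of three uniform draws exceeds a single uniform draw with probability 3/4. A minimal Python sketch (all names hypothetical):

```python
import random

# Toy deterministic MDP (hypothetical; constructed for illustration):
#   s0 --left--->  t1                  (1 reachable terminal state)
#   s0 --right-->  c --> t2 | t3 | t4  (3 reachable terminal states)
# Terminal states are absorbing, so the average reward attainable from s0
# is simply the reward of whichever terminal state the agent settles in.

TERMINALS_LEFT = ["t1"]
TERMINALS_RIGHT = ["t2", "t3", "t4"]

def optimal_first_action(reward):
    """Return the average-reward-optimal first action: compare the best
    terminal reward reachable through each branch."""
    best_left = max(reward[t] for t in TERMINALS_LEFT)
    best_right = max(reward[t] for t in TERMINALS_RIGHT)
    return "right" if best_right > best_left else "left"

def fraction_preferring_right(n_samples=100_000, seed=0):
    """Draw i.i.d. uniform rewards over terminal states and measure how
    often the option-rich branch is optimal."""
    rng = random.Random(seed)
    states = TERMINALS_LEFT + TERMINALS_RIGHT
    hits = 0
    for _ in range(n_samples):
        reward = {s: rng.random() for s in states}
        if optimal_first_action(reward) == "right":
            hits += 1
    return hits / n_samples

if __name__ == "__main__":
    # Expected: about 0.75, i.e. most sampled reward functions make it
    # optimal to keep more terminal options open.
    print(f"'right' optimal for {fraction_preferring_right():.3f} of rewards")
```

Comparing the best reachable terminal rewards stands in here for full value iteration; with deterministic transitions and absorbing terminal states, the two coincide under average reward.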
Cited by 6 pages
| Page | Type | Quality |
|---|---|---|
| AI Accident Risk Cruxes | Crux | 67.0 |
| The Case For AI Existential Risk | Argument | 66.0 |
| Instrumental Convergence Framework | Analysis | 60.0 |
| Corrigibility | Research Area | 59.0 |
| Instrumental Convergence | Risk | 64.0 |
| Power-Seeking AI | Risk | 67.0 |
Cached Content Preview
# Optimal Policies Tend To Seek Power
Alexander Matt Turner, Oregon State University (turneale@oregonstate.edu)
Logan Smith, Mississippi State University (ls1254@msstate.edu)
Rohin Shah, UC Berkeley (rohinmshah@berkeley.edu)
Andrew Critch, UC Berkeley (critch@berkeley.edu)
Prasad Tadepalli, Oregon State University (tadepall@eecs.oregonstate.edu)
## Abstract
Some researchers speculate that intelligent reinforcement learning (RL) agents would be incentivized to seek resources and power in pursuit of the objectives we specify for them. Other researchers point out that RL agents need not have human-like power-seeking instincts. To clarify this discussion, we develop the first formal theory of the statistical tendencies of optimal policies. In the context of Markov decision processes (MDPs), we prove that certain environmental symmetries are sufficient for optimal policies to tend to seek power over the environment. These symmetries exist in many environments in which the agent can be shut down or destroyed. We prove that in these environments, most reward functions make it optimal to seek power by keeping a range of options available and, when maximizing average reward, by navigating towards larger sets of potential terminal states.
## 1 Introduction
Omohundro [[2008](https://ar5iv.labs.arxiv.org/html/1912.01683#bib.bib20)], Bostrom [[2014](https://ar5iv.labs.arxiv.org/html/1912.01683#bib.bib3)], and Russell [[2019](https://ar5iv.labs.arxiv.org/html/1912.01683#bib.bib26)] hypothesize that highly intelligent agents tend to seek power in pursuit of their goals. Such power-seeking agents might gain power over humans. Marvin Minsky imagined that an agent tasked with proving the Riemann hypothesis might rationally turn the planet, along with everyone on it, into computational resources [Russell and Norvig, [2009](https://ar5iv.labs.arxiv.org/html/1912.01683#bib.bib27)]. However, another possibility is that such concerns simply arise from the anthropomorphization of AI systems [LeCun and Zador, [2019](https://ar5iv.labs.arxiv.org/html/1912.01683#bib.bib12); Various, [2019](https://ar5iv.labs.arxiv.org/html/1912.01683#bib.bib37); Pinker and Russell, [2020](https://ar5iv.labs.arxiv.org/html/1912.01683#bib.bib22); Mitchell, [2021](https://ar5iv.labs.arxiv.org/html/1912.01683#bib.bib17)].
We clarify this discussion by grounding the claim that highly intelligent agents will tend to seek power. In [section 4](https://ar5iv.labs.arxiv.org/html/1912.01683#S4 "4 Some actions have a greater probability of being optimal ‣ Optimal Policies Tend To Seek Power"), we identify optimal policies as a reasonable formalization of “highly intelligent agents.”¹

¹ This paper assumes that reward functions reasonably describe a trained agent’s goals. Sometimes this is roughly true (e.g. chess with a sparse victory reward signal) and sometimes it is not true. Turner [[2022](https://ar5iv.labs.arxiv.org/html/1912.01683#bib.bib35)] argues
... (truncated, 98 KB total)