Longterm Wiki

Turner et al. formal results

paper

Authors

Alexander Matt Turner·Logan Smith·Rohin Shah·Andrew Critch·Prasad Tadepalli

Credibility Rating

3/5 (Good)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: arXiv

Formal theoretical analysis of power-seeking tendencies in optimal reinforcement learning policies, providing mathematical foundations for understanding whether intelligent RL agents would naturally pursue resources and power as instrumental goals.

Paper Details

Citations
0
13 influential
Year
2019

Metadata

arXiv preprint · primary source

Abstract

Some researchers speculate that intelligent reinforcement learning (RL) agents would be incentivized to seek resources and power in pursuit of their objectives. Other researchers point out that RL agents need not have human-like power-seeking instincts. To clarify this discussion, we develop the first formal theory of the statistical tendencies of optimal policies. In the context of Markov decision processes, we prove that certain environmental symmetries are sufficient for optimal policies to tend to seek power over the environment. These symmetries exist in many environments in which the agent can be shut down or destroyed. We prove that in these environments, most reward functions make it optimal to seek power by keeping a range of options available and, when maximizing average reward, by navigating towards larger sets of potential terminal states.

Summary

This paper develops the first formal theory of the statistical tendencies of optimal policies in Markov decision processes. The authors prove that certain environmental symmetries, which exist in many environments where the agent can be shut down or destroyed, are sufficient for optimal policies to tend to seek power. In these environments, most reward functions make it optimal to keep a range of options available and, when maximizing average reward, to navigate toward larger sets of potential terminal states. The work gives formal grounding to the hypothesis that intelligent RL agents would be incentivized to seek resources and power, showing that this tendency follows from the structure of the environment rather than from human-like instincts.
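The claim about larger sets of terminal states can be illustrated with a small Monte Carlo sketch. This is not the paper's construction: the two-branch environment, the i.i.d. uniform reward distribution, and the function name `fraction_preferring_larger_set` are all assumptions chosen for illustration. It estimates the fraction of sampled reward functions for which an average-reward-optimal agent heads toward the branch with more terminal states.

```python
import random


def fraction_preferring_larger_set(n_small=1, n_large=3, n_samples=20_000, seed=0):
    """Estimate how often the branch with more terminal states is optimal.

    Toy setup (an assumption for illustration, not taken from the paper):
    from the start state the agent commits to one of two branches, and each
    branch lets it settle in any one of that branch's absorbing terminal
    states. Under average reward, a branch is worth the best terminal
    reward reachable through it.
    """
    rng = random.Random(seed)
    larger_wins = 0
    for _ in range(n_samples):
        # One sampled reward function: i.i.d. uniform reward per terminal state.
        small_branch = [rng.random() for _ in range(n_small)]
        large_branch = [rng.random() for _ in range(n_large)]
        # The larger branch is optimal when it contains the best terminal state.
        if max(large_branch) > max(small_branch):
            larger_wins += 1
    return larger_wins / n_samples


if __name__ == "__main__":
    estimate = fraction_preferring_larger_set()
    print(f"larger branch optimal for ~{estimate:.2f} of sampled reward functions")
```

Because the rewards are i.i.d. and continuous, each terminal state is equally likely to be the best one, so the estimate should sit near `n_large / (n_small + n_large)`, which is 0.75 for the defaults. The sketch only covers this symmetric toy case; the paper's results concern general environmental symmetries.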

Cited by 6 pages

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 98 KB
# Optimal Policies Tend To Seek Power

Alexander Matt Turner, Oregon State University, turneale@oregonstate.edu

Logan Smith, Mississippi State University, ls1254@msstate.edu

Rohin Shah, UC Berkeley, rohinmshah@berkeley.edu

Andrew Critch, UC Berkeley, critch@berkeley.edu

Prasad Tadepalli, Oregon State University, tadepall@eecs.oregonstate.edu

###### Abstract

Some researchers speculate that intelligent reinforcement learning (RL) agents would be incentivized to seek resources and power in pursuit of the objectives we specify for them. Other researchers point out that RL agents need not have human-like power-seeking instincts. To clarify this discussion, we develop the first formal theory of the statistical tendencies of optimal policies. In the context of Markov decision processes (MDPs), we prove that certain environmental symmetries are sufficient for optimal policies to tend to seek power over the environment. These symmetries exist in many environments in which the agent can be shut down or destroyed. We prove that in these environments, most reward functions make it optimal to seek power by keeping a range of options available and, when maximizing average reward, by navigating towards larger sets of potential terminal states.

## 1 Introduction

Omohundro \[ [2008](https://ar5iv.labs.arxiv.org/html/1912.01683#bib.bib20 "")\], Bostrom \[ [2014](https://ar5iv.labs.arxiv.org/html/1912.01683#bib.bib3 "")\], Russell \[ [2019](https://ar5iv.labs.arxiv.org/html/1912.01683#bib.bib26 "")\] hypothesize that highly intelligent agents tend to seek power in pursuit of their goals. Such power-seeking agents might gain power over humans. Marvin Minsky imagined that an agent tasked with proving the Riemann hypothesis might rationally turn the planet—along with everyone on it—into computational resources \[Russell and Norvig, [2009](https://ar5iv.labs.arxiv.org/html/1912.01683#bib.bib27 "")\]. However, another possibility is that such concerns simply arise from the anthropomorphization of AI systems \[LeCun and Zador, [2019](https://ar5iv.labs.arxiv.org/html/1912.01683#bib.bib12 ""), Various, [2019](https://ar5iv.labs.arxiv.org/html/1912.01683#bib.bib37 ""), Pinker and Russell, [2020](https://ar5iv.labs.arxiv.org/html/1912.01683#bib.bib22 ""), Mitchell, [2021](https://ar5iv.labs.arxiv.org/html/1912.01683#bib.bib17 "")\].

We clarify this discussion by grounding the claim that highly intelligent agents will tend to seek power. In [section 4](https://ar5iv.labs.arxiv.org/html/1912.01683#S4 "4 Some actions have a greater probability of being optimal ‣ Optimal Policies Tend To Seek Power"), we identify optimal policies as a reasonable formalization of “highly intelligent agents.”¹

¹ This paper assumes that reward functions reasonably describe a trained agent’s goals. Sometimes this is roughly true (e.g. chess with a sparse victory reward signal) and sometimes it is not true. Turner \[ [2022](https://ar5iv.labs.arxiv.org/html/1912.01683#bib.bib35 "")\] argues

... (truncated, 98 KB total)
Resource ID: a93d9acd21819d62 | Stable ID: ZTY1ZjI3NT