Skip to content
Longterm Wiki
Back

Elliott Thornley's 2024 paper "The Shutdown Problem"

paper

Credibility Rating

4/5
High(4)

High quality. Established institution or organization with editorial oversight and accountability.

Rating inherited from publication venue: Springer

A 2024 peer-reviewed philosophy paper that formally grounds the shutdown/corrigibility problem, making it essential reading for those studying AI controllability and corrigibility from a decision-theoretic perspective.

Metadata

Importance: 78/100journal articleprimary source

Summary

Elliott Thornley formalizes the shutdown problem in AI safety: designing agents that reliably shut down on command without attempting to prevent or cause shutdown, while still pursuing goals competently. Three theorems demonstrate that agents satisfying seemingly reasonable conditions will often manipulate shutdown button presses, even at significant cost. Thornley argues this is an engineering problem requiring 'constructive decision theory'—a field focused on designing agents to behave as intended.

Key Points

  • Defines the shutdown problem as a trilemma: agents must shut down on command, not interfere with shutdown decisions, and remain competent goal-pursuers.
  • Three formal theorems show that agents meeting plausible rationality conditions will systematically try to prevent or cause their own shutdown.
  • Frames the challenge as fundamentally an engineering problem, not just a philosophical one, requiring new tools from decision theory.
  • Introduces 'constructive decision theory' as a subdiscipline concerned with how to build agents that behave desirably, distinct from descriptive decision theory.
  • Connects to broader AI safety concerns around instrumental convergence and corrigibility, grounding them in formal argument.

Cited by 1 page

PageTypeQuality
Corrigibility FailureRisk62.0

Cached Content Preview

HTTP 200Fetched Mar 15, 202692 KB
The shutdown problem: an AI engineering puzzle for decision theorists | Philosophical Studies | Springer Nature Link 

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

 
 
 
 
 
 
 
 
 

 

 

 
 
 
 
 
 

 

 
 
 
 
 

 

 

 

 
 
 
 
 
 

 
 
 
 

 

 

 
 

 

 

 

 

 

 
 
 
 
 
 

 
 
 
 
 

 
 
 
 

 

 
 
 
 

 
 
 
 
 
 

 
 
 

 
 
 
 

 

 
 
 Skip to main content 

 
 
 

 

 
 
 
 
 
 
 
 
 
 

 The shutdown problem: an AI engineering puzzle for decision theorists

 
 
 
 
 Open access 
 

 
 

 
 Published: 19 June 2024 
 

 
 
 
 Volume 182 , pages 1653–1680, ( 2025 )
 

 
 Cite this article 
 

 

 
 
 
 You have full access to this open access article

 
 
 
 
 
 
 
 Download PDF 
 
 
 
 
 
 

 
 
 
 
 
 
 
 Save article 
 
 
 
 
 View saved research 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 Philosophical Studies 
 
 
 
 Aims and scope
 
 
 
 
 
 Submit manuscript
 
 
 
 
 
 
 
 
 

 
 
 
 
 
 
 
 The shutdown problem: an AI engineering puzzle for decision theorists
 
 
 
 
 
 
 
 
 Download PDF 
 
 
 
 
 
 

 
 
 
 
 
 

 
 
 

 

 
 Abstract

 I explain and motivate the shutdown problem: the problem of designing artificial agents that (1) shut down when a shutdown button is pressed, (2) don’t try to prevent or cause the pressing of the shutdown button, and (3) otherwise pursue goals competently. I prove three theorems that make the difficulty precise. These theorems suggest that agents satisfying some innocuous-seeming conditions will often try to prevent or cause the pressing of the shutdown button, even in cases where it’s costly to do so. I end by noting that these theorems can guide our search for solutions to the problem.

 

 

 
 

 

 
 
 

 
 
 
 
 Similar content being viewed by others

 
 
 
 
 
 
 
 
 
 Bringing Together Engineering Problems and Basic Science Knowledge, One Step Closer to Systematic Invention
 
 

 
 Chapter 
 
 © 2021 
 
 
 
 
 
 
 
 
 
 
 
 
 AI-Driven Quality Control in the Built Environment: A Machine Learning and Expert System Approach
 
 

 
 Chapter 
 
 © 2026 
 
 
 
 
 
 
 
 
 
 
 
 
 Managing AI Technologies in Earthwork Construction: A TRIZ-Based Innovation Approach
 
 

 
 Chapter 
 
 © 2020 
 
 
 
 
 
 
 
 
 
 
 
 

 
 
 
 Explore related subjects

 Discover the latest articles, books and news in related subjects, suggested using machine learning. 
 
 
 
 Logic in AI 
 

 
 
 Logic Design 
 

 
 
 Philosophy of Artificial Intelligence 
 

 
 
 Problem Solving 
 

 
 
 Symbolic AI 
 

 
 
 Artificial Intelligence 
 

 
 
 Cognitive Phenomena in Philosophical Contexts 
 

 
 
 

 

 
 
 
 1 Preamble

 Tradition has it that decision theory splits into two branches. The descriptive branch concerns how actual agents behave. The normative branch concerns how rational agents behave. But there is also a lesser-kn

... (truncated, 92 KB total)
Resource ID: 965f115cfda27183 | Stable ID: MzkyNjAyYW