Elliott Thornley's 2024 paper "The Shutdown Problem"
paperCredibility Rating
High quality. Established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: Springer
A 2024 peer-reviewed philosophy paper that formally grounds the shutdown/corrigibility problem, making it essential reading for those studying AI controllability and corrigibility from a decision-theoretic perspective.
Metadata
Summary
Elliott Thornley formalizes the shutdown problem in AI safety: designing agents that reliably shut down on command without attempting to prevent or cause shutdown, while still pursuing goals competently. Three theorems demonstrate that agents satisfying seemingly reasonable conditions will often manipulate shutdown button presses, even at significant cost. Thornley argues this is an engineering problem requiring 'constructive decision theory'—a field focused on designing agents to behave as intended.
Key Points
- •Defines the shutdown problem as a trilemma: agents must shut down on command, not interfere with shutdown decisions, and remain competent goal-pursuers.
- •Three formal theorems show that agents meeting plausible rationality conditions will systematically try to prevent or cause their own shutdown.
- •Frames the challenge as fundamentally an engineering problem, not just a philosophical one, requiring new tools from decision theory.
- •Introduces 'constructive decision theory' as a subdiscipline concerned with how to build agents that behave desirably, distinct from descriptive decision theory.
- •Connects to broader AI safety concerns around instrumental convergence and corrigibility, grounding them in formal argument.
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Corrigibility Failure | Risk | 62.0 |
Cached Content Preview
The shutdown problem: an AI engineering puzzle for decision theorists | Philosophical Studies | Springer Nature Link
Skip to main content
The shutdown problem: an AI engineering puzzle for decision theorists
Open access
Published: 19 June 2024
Volume 182 , pages 1653–1680, ( 2025 )
Cite this article
You have full access to this open access article
Download PDF
Save article
View saved research
Philosophical Studies
Aims and scope
Submit manuscript
The shutdown problem: an AI engineering puzzle for decision theorists
Download PDF
Abstract
I explain and motivate the shutdown problem: the problem of designing artificial agents that (1) shut down when a shutdown button is pressed, (2) don’t try to prevent or cause the pressing of the shutdown button, and (3) otherwise pursue goals competently. I prove three theorems that make the difficulty precise. These theorems suggest that agents satisfying some innocuous-seeming conditions will often try to prevent or cause the pressing of the shutdown button, even in cases where it’s costly to do so. I end by noting that these theorems can guide our search for solutions to the problem.
Similar content being viewed by others
Bringing Together Engineering Problems and Basic Science Knowledge, One Step Closer to Systematic Invention
Chapter
© 2021
AI-Driven Quality Control in the Built Environment: A Machine Learning and Expert System Approach
Chapter
© 2026
Managing AI Technologies in Earthwork Construction: A TRIZ-Based Innovation Approach
Chapter
© 2020
Explore related subjects
Discover the latest articles, books and news in related subjects, suggested using machine learning.
Logic in AI
Logic Design
Philosophy of Artificial Intelligence
Problem Solving
Symbolic AI
Artificial Intelligence
Cognitive Phenomena in Philosophical Contexts
1 Preamble
Tradition has it that decision theory splits into two branches. The descriptive branch concerns how actual agents behave. The normative branch concerns how rational agents behave. But there is also a lesser-kn
... (truncated, 92 KB total)965f115cfda27183 | Stable ID: MzkyNjAyYW