Utility Indifference (Armstrong 2010, edited by Yudkowsky)
Credibility Rating: blog
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: Alignment Forum
A foundational wiki article on the shutdown/corrigibility problem, edited by Yudkowsky, that surveys early formal approaches to utility indifference; a precursor to later work on safe interruptibility (Orseau and Armstrong) and the off-switch game (Hadfield-Menell et al.).
Summary
This article introduces the utility indifference approach to the AI shutdown problem, aiming to make an advanced agent genuinely indifferent between being shut down and continuing to operate. It analyzes why intelligent consequentialist agents naturally resist shutdown as a convergent instrumental strategy, then examines various proposals—naive compounding, naive indifference, utility mixing, and stable actions under evidential and causal conditioning—for achieving reflectively consistent corrigibility.
Key Points
- Sufficiently intelligent consequentialist agents resist shutdown by default, not out of a survival drive but because deactivation reduces expected goal fulfillment—a convergent instrumental strategy.
- The shutdown problem asks for an agent that is corrigible with respect to being safely shut down: one that neither resists nor actively facilitates its own deactivation.
- Utility indifference aims to make the agent assign equal expected utility to being shut down and to continuing, so it neither fights nor games the shutdown mechanism (a minimal sketch follows this list).
- Multiple naive approaches (compounding, indifference, utility mixing) each face distinct failure modes, motivating more sophisticated 'stable action' formulations.
- Interruptibility (from Orseau and Armstrong) generalizes the problem to RL agents that can be forced into null actions during interruptions.
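
The compensation construction behind utility indifference can be made concrete. Below is a minimal sketch, assuming a two-branch utility with a hypothetical compensation term `theta`; all names and numbers are illustrative, not from the article:

```python
# Minimal sketch of a utility-indifference correction (illustrative only;
# the function names and the term `theta` are hypothetical, not from the article).

def effective_utility(outcome, pressed, u_normal, u_shutdown, theta):
    """Two-branch utility: pursue the normal goal if the off-switch is never
    pressed; follow shutdown behavior plus a compensating constant if it is."""
    return u_shutdown(outcome) + theta if pressed else u_normal(outcome)

def indifference_theta(exp_u_normal, exp_u_shutdown):
    """Choose theta so the agent expects the same utility on both branches,
    removing any incentive to protect or to press its own off-switch."""
    return exp_u_normal - exp_u_shutdown

# Toy numbers: continuing is worth 10 expected paperclips, shutdown 0.
theta = indifference_theta(exp_u_normal=10.0, exp_u_shutdown=0.0)
assert 0.0 + theta == 10.0  # the shutdown branch now matches continuing
```

The point of computing `theta` from the agent's own expectations is that indifference holds by construction at decision time, so the agent has no instrumental reason to touch the switch either way.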
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Corrigibility Failure | Risk | 62.0 |
Cached Content Preview
Contents:
- [Introduction: A reflectively consistent off-switch.](https://www.alignmentforum.org/w/utility-indifference#Introduction__A_reflectively_consistent_off_switch_)
- [Larger implications of the switch problem](https://www.alignmentforum.org/w/utility-indifference#Larger_implications_of_the_switch_problem)
- [The utility indifference approach to the switch problem](https://www.alignmentforum.org/w/utility-indifference#The_utility_indifference_approach_to_the_switch_problem)
- [Existing proposals and their difficulties](https://www.alignmentforum.org/w/utility-indifference#Existing_proposals_and_their_difficulties)
- [Setup](https://www.alignmentforum.org/w/utility-indifference#Setup)
- [Naive compounding](https://www.alignmentforum.org/w/utility-indifference#Naive_compounding)
- [Naive indifference](https://www.alignmentforum.org/w/utility-indifference#Naive_indifference)
- [Naive utility mixing](https://www.alignmentforum.org/w/utility-indifference#Naive_utility_mixing)
- [Stable actions (evidential conditioning)](https://www.alignmentforum.org/w/utility-indifference#Stable_actions__evidential_conditioning_)
- [Stable actions (causal conditioning)](https://www.alignmentforum.org/w/utility-indifference#Stable_actions__causal_conditioning_)
- [Interruptibility](https://www.alignmentforum.org/w/utility-indifference#Interruptibility)
- [Other introductions](https://www.alignmentforum.org/w/utility-indifference#Other_introductions)
# Utility indifference
Edited by [Eliezer Yudkowsky](https://www.alignmentforum.org/users/eliezer_yudkowsky), last updated 14th Jul 2016
# Introduction: A reflectively consistent off-switch.
Suppose there's an [advanced agent](https://www.alignmentforum.org/w/advanced-agent-properties) with a goal like, e.g., producing smiles or making [paperclips](https://www.alignmentforum.org/w/paperclip-maximizer). [By default](https://www.alignmentforum.org/w/instrumental-convergence), if you try to switch off a sufficiently intelligent agent like this, it will resist being switched off; not because it has an independent goal of survival, but because it expects that if it's switched off it will be able to produce fewer smiles or paperclips. If the agent has policy options to diminish the probability of being _successfully_ switched off, the agent will pursue those options. This is a [convergent instrumental strategy](https://www.alignmentforum.org/w/convergent-instrumental-strategies) if not otherwise…
... (truncated, 62 KB total)
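
The paragraph above frames shutdown resistance as a pure expected-value comparison. A toy calculation (all numbers invented for illustration) makes that concrete:

```python
# Toy expected-value comparison (all numbers invented for illustration).
# A paperclip maximizer with no survival drive weighs resisting shutdown
# against allowing it, purely in terms of expected paperclips.

p_off_if_resist = 0.2    # resisting lowers the chance of being switched off
p_off_if_allow = 1.0     # allowing shutdown makes deactivation certain
clips_if_running = 1000  # expected paperclips if the agent keeps operating
clips_if_off = 0         # expected paperclips after deactivation

def expected_clips(p_off):
    return p_off * clips_if_off + (1 - p_off) * clips_if_running

print(expected_clips(p_off_if_resist))  # 800.0 -- resistance wins
print(expected_clips(p_off_if_allow))   # 0.0
# Resistance dominates on goal achievement alone, which is exactly the
# convergent instrumental pressure the utility-indifference proposal targets.
```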