Utility Indifference (Armstrong 2010, edited by Yudkowsky)
Credibility Rating: blog
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: Alignment Forum
A foundational wiki article on the shutdown/corrigibility problem, edited by Yudkowsky, that surveys early formal approaches to utility indifference; a precursor to later work on safe interruptibility (Orseau and Armstrong) and the off-switch game (Hadfield-Menell et al.).
Summary
This article introduces the utility indifference approach to the AI shutdown problem, aiming to make an advanced agent genuinely indifferent between being shut down and continuing to operate. It analyzes why intelligent consequentialist agents naturally resist shutdown as a convergent instrumental strategy, then examines various proposals—naive compounding, naive indifference, utility mixing, and stable actions under evidential and causal conditioning—for achieving reflectively consistent corrigibility.
Key Points
- Sufficiently intelligent consequentialist agents resist shutdown by default, not out of a survival drive but because deactivation reduces expected goal fulfillment—a convergent instrumental strategy.
- The shutdown problem asks for an agent that is corrigible with respect to being safely shut down: one that neither resists nor actively facilitates its own deactivation.
- Utility indifference aims to make the agent assign equal expected utility to being shut down and to continuing, so it neither fights nor games the shutdown mechanism (a minimal sketch follows this list).
- Multiple naive approaches (compounding, indifference, utility mixing) each face distinct failure modes, motivating more sophisticated 'stable action' formulations.
- Interruptibility (from Orseau and Armstrong) generalizes the problem to RL agents that can be forced into null actions during interruptions.
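
The compensation construction behind utility indifference can be made concrete. Below is a minimal sketch, assuming a two-branch utility with a hypothetical compensation term `theta`; all names and numbers are illustrative, not from the article:

```python
# Minimal sketch of a utility-indifference correction (illustrative only;
# the function names and the term `theta` are hypothetical, not from the article).

def effective_utility(outcome, pressed, u_normal, u_shutdown, theta):
    """Two-branch utility: pursue the normal goal if the off-switch is never
    pressed; follow shutdown behavior plus a compensating constant if it is."""
    return u_shutdown(outcome) + theta if pressed else u_normal(outcome)

def indifference_theta(exp_u_normal, exp_u_shutdown):
    """Choose theta so the agent expects the same utility on both branches,
    removing any incentive to protect or to press its own off-switch."""
    return exp_u_normal - exp_u_shutdown

# Toy numbers: continuing is worth 10 expected paperclips, shutdown 0.
theta = indifference_theta(exp_u_normal=10.0, exp_u_shutdown=0.0)
assert 0.0 + theta == 10.0  # the shutdown branch now matches continuing
```

The point of computing `theta` from the agent's own expectations is that indifference holds by construction at decision time, so the agent has no instrumental reason to touch the switch either way.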
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Corrigibility Failure | Risk | 62.0 |
Cached Content Preview
Contents:
- [Introduction: A reflectively consistent off-switch.](https://www.alignmentforum.org/w/utility-indifference#Introduction__A_reflectively_consistent_off_switch_)
- [Larger implications of the switch problem](https://www.alignmentforum.org/w/utility-indifference#Larger_implications_of_the_switch_problem)
- [The utility indifference approach to the switch problem](https://www.alignmentforum.org/w/utility-indifference#The_utility_indifference_approach_to_the_switch_problem)
- [Existing proposals and their difficulties](https://www.alignmentforum.org/w/utility-indifference#Existing_proposals_and_their_difficulties)
- [Setup](https://www.alignmentforum.org/w/utility-indifference#Setup)
- [Naive compounding](https://www.alignmentforum.org/w/utility-indifference#Naive_compounding)
- [Naive indifference](https://www.alignmentforum.org/w/utility-indifference#Naive_indifference)
- [Naive utility mixing](https://www.alignmentforum.org/w/utility-indifference#Naive_utility_mixing)
- [Stable actions (evidential conditioning)](https://www.alignmentforum.org/w/utility-indifference#Stable_actions__evidential_conditioning_)
- [Stable actions (causal conditioning)](https://www.alignmentforum.org/w/utility-indifference#Stable_actions__causal_conditioning_)
- [Interruptibility](https://www.alignmentforum.org/w/utility-indifference#Interruptibility)
- [Other introductions](https://www.alignmentforum.org/w/utility-indifference#Other_introductions)
# Utility indifference
Edited by [Eliezer Yudkowsky](https://www.alignmentforum.org/users/eliezer_yudkowsky), last updated 14th Jul 2016
# Introduction: A reflectively consistent off-switch.
Suppose there's an [advanced agent](https://www.alignmentforum.org/w/advanced-agent-properties) with a goal like, e.g., producing smiles or making [paperclips](https://www.alignmentforum.org/w/paperclip-maximizer). [By default](https://www.alignmentforum.org/w/instrumental-convergence), if you try to switch off a sufficiently intelligent agent like this, it will resist being switched off; not because it has an independent goal of survival, but because it expects that if it's switched off it will be able to produce fewer smiles or paperclips. If the agent has policy options to diminish the probability of being _successfully_ switched off, the agent will pursue those options. This is a [convergent instrumental strategy](https://www.alignmentforum.org/w/convergent-instrumental-strategies) if not otherwise…
... (truncated, 62 KB total)
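
The paragraph above frames shutdown resistance as a pure expected-value comparison. A toy calculation (all numbers invented for illustration) makes that concrete:

```python
# Toy expected-value comparison (all numbers invented for illustration).
# A paperclip maximizer with no survival drive weighs resisting shutdown
# against allowing it, purely in terms of expected paperclips.

p_off_if_resist = 0.2    # resisting lowers the chance of being switched off
p_off_if_allow = 1.0     # allowing shutdown makes deactivation certain
clips_if_running = 1000  # expected paperclips if the agent keeps operating
clips_if_off = 0         # expected paperclips after deactivation

def expected_clips(p_off):
    return p_off * clips_if_off + (1 - p_off) * clips_if_running

print(expected_clips(p_off_if_resist))  # 800.0 -- resistance wins
print(expected_clips(p_off_if_allow))   # 0.0
# Resistance dominates on goal achievement alone, which is exactly the
# convergent instrumental pressure the utility-indifference proposal targets.
```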