Orseau, L. and Armstrong, S. (2016). "Safely Interruptible Agents."
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: MIRI
Foundational MIRI/DeepMind co-authored paper formalizing the shutdown/interruptibility problem; the PDF link is currently broken (404), but the paper is available via UAI and other archives.
Metadata
Summary
This paper by Laurent Orseau and Stuart Armstrong addresses the 'safe interruptibility' problem: how to design reinforcement learning agents that can be safely paused or shut down by human operators without the agent learning to resist or avoid interruptions. The authors formalize conditions under which agents remain indifferent to being interrupted, contributing foundational theory to AI corrigibility research.
Key Points
- Introduces the concept of 'safe interruptibility': ensuring RL agents do not learn to prevent human operators from pausing or stopping them.
- Proves that standard Q-learning agents can be made safely interruptible, while some other RL frameworks cannot without modification.
- Connects to the broader corrigibility problem, a key AI safety challenge: ensuring systems remain under meaningful human control.
- Proposes that interruption signals should not influence the agent's policy learning, removing any incentive to avoid shutdown.
- A foundational theoretical contribution to the shutdown problem, influencing subsequent work on corrigibility and AI oversight.
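The core mechanism behind these points can be illustrated with a minimal sketch (not the paper's formal construction): because Q-learning is off-policy, its update target uses the max over actions rather than the action actually taken, so an interruption that forces a different action does not bias the learned values. The environment, action names, and helper functions below are hypothetical illustrations, not from the paper.

```python
import random

def q_learning_step(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    # Off-policy Q-learning update: the bootstrap target uses
    # max over actions, not the action the agent actually took,
    # so forced (interrupted) actions do not bias learned values.
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def act(Q, s, actions, interrupted, safe_action, epsilon=0.1):
    # An interruption simply overrides the chosen action; the update
    # rule above never conditions on whether the action was forced,
    # so the agent gains no incentive to avoid interruptions.
    if interrupted:
        return safe_action
    if random.random() < epsilon:
        return random.choice(actions)  # exploration
    return max(actions, key=lambda a: Q[(s, a)])  # greedy choice
```

The key design point, consistent with the paper's argument, is that the interruption lives entirely in the action-selection layer: the learning rule is untouched, which is why off-policy learners remain indifferent to being interrupted while on-policy learners generally do not.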
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Corrigibility | Research Area | 59.0 |
Cached Content Preview
404 Not Found (nginx)