Orseau, L. and Armstrong, S. (2016). "Safely Interruptible Agents."
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: MIRI
Foundational MIRI/DeepMind co-authored paper formalizing the shutdown/interruptibility problem; the PDF link is currently broken (404), but the paper is available via UAI and other archives.
Metadata
Summary
This paper by Laurent Orseau and Stuart Armstrong addresses the 'safe interruptibility' problem: how to design reinforcement learning agents that can be safely paused or shut down by human operators without the agent learning to resist or avoid interruptions. The authors formalize conditions under which agents remain indifferent to being interrupted, contributing foundational theory to AI corrigibility research.
Key Points
- Introduces the concept of 'safe interruptibility': ensuring RL agents do not learn to prevent human operators from pausing or stopping them.
- Proves that standard Q-learning agents can be made safely interruptible, while some other RL frameworks cannot without modification.
- Connects to the broader corrigibility problem, a key AI safety challenge: ensuring systems remain under meaningful human control.
- Proposes that interruption signals should not influence the agent's policy learning, removing any incentive to avoid shutdown.
- A foundational theoretical contribution to the shutdown problem, influencing subsequent work on corrigibility and AI oversight.
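The core mechanism behind these points can be illustrated with a minimal sketch (not the paper's formal construction): because Q-learning is off-policy, its update target uses the max over actions rather than the action actually taken, so an interruption that forces a different action does not bias the learned values. The environment, action names, and helper functions below are hypothetical illustrations, not from the paper.

```python
import random

def q_learning_step(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    # Off-policy Q-learning update: the bootstrap target uses
    # max over actions, not the action the agent actually took,
    # so forced (interrupted) actions do not bias learned values.
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def act(Q, s, actions, interrupted, safe_action, epsilon=0.1):
    # An interruption simply overrides the chosen action; the update
    # rule above never conditions on whether the action was forced,
    # so the agent gains no incentive to avoid interruptions.
    if interrupted:
        return safe_action
    if random.random() < epsilon:
        return random.choice(actions)  # exploration
    return max(actions, key=lambda a: Q[(s, a)])  # greedy choice
```

The key design point, consistent with the paper's argument, is that the interruption lives entirely in the action-selection layer: the learning rule is untouched, which is why off-policy learners remain indifferent to being interrupted while on-policy learners generally do not.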
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Corrigibility | Research Area | 59.0 |
Cached Content Preview
404 Not Found (nginx)