Back
Christiano, P. (2017). "Corrigibility."
ai-alignment.com/corrigibility-3039e668638
A foundational blog post by Paul Christiano that broadens the concept of corrigibility beyond shutdown compliance, links it to his act-based agent framework, and provides theoretical grounding for why corrigibility may be achievable and self-reinforcing.
Metadata
Importance: 72/100 · blog post · primary source
Summary
Paul Christiano argues that a benign act-based AI agent will be robustly corrigible if designed correctly, and that corrigibility forms a broad basin of attraction toward acceptable outcomes rather than a narrow target. The post frames corrigibility broadly—encompassing error correction, human oversight, preference clarification, and resource control—and explains why this view underlies Christiano's overall optimism about AI alignment.
Key Points
- Corrigibility is defined broadly to include: error correction, transparency, preference clarification, resource control, and self-perpetuating safe behavior.
- A benign act-based agent will be robustly corrigible if we want it to be, due to its deference to human preferences.
- Corrigibility is not a narrow target but a wide basin of attraction—a sufficiently corrigible agent tends to become more corrigible over time.
- This framing implies alignment researchers should focus on avoiding catastrophic failures that push systems out of the corrigibility basin, rather than on achieving perfect value specification.
- The post connects corrigibility to Christiano's broader act-based agent framework and his optimism about practical alignment approaches.
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Corrigibility | Research Area | 59.0 |
Cached Content Preview
HTTP 200 · Fetched Mar 20, 2026 · 32 KB
[**AI Alignment**](https://ai-alignment.com/?source=post_page---publication_nav-624d886c4aa4-3039e668638---------------------------------------)
Aligning AI systems with human interests.
# Corrigibility
[Paul Christiano](https://paulfchristiano.medium.com/?source=post_page---byline--3039e668638---------------------------------------)
8 min read · Jun 10, 2017
( _Warning: rambling._)
I would like to build AI systems which help me:
- Figure out whether I built the right AI and correct any mistakes I made
- Remain informed about the AI’s behavior and avoid unpleasant surprises
- Make better decisions and clarify my preferences
- Acquire resources and remain in effective control of them
- Ensure that my AI systems continue to do all of these nice things
- …and so on
We say an agent is [_corrigible_](https://intelligence.org/files/Corrigibility.pdf) ([article on Arbital](https://arbital.com/p/corrigibility/)) if it has these properties. I believe this
... (truncated, 32 KB total)
Resource ID: 41ce82b75cb1cac3 | Stable ID: NmJmMTkxZD