
Christiano, P. (2017). "Corrigibility."


A foundational blog post by Paul Christiano that broadens the concept of corrigibility beyond shutdown compliance and links it to his act-based agent framework, providing theoretical grounding for why corrigibility may be achievable and self-reinforcing.

Metadata

Importance: 72/100 · blog post · primary source

Summary

Paul Christiano argues that a benign act-based AI agent will be robustly corrigible if designed correctly, and that corrigibility forms a broad basin of attraction toward acceptable outcomes rather than a narrow target. The post frames corrigibility broadly—encompassing error correction, human oversight, preference clarification, and resource control—and explains why this view underlies Christiano's overall optimism about AI alignment.

Key Points

  • Corrigibility is defined broadly to include: error correction, transparency, preference clarification, resource control, and self-perpetuating safe behavior.
  • A benign act-based agent will be robustly corrigible if we want it to be, due to its deference to human preferences.
  • Corrigibility is not a narrow target but a wide basin of attraction—a sufficiently corrigible agent tends to become more corrigible over time.
  • This framing implies alignment researchers should focus on avoiding catastrophic failures that push systems out of the corrigibility basin rather than achieving perfect value specification.
  • The post connects corrigibility to Christiano's broader act-based agent framework and his optimism about practical alignment approaches.

Cited by 1 page

| Page | Type | Quality |
| --- | --- | --- |
| Corrigibility | Research Area | 59.0 |

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 32 KB

[**AI Alignment**](https://ai-alignment.com/?source=post_page---publication_nav-624d886c4aa4-3039e668638---------------------------------------)


Aligning AI systems with human interests.


# Corrigibility


[Paul Christiano](https://paulfchristiano.medium.com/?source=post_page---byline--3039e668638---------------------------------------)


8 min read · Jun 10, 2017


(_Warning: rambling._)

I would like to build AI systems which help me:

- Figure out whether I built the right AI and correct any mistakes I made
- Remain informed about the AI’s behavior and avoid unpleasant surprises
- Make better decisions and clarify my preferences
- Acquire resources and remain in effective control of them
- Ensure that my AI systems continue to do all of these nice things
- …and so on

We say an agent is [_corrigible_](https://intelligence.org/files/Corrigibility.pdf) ([article on Arbital](https://arbital.com/p/corrigibility/)) if it has these properties. I believe this

... (truncated, 32 KB total)
Resource ID: 41ce82b75cb1cac3 | Stable ID: NmJmMTkxZD