Iterated Distillation and Amplification

web

ai-alignment.com·ai-alignment.com/iterated-distillation-and-amplification-...

This 2018 Medium post is the canonical accessible introduction to Paul Christiano's Iterated Distillation and Amplification (IDA) proposal, a foundational scalable oversight approach widely referenced in the AI alignment literature.

Metadata

Importance: 82/100 · blog post · primary source

Summary

This guest post by Ajeya Cotra summarizes Paul Christiano's IDA scheme for training ML systems robustly aligned to complex human values. IDA alternates between amplification (using humans plus AI tools to handle harder tasks) and distillation (training a new AI to imitate that augmented human), iteratively bootstrapping capability while preserving alignment. The approach draws analogies to AlphaGo Zero and expert iteration.

Key Points

• IDA addresses the alignment/capabilities tradeoff by iteratively amplifying human oversight then distilling it into a learned model.
• Amplification augments a human operator with AI assistance to handle tasks beyond unaided human capability, preserving value alignment at each step.
• Distillation trains a new model to imitate the amplified human, compressing the capability gains into a deployable system.
• The scheme is notably analogous to AlphaGo Zero and expert iteration, suggesting it may achieve state-of-the-art performance.
• Safety relies on each iteration preserving alignment properties, so any non-learned components (search, logic) must also be alignment-preserving.
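
The amplify/distill loop described in the points above can be illustrated with a toy sketch. This is not code from the post: the task (summing a tuple of integers), the names `human_answer`, `amplify`, `distill`, and `ida`, and the use of a lookup table in place of a trained model are all assumptions made for illustration. The "human" can only answer questions of length 0 or 1, amplification lets the human split a question in half and delegate each half to the current agent, and distillation simply memorizes the amplified system's answers on a fixed question set. A real distillation step would train a model that generalizes.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Question = Tuple[int, ...]                      # a tuple of integers to be summed
Agent = Callable[[Question], Optional[int]]     # returns None when it cannot answer


def human_answer(q: Question) -> Optional[int]:
    """Hypothetical unaided overseer: only handles trivially small questions."""
    if len(q) == 0:
        return 0
    if len(q) == 1:
        return q[0]
    return None


def amplify(agent: Agent) -> Agent:
    """Human plus agent: the human decomposes, the agent answers the pieces."""
    def amplified(q: Question) -> Optional[int]:
        direct = human_answer(q)
        if direct is not None:
            return direct
        mid = len(q) // 2
        left, right = agent(q[:mid]), agent(q[mid:])
        if left is None or right is None:
            return None                         # the current agent is not yet capable enough
        return left + right                     # the human only combines two sub-answers
    return amplified


def distill(amplified: Agent, questions: Iterable[Question]) -> Agent:
    """Stand-in for training: memorize the amplified system's answers."""
    table: Dict[Question, int] = {}
    for q in questions:
        answer = amplified(q)
        if answer is not None:
            table[q] = answer
    return lambda q: table.get(q)


def ida(questions: Iterable[Question], iterations: int) -> Agent:
    """Alternate amplification and distillation for a fixed number of rounds."""
    questions = list(questions)
    agent: Agent = lambda q: None               # the initial agent knows nothing
    for _ in range(iterations):
        agent = distill(amplify(agent), questions)
    return agent


if __name__ == "__main__":
    data = (3, 1, 4, 1, 5, 9, 2, 6)
    # Every contiguous slice of `data`, so each half of a question is also a question.
    questions = [data[i:j] for i in range(len(data)) for j in range(i, len(data) + 1)]
    for k in range(1, 5):
        print(k, ida(questions, k)(data))
```

In this toy, each round roughly doubles the length of question the distilled agent can answer, which mirrors the bootstrapping idea. It says nothing about whether alignment is preserved in a real system.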

Cited by 2 pages

Page	Type	Quality
AI Safety Research Value Model	Analysis	60.0
AI Alignment	Approach	91.0

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 28 KB

[**AI Alignment**](https://ai-alignment.com/?source=post_page---publication_nav-624d886c4aa4-157debfd1616---------------------------------------)

# Iterated Distillation and Amplification

[Ajeya Cotra](https://medium.com/@acotra2017?source=post_page---byline--157debfd1616---------------------------------------)

8 min read · Mar 4, 2018

This is a guest post summarizing Paul Christiano’s proposed scheme for training machine learning systems that can be robustly aligned to complex and fuzzy values, which I call Iterated Distillation and Amplification (IDA) here. IDA is [notably similar](https://ai-alignment.com/alphago-zero-and-capability-amplification-ede767bb8446) to [AlphaGoZero](https://www.nature.com/articles/nature24270) and [expert iteration](https://arxiv.org/abs/1705.08439).

The hope is that if we use IDA to train each learned component 

... (truncated, 28 KB total)

Resource ID: 77e9bf1a01a5b587 | Stable ID: sid_JcvrMhHpyp