Iterated Amplification
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: Alignment Forum
An accessible 2018 explainer by Ajeya Cotra introducing IDA as proposed by Paul Christiano; a good entry point before reading Christiano's primary technical posts on the topic.
Metadata
Summary
A guest post by Ajeya Cotra summarizing Paul Christiano's Iterated Distillation and Amplification (IDA) scheme, which addresses the alignment-capabilities tradeoff by iteratively amplifying human judgment through task decomposition and distilling the results into increasingly capable learned models. The approach draws an analogy to AlphaGoZero, combining human-directed amplification with supervised distillation, with the aim of maintaining alignment while reaching superhuman performance.
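The amplify/distill loop described in the summary can be written compactly as a recursion. This is a sketch: the symbols ($H$ for the human overseer, $A_t$ for the learned model after $t$ rounds, $B_t$ for the amplified system) are labels chosen here for illustration and may not match the post's own pseudocode exactly.

```latex
\begin{aligned}
A_0 &\leftarrow \text{random initialization} \\
B_t &= \mathrm{Amplify}(H, A_t) \\
A_{t+1} &= \mathrm{Distill}(B_t), \qquad t = 0, 1, 2, \dots
\end{aligned}
```

Each $B_t$ is slow because it keeps the human in the loop, while $A_{t+1}$ is a fast learned model trained to approximate the behavior of $B_t$.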
Key Points
- IDA alternates between amplification (a human decomposes a task into subtasks and delegates them to copies of the current learned model) and distillation (training a new, faster model to imitate the amplified human).
- The scheme aims to resolve the alignment-capabilities tradeoff: methods that imitate or are directly supervised by humans are easier to align but limited to roughly human capability, while broad reinforcement learning can be highly capable but is hard to align.
- Each iteration is meant to produce a more capable model that still reflects human values, analogous to how AlphaGoZero uses Monte Carlo tree search to amplify its policy network and then trains the network to imitate the search results.
- Key required properties: the amplified human must be more capable than the learned model it consults, and both amplification and distillation must preserve alignment at every iteration.
- IDA is intended as a high-level blueprint for training the learned components of an AI; non-learned components such as search must also be designed to preserve alignment without sacrificing runtime performance.
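To make the amplify/distill loop concrete, here is a toy sketch in Python. Everything in it is a hypothetical illustration, not Christiano's or Cotra's implementation: the task (summing a tuple of 0/1 digits), the simulated human who can only answer one-digit tasks directly but can split a task in half and consult the assistant, and the lookup table that stands in for a trained network during distillation. Alignment is not modeled at all; the sketch only shows how capability can grow from one iteration to the next.

```python
from itertools import product
from typing import Callable, Dict, Optional, Tuple

# Hypothetical toy task: compute the sum of a tuple of 0/1 digits.
Task = Tuple[int, ...]
# A "model" maps a task to an answer, or to None when it cannot answer.
Model = Callable[[Task], Optional[int]]

MAX_LEN = 8
# The whole toy task space: every 0/1 tuple of length 0..MAX_LEN.
DOMAIN = [t for n in range(MAX_LEN + 1) for t in product((0, 1), repeat=n)]


def amplify(assistant: Model) -> Model:
    """Amplify(H, A): a simulated human H who answers tasks of length <= 1
    directly, and otherwise splits the task in half, asks the assistant
    about each half, and adds the two answers."""
    def amplified(task: Task) -> Optional[int]:
        if len(task) <= 1:
            return sum(task)  # small enough for the human alone
        mid = len(task) // 2
        left, right = assistant(task[:mid]), assistant(task[mid:])
        if left is None or right is None:
            return None  # the assistant could not handle a subtask
        return left + right
    return amplified


def distill(teacher: Model) -> Model:
    """Distill(B): build a fast model that imitates the slow amplified system.
    A lookup table over DOMAIN stands in for supervised training here."""
    table: Dict[Task, int] = {}
    for task in DOMAIN:
        answer = teacher(task)
        if answer is not None:
            table[task] = answer
    return table.get  # tasks missing from the table map to None


def ida(iterations: int) -> Model:
    model: Model = {}.get  # stand-in for random initialization: answers nothing
    for _ in range(iterations):
        model = distill(amplify(model))
    return model


def max_solved_length(model: Model) -> int:
    """Largest n such that the model answers every task of each length 1..n."""
    best = 0
    for n in range(1, MAX_LEN + 1):
        if all(model(t) is not None for t in DOMAIN if len(t) == n):
            best = n
        else:
            break
    return best


for t in range(5):
    print(t, max_solved_length(ida(t)))
```

With this setup, `max_solved_length(ida(t))` for `t = 0, 1, 2, 3, 4` gives 0, 1, 2, 4, 8. After the first round, each amplified system handles tasks twice as long as the model it consults, and distillation packages that ability into a single fast model for the next round; growth stops only because the toy domain is capped at `MAX_LEN`.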
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Optimistic Alignment Worldview | Concept | 91.0 |
Cached Content Preview
[Iterated Distillation and Amplification](https://www.alignmentforum.org/posts/HqLxuZ4LhaFhmAHWk/iterated-amplification-welcome-to-the-neighborhood#)
7 min read
- [Motivation: The alignment/capabilities tradeoff](https://www.alignmentforum.org/posts/HqLxuZ4LhaFhmAHWk/iterated-amplification-welcome-to-the-neighborhood#Motivation__The_alignment_capabilities_tradeoff)
- [Core concept: Analogy to AlphaGoZero](https://www.alignmentforum.org/posts/HqLxuZ4LhaFhmAHWk/iterated-amplification-welcome-to-the-neighborhood#Core_concept__Analogy_to_AlphaGoZero)
- [The IDA Scheme](https://www.alignmentforum.org/posts/HqLxuZ4LhaFhmAHWk/iterated-amplification-welcome-to-the-neighborhood#The_IDA_Scheme)
- [Amplification is interactive and human-directed in IDA](https://www.alignmentforum.org/posts/HqLxuZ4LhaFhmAHWk/iterated-amplification-welcome-to-the-neighborhood#Amplification_is_interactive_and_human_directed_in_IDA)
- [Example: Building a superhuman personal assistant](https://www.alignmentforum.org/posts/HqLxuZ4LhaFhmAHWk/iterated-amplification-welcome-to-the-neighborhood#Example__Building_a_superhuman_personal_assistant)
- [Pseudocode](https://www.alignmentforum.org/posts/HqLxuZ4LhaFhmAHWk/iterated-amplification-welcome-to-the-neighborhood#Pseudocode)
- [What properties must hold for IDA to work?](https://www.alignmentforum.org/posts/HqLxuZ4LhaFhmAHWk/iterated-amplification-welcome-to-the-neighborhood#What_properties_must_hold_for_IDA_to_work_)
- [Achieving alignment and high capability](https://www.alignmentforum.org/posts/HqLxuZ4LhaFhmAHWk/iterated-amplification-welcome-to-the-neighborhood#Achieving_alignment_and_high_capability)
- [Achieving competitive performance and efficiency](https://www.alignmentforum.org/posts/HqLxuZ4LhaFhmAHWk/iterated-amplification-welcome-to-the-neighborhood#Achieving_competitive_performance_and_efficiency)
# [Iterated Distillation and Amplification](https://www.alignmentforum.org/posts/HqLxuZ4LhaFhmAHWk/iterated-distillation-and-amplification-1)
by [Ajeya Cotra](https://www.alignmentforum.org/users/ajeya-cotra?from=post_header)
29th Nov 2018
This is a guest post summarizing Paul Christiano’s proposed scheme for training machine learning systems that can be robustly aligned to complex and fuzzy values, which I call Iterated Distillation and Amplification (IDA) here. IDA is [notably similar](https://ai-alignment.com/alphago-zero-and-capability-amplification-ede767bb8446) to [AlphaGoZero](https://www.nature.com/articles/nature24270) and [expert iteration](https://arxiv.org/abs/1705.08439).
The hope is that if we use IDA to train each learned component of an AI then the overall AI will remain aligne
... (truncated, 23 KB total)
36a29e39dcedcda1 | Stable ID: ZjI2NjhjYz