Longterm Wiki

Author

Credibility Rating

3/5 (Good)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: Alignment Forum

An accessible 2018 explainer by Ajeya Cotra introducing IDA as proposed by Paul Christiano; a good entry point before reading Christiano's primary technical posts on the topic.

Metadata

Importance: 72/100 · blog post · educational

Summary

A guest post by Ajeya Cotra summarizing Paul Christiano's Iterated Distillation and Amplification (IDA) scheme, which addresses the alignment-capabilities tradeoff by iteratively amplifying human judgment through task decomposition and distilling the results into increasingly capable learned models. The approach draws an analogy to AlphaGoZero, combining human-directed amplification with supervised distillation to maintain alignment while achieving superhuman performance.

Key Points

  • IDA alternates between amplification (humans decompose tasks into subtasks using a learned assistant) and distillation (training a new model to imitate the amplified human).
  • The scheme aims to resolve the alignment-capabilities tradeoff: purely human-supervised methods are safe but limited; pure RL is capable but hard to align.
  • Each iteration produces a more capable model that still reflects human values, analogous to how AlphaGoZero improves through self-play amplified by Monte Carlo tree search.
  • Key required properties: the amplified human must be more capable than the learned model, and distillation must preserve alignment through each iteration.
  • IDA is intended as a high-level blueprint; non-learned components like search must also be designed to preserve alignment and runtime performance.
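The amplify/distill loop described in these points can be sketched as a toy program. Everything concrete below (the list-summing task, the halving decomposition, and the memoization-table "training") is a hypothetical stand-in chosen to make the loop runnable, not part of Christiano's actual proposal:

```python
# Toy, runnable sketch of the IDA loop. The task (summing a list of numbers),
# the decomposition scheme, and the "distilled model" (a memoization table)
# are illustrative stand-ins, not Christiano's proposed implementation.

def amplify(model):
    """Amplification: a human answers a task by decomposing it into
    subtasks, delegating each subtask to the current model, and then
    combining the model's answers."""
    def amplified(task):
        if len(task) <= 1:                 # small enough for the human alone
            return task[0] if task else 0
        mid = len(task) // 2               # human decomposes the task...
        left, right = task[:mid], task[mid:]
        return model(left) + model(right)  # ...and combines model answers
    return amplified

def distill(amplified, training_tasks):
    """Distillation: train a fast model to imitate the slow amplified
    system; here "training" is simply memoizing its answers."""
    table = {tuple(t): amplified(t) for t in training_tasks}
    def model(task):
        # Fall back to the slow amplified system on unseen tasks.
        return table.get(tuple(task), amplified(task))
    return model

def ida(n_iterations, training_tasks):
    model = lambda task: 0                 # trivially weak initial model
    for _ in range(n_iterations):
        model = distill(amplify(model), training_tasks)
    return model

# Each round of amplification roughly doubles the task size the system
# handles correctly, mirroring the capability growth IDA aims for.
agent = ida(n_iterations=4, training_tasks=[[1, 2], [3, 4, 5]])
print(agent([1, 2, 3, 4]))  # 10
```

Note how the two required properties from the list above show up even in this toy: `amplified` is strictly more capable than `model` (it handles tasks twice as long), and `distill` copies the amplified system's behavior rather than optimizing a proxy objective.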

Cited by 1 page

PageTypeQuality
Optimistic Alignment WorldviewConcept91.0

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 23 KB
[Iterated Distillation and Amplification](https://www.alignmentforum.org/posts/HqLxuZ4LhaFhmAHWk/iterated-amplification-welcome-to-the-neighborhood#)

7 min read

  • [Motivation: The alignment/capabilities tradeoff](https://www.alignmentforum.org/posts/HqLxuZ4LhaFhmAHWk/iterated-amplification-welcome-to-the-neighborhood#Motivation__The_alignment_capabilities_tradeoff)
  • [Core concept: Analogy to AlphaGoZero](https://www.alignmentforum.org/posts/HqLxuZ4LhaFhmAHWk/iterated-amplification-welcome-to-the-neighborhood#Core_concept__Analogy_to_AlphaGoZero)
  • [The IDA Scheme](https://www.alignmentforum.org/posts/HqLxuZ4LhaFhmAHWk/iterated-amplification-welcome-to-the-neighborhood#The_IDA_Scheme)
  • [Amplification is interactive and human-directed in IDA](https://www.alignmentforum.org/posts/HqLxuZ4LhaFhmAHWk/iterated-amplification-welcome-to-the-neighborhood#Amplification_is_interactive_and_human_directed_in_IDA)
  • [Example: Building a superhuman personal assistant](https://www.alignmentforum.org/posts/HqLxuZ4LhaFhmAHWk/iterated-amplification-welcome-to-the-neighborhood#Example__Building_a_superhuman_personal_assistant)
  • [Pseudocode](https://www.alignmentforum.org/posts/HqLxuZ4LhaFhmAHWk/iterated-amplification-welcome-to-the-neighborhood#Pseudocode)
  • [What properties must hold for IDA to work?](https://www.alignmentforum.org/posts/HqLxuZ4LhaFhmAHWk/iterated-amplification-welcome-to-the-neighborhood#What_properties_must_hold_for_IDA_to_work_)
  • [Achieving alignment and high capability](https://www.alignmentforum.org/posts/HqLxuZ4LhaFhmAHWk/iterated-amplification-welcome-to-the-neighborhood#Achieving_alignment_and_high_capability)
  • [Achieving competitive performance and efficiency](https://www.alignmentforum.org/posts/HqLxuZ4LhaFhmAHWk/iterated-amplification-welcome-to-the-neighborhood#Achieving_competitive_performance_and_efficiency)

[Iterated Amplification](https://www.alignmentforum.org/s/EmDuGeRw749sD3GKd)

[Iterated Amplification](https://www.alignmentforum.org/w/iterated-amplification)


# [Iterated Distillation and Amplification](https://www.alignmentforum.org/posts/HqLxuZ4LhaFhmAHWk/iterated-distillation-and-amplification-1)

by [Ajeya Cotra](https://www.alignmentforum.org/users/ajeya-cotra?from=post_header)

29th Nov 2018



This is a guest post summarizing Paul Christiano’s proposed scheme for training machine learning systems that can be robustly aligned to complex and fuzzy values, which I call Iterated Distillation and Amplification (IDA) here. IDA is [notably similar](https://ai-alignment.com/alphago-zero-and-capability-amplification-ede767bb8446) to [AlphaGoZero](https://www.nature.com/articles/nature24270) and [expert iteration](https://arxiv.org/abs/1705.08439).

The hope is that if we use IDA to train each learned component of an AI then the overall AI will remain aligned

... (truncated, 23 KB total)