AI Alignment: Why It's Hard, and Where to Start
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: MIRI
A foundational introductory talk by Eliezer Yudkowsky (MIRI) presenting the core framing of AI alignment as a technical problem, suitable as an entry point for researchers new to the field.
Metadata
Summary
Eliezer Yudkowsky's 2016 Stanford talk introducing the AI alignment problem, covering why coherent decision-making in advanced AI systems implies a utility function, key technical subproblems (low-impact agents, corrigibility, stable goals under self-modification), and why alignment is both necessary and difficult. The talk also discusses lessons from analogous engineering fields and provides entry points for researchers new to the field.
Key Points
- Coherent decision-making agents implicitly have utility functions, making goal specification and alignment a fundamental technical challenge (see the illustrative sketch below this list).
- Key alignment subproblems include low-impact agents, interruptibility/corrigibility (suspend buttons), and maintaining stable goals through self-modification.
- Alignment is hard because small misspecifications in goals can lead to catastrophic outcomes at high capability levels.
- Lessons from NASA and cryptography suggest that safety-critical systems require rigorous theoretical foundations before deployment, not just empirical iteration.
- Provides an accessible overview and reading list for researchers looking to enter the AI alignment field.
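The first key point is the standard expected-utility framing. Below is a rough, minimal sketch of that framing; the cauldron scenario echoes the talk's "filling a cauldron" example, but the action names, outcome probabilities, and utility values are invented for illustration and are not taken from the talk.

```python
# Minimal illustrative sketch (not from the talk): an agent with consistent
# preferences acts as if it maximizes expected utility over outcomes.
# All names and numbers here are hypothetical.
from typing import Dict

# P(outcome | action) for each available action (hypothetical values).
ACTION_OUTCOMES: Dict[str, Dict[str, float]] = {
    "fill_cauldron": {"cauldron_full": 0.9, "workshop_flooded": 0.1},
    "do_nothing":    {"nothing_happens": 1.0},
}

# A utility function assigns one number to each outcome; coherence arguments
# say any agent without exploitable (e.g. circular) preferences behaves as if
# it has such a function.
UTILITY: Dict[str, float] = {
    "cauldron_full": 1.0,
    "workshop_flooded": -5.0,
    "nothing_happens": 0.0,
}

def expected_utility(action: str) -> float:
    """Sum of utility(outcome) weighted by P(outcome | action)."""
    return sum(p * UTILITY[outcome]
               for outcome, p in ACTION_OUTCOMES[action].items())

# A coherent agent picks the action with the highest expected utility.
best_action = max(ACTION_OUTCOMES, key=expected_utility)
print(best_action, expected_utility(best_action))  # fill_cauldron, ~0.4
```

The difficulty the talk points at is not the maximization step but the specification: such an agent optimizes exactly what the utility table says, including anything it omits or mis-weights, which is where the misspecification risk in the third key point comes from.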
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| AI Doomer Worldview | Concept | 38.0 |
Cached Content Preview
# AI Alignment: Why It’s Hard, and Where to Start
- [December 28, 2016](https://intelligence.org/2016/12/28/)
- [Eliezer Yudkowsky](https://intelligence.org/author/eliezer/)
Back in May, I gave a talk at Stanford University for the [Symbolic Systems Distinguished Speaker](https://symsys.stanford.edu/viewing/htmldocument/13638) series, titled “**The AI Alignment Problem: Why It’s Hard, And Where To Start**.” The video for this talk is now available on YouTube:
We have an approximately complete transcript of the talk and Q&A session **[here](https://intelligence.org/files/AlignmentHardStart.pdf)**, slides **[here](https://intelligence.org/files/ai-alignment-problem-handoutHQ.pdf)**, and notes and references **[here](https://intelligence.org/stanford-talk/)**. You may also be interested in a shorter version of this talk I gave at NYU in October, “[Fundamental Difficulties in Aligning Advanced AI](https://intelligence.org/nyu-talk/).”
In the talk, I introduce some open technical problems in AI alignment and discuss the bigger picture into which they fit, as well as what it’s like to work in this relatively new field. Below, I’ve provided an abridged transcript of the talk, with some accompanying slides.
Talk outline:
> 1\. [Agents and their utility functions](https://intelligence.org/2016/12/28/ai-alignment-why-its-hard-and-where-to-start/#1)
>
> 1.1. [Coherent decisions imply a utility function](https://intelligence.org/2016/12/28/ai-alignment-why-its-hard-and-where-to-start/#coherent-decisions-imply-a-utility-function)
>
> 1.2. [Filling a cauldron](https://intelligence.org/2016/12/28/ai-alignment-why-its-hard-and-where-to-start/#filling-a-cauldron)
>
> 2\. [Some AI alignment subproblems](https://intelligence.org/2016/12/28/ai-alignment-why-its-hard-and-where-to-start/#2)
>
> 2.1. [Low-impact agents](https://intelligence.org/2016/12/28/ai-alignment-why-its-hard-and-where-to-start/#low-impact-agents)
>
> 2.2. [Agents with suspend buttons](https://intelligence.org/2016/12/28/ai-alignment-why-its-hard-and-where-to-start/#agents-with-suspend-buttons)
>
> 2.3. [Stable goals in self-modification](https://intelligence.org/2016/12/28/ai-alignment-why-its-hard-and-where-to-start/#stable-goals-in-self-modification)
>
> 3\. [Why expect difficulty?](https://intelligence.org/2016/12/28/ai-alignment-why-its-hard-and-where-to-start/#3)
>
> 3.1. [Why is alignment necessary?](https://intelligence.org/2016/12/28/ai-alignment-why-its-hard-and-where-to-start/#why-is-alignment-necessary)
>
> 3.2. [Why is alignment hard?](https://intelligence.org/2016/12/28/ai-alignment-why-its-hard-and-where-to-start/#why-is-alignment-hard)
>
> 3.3. [Lessons from NASA and cryptography](https://intelligence.org/2016/12/28/ai-alignment-why-its-hard-and-where-to-start/#lessons-from-nasa-and-cryptography)
>
> 4\. [Where we are
... (truncated, 62 KB total)