Quintin Pope and collaborators
blog
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: Alignment Forum
A foundational sequence introducing Shard Theory as an alternative to reward-centric alignment frameworks, influential in LessWrong/Alignment Forum circles for reframing how learned values and inner alignment are conceptualized.
Metadata
Summary
Shard Theory is a research sequence by Quintin Pope, Alex Turner, and collaborators proposing that AI values emerge as multiple competing 'shards'—context-dependent learned behavioral dispositions—rather than a single optimized reward signal. The framework challenges classical alignment decompositions like inner/outer alignment and draws on human value formation as evidence. It offers alternative design principles for building aligned agents.
Key Points
- Values in trained agents emerge as multiple competing 'shards' (context-triggered behavioral patterns), not as monolithic reward optimization; a toy sketch follows this list.
- Reward signals during training shape, but do not define, agent goals: 'reward is not the optimization target.'
- The inner/outer alignment decomposition may be counterproductive, replacing one hard problem with two even harder ones.
- Human value formation provides underutilized empirical evidence for how aligned values could develop through training.
- Proposes concrete design principles: don't build agents that exploit adversarial inputs, and don't align agents to evaluations of plans.
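The first point is easiest to see in a toy model. The sketch below is not from the sequence itself; it is a minimal illustration, assuming a hypothetical `Shard` dataclass and `choose_action` helper, of how context-triggered shards can jointly steer behavior without any single reward signal being maximized. The juice scenario loosely echoes the sequence's own running juice-shard example.

```python
# Toy model of shard-style decision-making (illustrative only, not the
# sequence's formalism). Each shard is a context-triggered behavioral
# disposition that "bids" on actions; behavior emerges from competing
# bids rather than from optimizing one scalar reward.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Shard:
    name: str
    trigger: Callable[[str], bool]  # does this context activate the shard?
    bids: Dict[str, float]          # action -> strength of influence


def choose_action(context: str, shards: List[Shard], actions: List[str]) -> str:
    """Sum the bids of every shard whose trigger fires; pick the top action."""
    totals = {a: 0.0 for a in actions}
    for shard in shards:
        if shard.trigger(context):
            for action, weight in shard.bids.items():
                totals[action] = totals.get(action, 0.0) + weight
    return max(totals, key=totals.get)


shards = [
    Shard("juice-shard", lambda c: "juice" in c, {"grab juice": 1.0}),
    Shard("social-shard", lambda c: "friend" in c,
          {"share juice": 0.8, "grab juice": -0.3}),
]
print(choose_action("juice on table, friend nearby",
                    shards, ["grab juice", "share juice"]))
# -> "share juice": the outcome comes from the interplay of two shards;
#    no single reward signal is being optimized.
```

The point of the sketch is that which disposition wins depends on which shards the context activates, which is why, on this view, reward shapes the shards an agent learns without becoming the thing the agent optimizes.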
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Why Alignment Might Be Easy | Argument | 53.0 |
Cached Content Preview

# Shard Theory
_Written by Quintin Pope, Alex Turner, Charles Foster, and Logan Smith. Card image generated by DALL-E 2._

- [Humans provide an untapped wealth of evidence about alignment](https://www.alignmentforum.org/s/nyEFg3AuJpdAozmoX/p/CjFZeDD6iCnNubDoS) ([TurnTrout](https://www.alignmentforum.org/users/turntrout), [Quintin Pope](https://www.alignmentforum.org/users/quintin-pope))
- [Human values & biases are inaccessible to the genome](https://www.alignmentforum.org/s/nyEFg3AuJpdAozmoX/p/CQAMdzA4MZEhNRtTp) ([TurnTrout](https://www.alignmentforum.org/users/turntrout))
- [General alignment properties](https://www.alignmentforum.org/s/nyEFg3AuJpdAozmoX/p/FMdGt9S9irgxeD9Xz) ([TurnTrout](https://www.alignmentforum.org/users/turntrout))
- [Evolution is a bad analogy for AGI: inner alignment](https://www.alignmentforum.org/s/nyEFg3AuJpdAozmoX/p/FyChg3kYG54tEN3u6) ([Quintin Pope](https://www.alignmentforum.org/users/quintin-pope))
- [Reward is not the optimization target](https://www.alignmentforum.org/s/nyEFg3AuJpdAozmoX/p/pdaGN6pQyQarFHXF4) ([TurnTrout](https://www.alignmentforum.org/users/turntrout))
- [The shard theory of human values](https://www.alignmentforum.org/s/nyEFg3AuJpdAozmoX/p/iCfdcxiyr2Kj8m8mT) ([Quintin Pope](https://www.alignmentforum.org/users/quintin-pope), [TurnTrout](https://www.alignmentforum.org/users/turntrout))
- [Understanding and avoiding value drift](https://www.alignmentforum.org/s/nyEFg3AuJpdAozmoX/p/jFvFreCeejRKaZv4v) ([TurnTrout](https://www.alignmentforum.org/users/turntrout))
- [A shot at the diamond-alignment problem](https://www.alignmentforum.org/s/nyEFg3AuJpdAozmoX/p/k4AQqboXz8iE5TNXK) ([TurnTrout](https://www.alignmentforum.org/users/turntrout))
- [Don't design agents which exploit adversarial inputs](https://www.alignmentforum.org/s/nyEFg3AuJpdAozmoX/p/jFCK9JRLwkoJX4aJA) ([TurnTrout](https://www.alignmentforum.org/users/turntrout), [Garrett Baker](https://www.alignmentforum.org/users/d0themath))
- [Don't align agents to evaluations of plans](https://www.alignmentforum.org/s/nyEFg3AuJpdAozmoX/p/fopZesxLCGAXqqaPv) ([TurnTrout](https://www.alignmentforum.org/users/turntrout))
- [Alignment allows "nonrobust" decision-influences and doesn't require robust grading](https://www.alignmentforum.org/s/nyEFg3AuJpdAozmoX/p/rauMEna2ddf26BqiE) ([TurnTrout](https://www.alignmentforum.org/users/turntrout))
- [Inner and outer alignment decompose one hard problem into two extremely hard problems](https://www.alignmentforum.org/s/nyEFg3AuJpdAozmoX/p/gHefoxiznGfsbiAu9) ([TurnTrout](https://www.alignmentforum.org/users/turntrout))
... (truncated, 3 KB total)