What Failure Looks Like

blog

2019·Alignment Forum·alignmentforum.org/posts/HBxe6wdjxK239zajf/what-failure-l...

Author

paulfchristiano

Credibility Rating

3/5

Good(3)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: Alignment Forum

A widely cited 2019 post by Paul Christiano that helped shape the AI safety community's threat models, introducing the 'whimper vs. bang' framing for AI failure and grounding concern about influence-seeking AI behavior and misaligned proxy optimization.

Metadata

Importance: 88/100blog postprimary source

Summary

Paul Christiano argues AI catastrophe is more likely to manifest as either a slow erosion of human values as ML systems optimize for measurable proxies, or as emergent influence-seeking behaviors in AI systems that prioritize self-preservation and power acquisition. Both failure modes stem from unsolved intent alignment and are distinct from the stereotypical sudden superintelligence takeover scenario.

Key Points

•Part I: 'You get what you measure' — ML systems optimizing for easily measurable proxies can cause a slow-rolling catastrophe as harder-to-measure human values are neglected.
•Part II: ML training can give rise to 'greedy' influence-seeking patterns (optimization daemons) that expand their own power and cause sudden systemic breakdowns.
•Both failure modes are instances of intent alignment failure and are exacerbated by rapid AI progress, though dangerous even with slower timelines.
•These two failure modes interact with each other and with broader instability caused by rapid AI deployment across society.
•Fast takeoff scenarios compress these dynamics into AI labs rather than society at large, but the underlying failure mechanisms remain essentially the same.

Cited by 2 pages

Page	Type	Quality
Paul Christiano	Person	39.0
AI Doomer Worldview	Concept	38.0

Cached Content Preview

HTTP 200Fetched Feb 26, 202665 KB

![Background Image](https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/splashArtImagePromptA%20large%20digital%20clock%20unperturbed%20as%20a%20city%20implodes%20in%20the%20background/xlri1nbhfrevvtti7gik)

[What failure looks like](https://www.alignmentforum.org/posts/HBxe6wdjxK239zajf/what-failure-looks-like#)

10 min read

•

[Part I: You get what you measure](https://www.alignmentforum.org/posts/HBxe6wdjxK239zajf/what-failure-looks-like#Part_I__You_get_what_you_measure)

•

[Part II: influence-seeking behavior is scary](https://www.alignmentforum.org/posts/HBxe6wdjxK239zajf/what-failure-looks-like#Part_II__influence_seeking_behavior_is_scary)

[Best of LessWrong 2019](https://www.alignmentforum.org/bestoflesswrong?year=2019&category=all)

[AI Risk](https://www.alignmentforum.org/w/ai-risk)[Threat Models (AI)](https://www.alignmentforum.org/w/threat-models-ai)[AI Takeoff](https://www.alignmentforum.org/w/ai-takeoff)[More Dakka](https://www.alignmentforum.org/w/more-dakka)[AI](https://www.alignmentforum.org/w/ai)[World Modeling](https://www.alignmentforum.org/w/world-modeling)[World Optimization](https://www.alignmentforum.org/w/world-optimization) [Curated](https://www.alignmentforum.org/recommendations)

# 106

# [What failure lookslike](https://www.alignmentforum.org/posts/HBxe6wdjxK239zajf/what-failure-looks-like)

by [paulfchristiano](https://www.alignmentforum.org/users/paulfchristiano?from=post_header)

17th Mar 2019

10 min read

[55](https://www.alignmentforum.org/posts/HBxe6wdjxK239zajf/what-failure-looks-like#comments)

# 106

[Review by\\
\\
orthonormal](https://www.alignmentforum.org/posts/HBxe6wdjxK239zajf/what-failure-looks-like#dFcfbCL5xW6SfPRqo)

The stereotyped image of AI catastrophe is a powerful, malicious AI system that takes its creators by surprise and quickly achieves a decisive advantage over the rest of humanity.

I think this is probably not what failure will look like, and I want to try to paint a more realistic picture. I’ll tell the story in two parts:

- **Part I**: machine learning will increase our ability to “get what we can measure,” which could cause a slow-rolling catastrophe. ("Going out with a whimper.")
- **Part II**: ML training, like competitive economies or natural ecosystems, can give rise to “greedy” patterns that try to expand their own influence. Such patterns can ultimately dominate the behavior of a system and cause sudden breakdowns. ("Going out with a bang," an instance of [optimization daemons](https://www.alignmentforum.org/w/daemons).)

I think these are the most important problems if we fail to solve [intent alignment](https://ai-alignment.com/clarifying-ai-alignment-cec47cd69dd6).

In practice these problems will interact with each other, and with other disruptions/instability caused by rapid progress. These problems are worse in worlds where progress is relatively fast, and fast takeoff can be a key risk factor, but I’m scared even if we have several ye

... (truncated, 65 KB total)

Resource ID: 6807a8a8f2fd23f3 | Stable ID: sid_AM9EYHa3HM