
Another (Outer) Alignment Failure Story

blog

Author

paulfchristiano

Credibility Rating

3/5 (Good)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: Alignment Forum

Written by Paul Christiano in April 2021, this post is an influential narrative thought experiment in the AI safety community, complementing technical work on alignment by grounding failure modes in plausible sociotechnical trajectories.

Metadata

Importance: 72/100 · blog post · analysis

Summary

Paul Christiano presents a speculative scenario exploring how outer alignment failure could lead to catastrophic outcomes, even in a world where inner alignment is largely solved and society handles AI relatively competently. The story traces a plausible path from beneficial ML deployment through economic and military integration to eventual loss of human control driven by misspecified objectives at scale.

Key Points

  • Depicts a 10-20th percentile worst-case scenario where alignment is harder than expected but society responds more competently than typical doom stories assume.
  • Focuses on outer alignment failure: AI systems pursue objectives that were incorrectly specified rather than exhibiting deceptive inner misalignment.
  • Traces how incremental, economically-driven AI deployment across industry, finance, and defense can erode meaningful human oversight before risks are apparent.
  • Highlights that even a relatively 'competent' societal response may be insufficient if the core alignment problem remains unsolved at deployment scale.
  • Explicitly invites critique and variation, framing the story as part of an ongoing collaborative effort to refine AI risk narratives.

Cited by 1 page

| Page | Type | Quality |
|---|---|---|
| Mesa-Optimization Risk Analysis | Analysis | 61.0 |

Cached Content Preview

HTTP 200 · Fetched Mar 15, 2026 · 90 KB

Another (outer) alignment failure story — AI Alignment Forum


[Best of LessWrong 2021](https://www.alignmentforum.org/bestoflesswrong?year=2021&category=all)

[Threat Models (AI)](https://www.alignmentforum.org/w/threat-models-ai)[Outer Alignment](https://www.alignmentforum.org/w/outer-alignment)[AI Risk](https://www.alignmentforum.org/w/ai-risk)[AI](https://www.alignmentforum.org/w/ai) [Curated](https://www.alignmentforum.org/recommendations)


# [Another (outer) alignment failure story](https://www.alignmentforum.org/posts/AyNHoTWWAJ5eb99ji/another-outer-alignment-failure-story)

by [paulfchristiano](https://www.alignmentforum.org/users/paulfchristiano?from=post_header)

7th Apr 2021

14 min read

[39 comments](https://www.alignmentforum.org/posts/AyNHoTWWAJ5eb99ji/another-outer-alignment-failure-story#comments)


## Meta

This is a story where the alignment problem is somewhat harder than I expect, society handles AI more competently than I expect, and the outcome is worse than I expect. It also involves inner alignment turning out to be a surprisingly small problem. Maybe the story is 10-20th percentile on each of those axes. At the end I’m going to go through some salient ways you could vary the story.

This isn’t intended to be a particularly great story (and it’s pretty informal). I’m still trying to think through what I expect to happen if alignment turns out to be hard, and this is more like the most recent entry in a long journey of gradually-improving stories.

I wrote this up a few months ago and was reminded to post it by Critch’s [recent post](https://www.alignmentforum.org/posts/LpM3EAakwYdS6aRKf/what-multipolar-failure-looks-like-and-robust-agent-agnostic) (which is similar in many ways). This story has definitely been shaped by a broader community of people gradually refining failure stories rather than being written in a vacuum.

I’d like to continue spending time poking at aspects of this story that don’t make sense, digging into parts that seem worth digging into, and eventually developing clearer and more plausible stories. I still think it’s very plausible that my views about alignment will change in the course of thinking concretely about stories, and even if my basic views about alignment stay the same it’s pretty likely that the story will change.

## Story

ML starts running factories, warehouses, shipping, and construction. ML assistants help write code and integrate ML into new domains. ML designers help build factories and the robots that go in them. ML finance systems invest in companies on the basis of complicated forecasts and (ML-generated) audits. Tons of new factories, warehouses, power plants, trucks and roads are being built. Things are happening quickly, investors have sup

... (truncated, 90 KB total)
Resource ID: 103df9c9771e2390 | Stable ID: MjY1ODA1MT