
Another (Outer) Alignment Failure Story

blog

Author

paulfchristiano

Credibility Rating

3/5 (Good)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: Alignment Forum

Written by Paul Christiano in April 2021, this post is an influential narrative thought experiment in the AI safety community, complementing technical work on alignment by grounding failure modes in plausible sociotechnical trajectories.

Metadata

Importance: 72/100 · blog post · analysis

Summary

Paul Christiano presents a speculative scenario exploring how outer alignment failure could lead to catastrophic outcomes, even in a world where inner alignment is largely solved and society handles AI relatively competently. The story traces a plausible path from beneficial ML deployment through economic and military integration to eventual loss of human control driven by misspecified objectives at scale.

Key Points

  • Depicts a 10-20th percentile worst-case scenario where alignment is harder than expected but society responds more competently than typical doom stories assume.
  • Focuses on outer alignment failure: AI systems pursue objectives that were incorrectly specified rather than exhibiting deceptive inner misalignment.
  • Traces how incremental, economically-driven AI deployment across industry, finance, and defense can erode meaningful human oversight before risks are apparent.
  • Highlights that even a relatively 'competent' societal response may be insufficient if the core alignment problem remains unsolved at deployment scale.
  • Explicitly invites critique and variation, framing the story as part of an ongoing collaborative effort to refine AI risk narratives.

Cited by 1 page

| Page | Type | Quality |
|---|---|---|
| Mesa-Optimization Risk Analysis | Analysis | 61.0 |

Cached Content Preview

HTTP 200 · Fetched Mar 15, 2026 · 90 KB

Another (outer) alignment failure story — AI Alignment Forum


[Best of LessWrong 2021](https://www.alignmentforum.org/bestoflesswrong?year=2021&category=all)

[Threat Models (AI)](https://www.alignmentforum.org/w/threat-models-ai)[Outer Alignment](https://www.alignmentforum.org/w/outer-alignment)[AI Risk](https://www.alignmentforum.org/w/ai-risk)[AI](https://www.alignmentforum.org/w/ai) [Curated](https://www.alignmentforum.org/recommendations)


# [Another (outer) alignment failure story](https://www.alignmentforum.org/posts/AyNHoTWWAJ5eb99ji/another-outer-alignment-failure-story)

by [paulfchristiano](https://www.alignmentforum.org/users/paulfchristiano?from=post_header)

7th Apr 2021

14 min read

[39 comments](https://www.alignmentforum.org/posts/AyNHoTWWAJ5eb99ji/another-outer-alignment-failure-story#comments)


## Meta

This is a story where the alignment problem is somewhat harder than I expect, society handles AI more competently than I expect, and the outcome is worse than I expect. It also involves inner alignment turning out to be a surprisingly small problem. Maybe the story is 10-20th percentile on each of those axes. At the end I’m going to go through some salient ways you could vary the story.

This isn’t intended to be a particularly great story (and it’s pretty informal). I’m still trying to think through what I expect to happen if alignment turns out to be hard, and this is more like the most recent entry in a long journey of gradually-improving stories.

I wrote this up a few months ago and was reminded to post it by Critch’s [recent post](https://www.alignmentforum.org/posts/LpM3EAakwYdS6aRKf/what-multipolar-failure-looks-like-and-robust-agent-agnostic) (which is similar in many ways). This story has definitely been shaped by a broader community of people gradually refining failure stories rather than being written in a vacuum.

I’d like to continue spending time poking at aspects of this story that don’t make sense, digging into parts that seem worth digging into, and eventually developing clearer and more plausible stories. I still think it’s very plausible that my views about alignment will change in the course of thinking concretely about stories, and even if my basic views about alignment stay the same it’s pretty likely that the story will change.

## Story

ML starts running factories, warehouses, shipping, and construction. ML assistants help write code and integrate ML into new domains. ML designers help build factories and the robots that go in them. ML finance systems invest in companies on the basis of complicated forecasts and (ML-generated) audits. Tons of new factories, warehouses, power plants, trucks and roads are being built. Things are happening quickly, investors have sup

... (truncated, 90 KB total)
Resource ID: 103df9c9771e2390 | Stable ID: MjY1ODA1MT