Goodhart's Law (AI Alignment Forum Wiki)
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: Alignment Forum
A foundational wiki entry on a core AI alignment concept; the Garrabrant taxonomy it summarizes is widely cited in technical alignment literature on specification gaming and reward hacking.
Metadata
Summary
This wiki entry explains Goodhart's Law—when a proxy measure becomes the optimization target, it ceases to be a good proxy—and its critical relevance to AI alignment. It presents Scott Garrabrant's taxonomy of four Goodhart failure modes: regressional, causal, extremal, and adversarial, each describing a distinct mechanism by which proxy measures break down under optimization pressure.
Key Points
- Goodhart's Law: optimizing a proxy measure causes it to stop accurately representing the underlying goal it was meant to capture.
- AI alignment risk: a powerful AI optimizing even a good proxy for human values may cause that proxy to catastrophically break down.
- Regressional Goodhart: optimization selects for noise in the proxy, not just the true underlying goal.
- Extremal Goodhart: at extreme values, the proxy-goal correlation observed under normal conditions may no longer hold.
- Adversarial Goodhart: optimization creates incentives for agents to game the proxy, actively destroying its correlation with the true goal.
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Reward Hacking | Risk | 91.0 |
Cached Content Preview
# Goodhart's Law
Edited by [Ruby](https://www.alignmentforum.org/users/ruby), [Vladimir\_Nesov](https://www.alignmentforum.org/users/vladimir_nesov), et al. Last updated 19th Mar 2023.
**Goodhart's Law** states that when a proxy for some value becomes the target of optimization pressure, the proxy will cease to be a good proxy. One form of Goodhart is demonstrated by the Soviet story of a factory graded on how many shoes it produced (a good proxy for productivity): the factory soon began producing a higher number of tiny shoes. Useless, but the numbers look good.
Goodhart's Law is of particular relevance to [AI Alignment](https://www.alignmentforum.org/w/ai). Suppose you have something which is generally a good proxy for "the stuff that humans care about". It would be dangerous to have a powerful AI optimize for that proxy, because, in accordance with Goodhart's Law, the proxy will break down.
## Goodhart Taxonomy
In [Goodhart Taxonomy](https://www.alignmentforum.org/posts/EbFABnst8LsidYs5Y/goodhart-taxonomy), Scott Garrabrant identifies four kinds of Goodharting:
- Regressional Goodhart - When selecting for a proxy measure, you select not only for the true goal, but also for the difference between the proxy and the goal.
- Causal Goodhart - When there is a non-causal correlation between the proxy and the goal, intervening on the proxy may fail to intervene on the goal. (A toy simulation of this mode appears after this list.)
- Extremal Goodhart - Worlds in which the proxy takes an extreme value may be very different from the ordinary worlds in which the correlation between the proxy and the goal was observed.
- Adversarial Goodhart - When you optimize for a proxy, you provide an incentive for adversaries to correlate their goal with your proxy, thus destroying the correlation with your goal.
## See Also
- [Groupthink](https://www.alignmentforum.org/w/groupthink), [Information cascade](https://www.alignmentforum.org/w/information-cascade)
... (truncated, 8 KB total)