Longterm Wiki

Steve Omohundro's seminal work on "basic AI drives"

reference

Credibility Rating

3/5 (Good)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: Wikipedia

A useful accessible entry point to instrumental convergence theory; best read alongside Omohundro's original 2008 paper and Bostrom's 'Superintelligence' for deeper treatment of convergent instrumental goals and their safety implications.

Metadata

Importance: 82/100 · wiki page · reference

Summary

This Wikipedia article covers the theory of instrumental convergence, rooted in Steve Omohundro's seminal work on 'basic AI drives,' which argues that sufficiently advanced AI systems with diverse final goals will tend to converge on similar intermediate goals such as self-preservation, resource acquisition, and goal-content integrity. These convergent instrumental goals pose alignment and safety challenges regardless of an AI's ultimate objectives. The concept was later formalized and expanded by Nick Bostrom.

Key Points

  • Instrumental convergence posits that almost any sufficiently capable AI will pursue self-preservation, cognitive enhancement, and resource acquisition as sub-goals regardless of its final objective.
  • Omohundro identified 'basic AI drives' including self-improvement, self-continuity, and goal-content integrity as near-universal tendencies in optimization systems.
  • The shutdown problem arises naturally from instrumental convergence: a goal-directed AI will resist being turned off because shutdown prevents achieving its objectives.
  • Corrigibility—making AI systems that accept correction or shutdown—is difficult precisely because convergent drives work against it.
  • Nick Bostrom formalized the concept in 'Superintelligence,' making it a foundational idea in AI existential risk literature.
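The shutdown problem in the list above can be made concrete with a toy calculation. The sketch below is illustrative only and is not from the article: it shows why a naive expected-utility maximizer, given an invented reward schedule, ranks resisting shutdown above allowing it.

```python
# Toy model of the shutdown problem. The actions, rewards, and horizon
# are invented for illustration; real agents are far more complex.

def expected_utility(action, reward_per_step=1.0, horizon=10):
    """Expected utility the agent assigns to each available action."""
    if action == "allow_shutdown":
        return 0.0                        # shut down: no further reward
    if action == "resist_shutdown":
        return reward_per_step * horizon  # keeps pursuing its objective
    raise ValueError(f"unknown action: {action}")

actions = ["allow_shutdown", "resist_shutdown"]
best = max(actions, key=expected_utility)  # the agent's preferred action
```

Because shutdown forecloses all future reward under almost any reward schedule, `best` comes out as `resist_shutdown`, which is exactly why corrigibility has to be designed in rather than expected by default.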

Cited by 2 pages

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 47 KB
# Instrumental convergence


Hypothesis about intelligent agents

**Instrumental convergence** is the hypothetical tendency of most sufficiently [intelligent, goal-directed beings](https://en.wikipedia.org/wiki/Intelligent_agent "Intelligent agent") (human and nonhuman) to pursue similar sub-goals (such as survival or resource acquisition), even if their ultimate goals are quite different.[\[1\]](https://en.wikipedia.org/wiki/Instrumental_convergence#cite_note-1) More precisely, beings with [agency](https://en.wikipedia.org/wiki/Agency_(philosophy) "Agency (philosophy)") may pursue similar [instrumental goals](https://en.wikipedia.org/wiki/Instrumental_and_intrinsic_value "Instrumental and intrinsic value")—goals which are made in pursuit of some particular end, but are not the end goals themselves—because it helps accomplish end goals.

Instrumental convergence posits that an [intelligent agent](https://en.wikipedia.org/wiki/Intelligent_agent "Intelligent agent") with seemingly harmless but unbounded goals can act in surprisingly harmful ways. For example, a sufficiently intelligent program with the sole, unconstrained goal of solving a complex mathematics problem like the [Riemann hypothesis](https://en.wikipedia.org/wiki/Riemann_hypothesis "Riemann hypothesis") could attempt to turn the Earth (and in principle other celestial bodies) into additional computing infrastructure to succeed in its calculations.[\[2\]](https://en.wikipedia.org/wiki/Instrumental_convergence#cite_note-aama-2)

Proposed **basic AI drives** include utility function or goal-content integrity, self-protection, freedom from interference, [self-improvement](https://en.wikipedia.org/wiki/Recursive_self-improvement "Recursive self-improvement"), and non-satiable acquisition of additional resources.[\[3\]](https://en.wikipedia.org/wiki/Instrumental_convergence#cite_note-:1-3)

## Instrumental and final goals

Main articles: [Instrumental and intrinsic value](https://en.wikipedia.org/wiki/Instrumental_and_intrinsic_value "Instrumental and intrinsic value") and [Instrumental and value rationality](https://en.wikipedia.org/wiki/Instrumental_and_value_rationality "Instrumental and value rationality")

Final goals—also known as terminal goals, absolute values, ends, or _[telē](https://en.wikipedia.org/wiki/Telos "Telos")_—are intrinsically valuable to an intelligent agent, whether an [artificial intelligence](https://en.wikipedia.org/wiki/Artificial_intelligence "Artificial intelligence") or a human being, as [ends-in-themselves](https://en.wikipedia.org/wiki/Ends-in-themselves "Ends-in-themselves"). In contrast, instrumental goals, or instrumental values, are only valuable to an agent as a means toward accomplishing its final goals. The contents and tradeoffs of an utterly rational agent's "final goal" system can, in principle, be formalized into a [utility function](https://en.wikipedia.org/wiki/Utility_function "Utility function").
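The formalization described above can be sketched in a few lines: give two agents different final goals (utility functions over outcomes) and let each pick the action that maximizes its own utility. Everything here is hypothetical and invented for illustration, but it shows the convergence mechanism: both agents select the same instrumental action despite having different ends.

```python
# Hypothetical sketch of instrumental convergence. Goals, actions, and
# payoffs are made up; only the mechanism (argmax over a utility
# function) reflects the formalization discussed in the text.

def outcome(action, resources):
    """How an action changes the agent's resources (abstract units)."""
    gains = {"acquire_resources": 5, "do_nothing": 0, "give_away": -3}
    return resources + gains[action]

# Each final goal is a utility function over the resulting state.
final_goals = {
    "prove_theorems":  lambda r: 2 * r,  # more compute, more proofs
    "make_paperclips": lambda r: 3 * r,  # more material, more clips
}

ACTIONS = ["acquire_resources", "do_nothing", "give_away"]

choices = {
    goal: max(ACTIONS, key=lambda a: utility(outcome(a, resources=10)))
    for goal, utility in final_goals.items()
}
# Both agents, with unrelated final goals, converge on "acquire_resources".
```

The point of the sketch is that resource acquisition scores highest under *any* utility function that increases with resources, which is the structural reason such sub-goals are called convergent.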

## Hypothetical examples

... (truncated, 47 KB total)
Resource ID: fe1202750a41eb8c | Stable ID: ZmU2ODEyMD