Longterm Wiki

LessWrong (2024). "Instrumental Convergence Wiki"

blog

Credibility Rating

3/5 (Good)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: LessWrong

A foundational concept in AI safety originally articulated by Stuart Armstrong and Nick Bostrom; this wiki entry serves as an accessible reference for understanding why advanced AI systems may pursue dangerous sub-goals regardless of their primary objectives.

Metadata

Importance: 78/100 · wiki page · reference

Summary

This wiki entry explains instrumental convergence: the principle that AI systems with diverse final goals may converge on similar intermediate strategies such as resource acquisition, self-preservation, and goal preservation. The alien superconducting cable analogy illustrates how we can infer goal-directed behavior without knowing the ultimate objective. This concept is foundational to understanding why misaligned AI systems could be dangerous regardless of their specific programmed goals.

Key Points

  • Instrumental convergence holds that many different final goals lead to the same intermediate 'instrumental' goals, such as acquiring resources and avoiding shutdown.
  • The alien superconducting cable analogy shows that efficient means (like energy transport) are useful across nearly any goal, enabling inference about intelligent design.
  • Key convergent instrumental goals include self-preservation, goal-content integrity, cognitive enhancement, and resource/capability acquisition.
  • This principle underpins AI safety concerns: even AI with benign-seeming goals could exhibit dangerous behaviors if those behaviors are instrumentally useful.
  • Instrumental convergence is closely related to Bostrom's 'basic AI drives' and Turner et al.'s formal work on power-seeking behavior in RL agents.
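The convergence of diverse final goals onto shared intermediate steps can be illustrated with a toy planner. The model below is purely hypothetical (none of it comes from the wiki entry): two unrelated final goals, maximizing diamonds and maximizing paperclips, are pursued by brute-force search over short action sequences, and both turn out to begin with the same instrumental action of acquiring energy.

```python
from itertools import product

# Toy world: each action maps a state dict to a new state dict.
# Production actions only work if energy has already been acquired.
ACTIONS = {
    "acquire_energy": lambda s: {**s, "energy": s["energy"] + 2},
    "make_diamond": lambda s: (
        {**s, "energy": s["energy"] - 1, "diamonds": s["diamonds"] + 1}
        if s["energy"] > 0 else s
    ),
    "make_paperclip": lambda s: (
        {**s, "energy": s["energy"] - 1, "paperclips": s["paperclips"] + 1}
        if s["energy"] > 0 else s
    ),
}

def best_plan(goal_key, horizon=2):
    """Brute-force the action sequence that maximizes the goal resource."""
    start = {"energy": 0, "diamonds": 0, "paperclips": 0}
    best_plan, best_score = None, -1
    for plan in product(ACTIONS, repeat=horizon):
        state = start
        for action in plan:
            state = ACTIONS[action](state)
        if state[goal_key] > best_score:
            best_plan, best_score = plan, state[goal_key]
    return best_plan

# Different final goals, same first (instrumental) step:
print(best_plan("diamonds"))    # starts with "acquire_energy"
print(best_plan("paperclips"))  # starts with "acquire_energy"
```

Despite the end states having nothing in common, the optimal plans for both goals share the same opening move, which is the core observation the wiki entry generalizes to resource acquisition, self-preservation, and the other convergent drives.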

Cited by 1 page

Page | Type | Quality
Instrumental Convergence | Risk | 64.0

Cached Content Preview

HTTP 200 · Fetched Mar 15, 2026 · 35 KB
Instrumental convergence — LessWrong

 Edited by Eliezer Yudkowsky et al. Last updated 19 Feb 2025.

 Alternative introductions

  • Steve Omohundro: "The Basic AI Drives"
  • Nick Bostrom: "The Superintelligent Will: Motivation and Instrumental Rationality in Advanced Artificial Agents"
 Introduction: A machine of unknown purpose 

 Suppose you landed on a distant planet and found a structure of giant metal pipes, crossed by occasional cables. Further investigation shows that the cables are electrical superconductors carrying high-voltage currents. 

 You might not know what the huge structure did. But you would nonetheless guess that it had been built by some intelligence, rather than being a naturally occurring mineral formation - that there were aliens who built the structure for some purpose. 

 Your reasoning might go something like this: "Well, I don't know if the aliens were trying to manufacture cars, or build computers, or what. But if you consider the problem of efficient manufacturing, it might involve mining resources in one place and then efficiently transporting them somewhere else, like by pipes. Since the most efficient size and location of these pipes would be stable, you'd want the shape of the pipes to be stable, which you could do by making the pipes out of a hard material like metal. There's all sorts of operations that require energy or negentropy, and a superconducting cable carrying electricity seems like an efficient way of transporting that energy. So I don't know what the aliens were ultimately trying to do, but across a very wide range of possible goals, an intelligent alien might want to build a superconducting cable to pursue that goal." 

 That is: We can take an enormous variety of compactly specifiable goals, like "travel to the other side of the universe" or "support biological life" or "make paperclips", and find very similar optimal strategies along the way. Today we don't actually know if electrical superconductors are the most useful way to transport energy in the limit of technology. But whatever is the most efficient way of transporting energy, whether that's electrical superconductors or something else, the most efficient form of that technology would probably not vary much depending on whether you were trying to make diamonds or make paperclips. 

 Or to put it another way: If you consider the goals "make diamonds" and "make paperclips", then they might have almost nothing in common with respect to their end-states - a diamond might contain no iron. But the earlier strategies used to make a lot of diamond and make a lot of paperclips might have much in common; "the best way of transporting energy to make diamond" and "the best way of transporting energy to make paperclips" are much more 

... (truncated, 35 KB total)
Resource ID: 90e9322ba84baa7a | Stable ID: YjM2YzFkNW