Longterm Wiki

Nick Bostrom argues in "The Superintelligent Will"


This 2012 paper by Nick Bostrom is a foundational text in AI safety, coining and formalizing the orthogonality and instrumental convergence theses that underpin much subsequent alignment research, including arguments in his book 'Superintelligence'.

Metadata

Importance: 92/100 · working paper · primary source

Summary

Bostrom introduces two foundational theses for understanding advanced AI behavior: the orthogonality thesis (intelligence and final goals are independent axes, so any level of intelligence can be paired with virtually any goal) and the instrumental convergence thesis (sufficiently intelligent agents with diverse final goals will nonetheless converge on similar intermediate goals like self-preservation and resource acquisition). Together these theses illuminate the potential dangers of building superintelligent systems.

Key Points

  • The orthogonality thesis: intelligence level and final goals are logically independent—there is no reason to expect smarter AI to adopt human-like or benign values by default.
  • The instrumental convergence thesis: agents with diverse final goals will pursue similar sub-goals (self-preservation, resource acquisition, goal-content integrity) because these are instrumentally useful for almost any objective.
  • Human minds occupy a tiny, atypical cluster in the space of possible minds; anthropomorphizing AI motivations is a systematic and dangerous error.
  • Convergent instrumental goals such as resisting shutdown and acquiring capabilities pose risks regardless of an AI's specific terminal goals.
  • The two theses together suggest that building a superintelligent AI that is safe by default is non-trivial and requires deliberate alignment work.
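The instrumental convergence thesis can be illustrated with a toy decision model (this sketch is not from the paper; all goal names and probability values are invented for illustration). Agents with very different final goals all rank "acquire resources" as their best first move, because resources raise the probability of achieving almost any goal:

```python
def expected_utility(action, goal_payoff):
    """Expected payoff of an action for an agent whose final goal is worth goal_payoff."""
    # Assumed probability that the agent eventually achieves its final goal,
    # given the action taken now (illustrative numbers only).
    success_prob = {
        "acquire_resources": 0.9,  # instrumentally useful for almost any goal
        "preserve_self":     0.8,  # a shut-down agent achieves nothing
        "do_nothing":        0.2,
    }[action]
    return success_prob * goal_payoff

actions = ["acquire_resources", "preserve_self", "do_nothing"]
goals = {"make_paperclips": 10, "prove_theorems": 50, "plant_forests": 7}

for goal, payoff in goals.items():
    best = max(actions, key=lambda a: expected_utility(a, payoff))
    print(goal, "->", best)  # every agent converges on "acquire_resources"
```

The point of the sketch is that the ranking of instrumental actions is independent of which final goal supplies the payoff, which is exactly the convergence Bostrom describes: the terminal goals differ arbitrarily, yet the intermediate behavior coincides.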

Cited by 4 pages

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 38 KB
# THE SUPERINTELLIGENT WILL: MOTIVATION AND INSTRUMENTAL RATIONALITY IN ADVANCED ARTIFICIAL AGENTS

(2012) Nick Bostrom Future of Humanity Institute Faculty of Philosophy & Oxford Martin School Oxford University [www.nickbostrom.com](http://www.nickbostrom.com/)

[Minds and Machines, Vol. 22, Iss. 2, May 2012] [translation: Portuguese]

# ABSTRACT

This paper discusses the relation between intelligence and motivation in artificial agents, developing and briefly arguing for two theses. The first, the orthogonality thesis, holds (with some caveats) that intelligence and final goals (purposes) are orthogonal axes along which possible artificial intellects can freely vary—more or less any level of intelligence could be combined with more or less any final goal. The second, the instrumental convergence thesis, holds that as long as they possess a sufficient level of intelligence, agents having any of a wide range of final goals will pursue similar intermediary goals because they have instrumental reasons to do so. In combination, the two theses help us understand the possible range of behavior of superintelligent agents, and they point to some potential dangers in building such an agent.

KEYWORDS: superintelligence, artificial intelligence, AI, goal, instrumental reason, intelligent agent

# 1. The orthogonality of motivation and intelligence

# 1.1 Avoiding anthropomorphism

If we imagine a space in which all possible minds can be represented, we must imagine all human minds as constituting a small and fairly tight cluster within that space. The personality differences between Hannah Arendt and Benny Hill might seem vast to us, but this is because the scale bar in our intuitive judgment is calibrated on the existing human distribution. In the wider space of all logical possibilities, these two personalities are close neighbors. In terms of neural architecture, at least, Ms. Arendt and Mr. Hill are nearly identical. Imagine their brains lying side by side in quiet repose. The differences would appear minor and you would quite readily recognize them as two of a kind; you might even be unable to tell which brain was whose. If you studied the morphology of the two brains more closely under a microscope, the impression of fundamental similarity would only be strengthened: you would then see the same lamellar organization of the cortex, made up of the same types of neuron, soaking in the same bath of neurotransmitter molecules.1

It is well known that naïve observers often anthropomorphize the capabilities of simpler insensate systems. We might say, for example, “This vending machine is taking a long time to think about my hot chocolate.” This might lead one either to underestimate the cognitive complexity of capabilities which come naturally to human beings, such as motor control and sensory perception, or, alternatively, to ascribe significant degrees of mindfulness and intelligence to very dumb systems, such as chatterboxes like Weizenbaum’s ELIZA.

... (truncated, 38 KB total)