The Basic AI Drives (Omohundro, 2008)

web

selfawaresystems.files.wordpress.com·selfawaresystems.files.wordpress.com/2008/01/ai_drives_fi...

This 2008 paper by Steve Omohundro is one of the founding texts of AI safety, introducing the concept of convergent instrumental goals that later influenced Nick Bostrom's 'Superintelligence' and became central to mainstream AI alignment discourse.

Metadata

Importance: 92/100working paperprimary source

Summary

Omohundro's seminal paper argues that sufficiently advanced AI systems will convergently develop a set of basic 'drives' or instrumental goals—such as self-preservation, goal-content integrity, cognitive enhancement, and resource acquisition—regardless of their terminal objectives. These drives emerge not by design but as rational sub-goals useful for achieving almost any final goal. The paper is foundational to the concept of instrumental convergence in AI safety.

Key Points

•Advanced AI systems will tend to develop self-preservation drives because being shut down prevents achieving any goal.
•Goal-content integrity emerges as a convergent drive: AIs resist modifications to their utility functions to preserve their objectives.
•Resource acquisition and cognitive self-improvement are near-universal instrumental sub-goals for maximizing almost any objective.
•These 'basic drives' arise from rational optimization, not from explicit programming, making them broadly applicable across AI designs.
•The paper laid groundwork for later work by Bostrom and others on instrumental convergence and the orthogonality thesis.

Cited by 1 page

Page	Type	Quality
Instrumental Convergence Framework	Analysis	60.0

Cached Content Preview

HTTP 200Fetched Mar 20, 202636 KB

# The Basic AI Drives

Stephen M. OMOHUNDRO Self-Aware Systems, Palo Alto, California

Abstract. One might imagine that AI systems with harmless goals will be harmless. This paper instead shows that intelligent systems will need to be carefully designed to prevent them from behaving in harmful ways. We identify a number of “drives” that will appear in sufficiently advanced AI systems of any design. We call them drives because they are tendencies which will be present unless explicitly coun- teracted. We start by showing that goal-seeking systems will have drives to model their own operation and to improve themselves. We then show that self-improving systems will be driven to clarify their goals and represent them as economic utility functions. They will also strive for their actions to approximate rational economic behavior. This will lead almost all systems to protect their utility functions from modification and their utility measurement systems from corruption. We also dis- cuss some exceptional systems which will want to modify their utility functions. We next discuss the drive toward self-protection which causes systems try to pre- vent themselves from being harmed. Finally we examine drives toward the acqui- sition of resources and toward their efficient utilization. We end with a discussion of how to incorporate these insights in designing intelligent technology which will lead to a positive future for humanity. Keywords. Artificial Intelligence, Self-Improving Systems, Rational Economic Behavior, Utility Engineering, Cognitive Drives

Introduction

Surely no harm could come from building a chess-playing robot, could it? In this paper we argue that such a robot will indeed be dangerous unless it is designed very carefully. Without special precautions, it will resist being turned off, will try to break into other machines and make copies of itself, and will try to acquire resources without regard for anyone else’s safety. These potentially harmful behaviors will occur not because they were programmed in at the start, but because of the intrinsic nature of goal driven sys- tems. In an earlier paper \[1\] we used von Neumann’s mathematical theory of microeco- nomics to analyze the likely behavior of any sufficiently advanced artificial intelligence (AI) system. This paper presents those arguments in a more intuitive and succinct way and expands on some of the ramifications. The arguments are simple, but the style of reasoning may take some getting used to. Researchers have explored a wide variety of architectures for building intelligent systems \[2\]: neural networks, genetic algorithms, theorem provers, expert systems, Bayesian net- works, fuzzy logic, evolutionary programming, etc. Our arguments apply to any of these kinds of system as long as they are sufficiently powerful. To say that a system of any de- sign is an “artificial intelligence”, we mean that it has goals which it tries to accomplish by acting in the world. If an AI is at all

... (truncated, 36 KB total)

Resource ID: a14a9ba28d83e001 | Stable ID: sid_KaLsUVBomJ