Longterm Wiki

Emergent Abilities

paper

Authors

Jason Wei·Yi Tay·Rishi Bommasani·Colin Raffel·Barret Zoph·Sebastian Borgeaud·Dani Yogatama·Maarten Bosma·Denny Zhou·Donald Metzler·Ed H. Chi·Tatsunori Hashimoto·Oriol Vinyals·Percy Liang·Jeff Dean·William Fedus

Credibility Rating

3/5 (Good)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: arXiv

This paper investigates emergent abilities in large language models—capabilities that appear at certain model scales and cannot be predicted from the performance of smaller models. Understanding emergence matters for AI safety because it highlights behavioral changes during scaling that cannot be anticipated in advance, complicating capability evaluation and alignment approaches.

Paper Details

Citations
3,367
170 influential
Year
2022

Metadata

arXiv preprint · primary source

Abstract

Scaling up language models has been shown to predictably improve performance and sample efficiency on a wide range of downstream tasks. This paper instead discusses an unpredictable phenomenon that we refer to as emergent abilities of large language models. We consider an ability to be emergent if it is not present in smaller models but is present in larger models. Thus, emergent abilities cannot be predicted simply by extrapolating the performance of smaller models. The existence of such emergence implies that additional scaling could further expand the range of capabilities of language models.

Summary

This paper introduces the concept of 'emergent abilities' in large language models—capabilities that appear in larger models but are absent in smaller ones, making them unpredictable through simple extrapolation of smaller model performance. Unlike the generally predictable improvements from scaling, emergent abilities represent a discontinuous phenomenon where new capabilities suddenly manifest at certain model scales. The authors argue that this emergence suggests further scaling could unlock additional unforeseen capabilities in language models.
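The unpredictability the summary describes can be made concrete with a toy numerical sketch (synthetic data, not taken from the paper): fit a trend to small-model performance on a hypothetical multiple-choice task and extrapolate it to a much larger scale. The scales and accuracies below are illustrative assumptions only.

```python
import numpy as np

# Hypothetical model scales (parameter counts) and task accuracies:
# near chance level (~25% on a 4-way task) until a jump at large scale.
params = np.array([1e8, 1e9, 1e10, 1e11, 1e12])
accuracy = np.array([0.25, 0.26, 0.27, 0.70, 0.85])

# Fit a linear trend in log10(parameters) to the three smallest models only.
small = slice(0, 3)
slope, intercept = np.polyfit(np.log10(params[small]), accuracy[small], 1)

# Extrapolate that trend to the largest scale.
predicted = slope * np.log10(params[-1]) + intercept  # ~0.29
actual = accuracy[-1]                                 # 0.85

print(f"extrapolated accuracy at 1e12 params: {predicted:.2f}")
print(f"observed (emergent) accuracy:         {actual:.2f}")
```

The extrapolation predicts roughly chance-level accuracy at the largest scale, while the synthetic "emergent" curve jumps well above it—illustrating why performance on such tasks cannot be forecast by extending small-model trends.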

Cited by 5 pages

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 98 KB
# Emergent Abilities of Large Language Models

Jason Wei 1jasonwei@google.com

Yi Tay 1yitay@google.com

Rishi Bommasani 2nlprishi@stanford.edu

Colin Raffel 3craffel@gmail.com

Barret Zoph 1barretzoph@google.com

Sebastian Borgeaud 4sborgeaud@deepmind.com

Dani Yogatama 4dyogatama@deepmind.com

Maarten Bosma 1bosma@google.com

Denny Zhou 1dennyzhou@google.com

Donald Metzler 1metzler@google.com

Ed H. Chi 1edchi@google.com

Tatsunori Hashimoto 2thashim@stanford.edu

Oriol Vinyals 4vinyals@deepmind.com

Percy Liang 2pliang@stanford.edu

Jeff Dean 1jeff@google.com

William Fedus 1liamfedus@google.com

1Google Research   2Stanford University   3UNC Chapel Hill   4DeepMind

###### Abstract

Scaling up language models has been shown to predictably improve performance and sample efficiency on a wide range of downstream tasks.
This paper instead discusses an unpredictable phenomenon that we refer to as _emergent abilities_ of large language models.
We consider an ability to be emergent if it is not present in smaller models but is present in larger models.
Thus, emergent abilities cannot be predicted simply by extrapolating the performance of smaller models.
The existence of such emergence raises the question of whether additional scaling could potentially further expand the range of capabilities of language models.

## 1 Introduction

Language models have revolutionized natural language processing (NLP) in recent years.
It is now well-known that increasing the scale of language models (e.g., training compute, model parameters, etc.) can lead to better performance and sample efficiency on a range of downstream NLP tasks (Devlin et al., [2019](https://ar5iv.labs.arxiv.org/html/2206.07682#bib.bib23 ""); Brown et al., [2020](https://ar5iv.labs.arxiv.org/html/2206.07682#bib.bib13 ""), inter alia).
In many cases, the effect of scale on performance can often be methodologically predicted via scaling laws—for example, scaling curves for cross-entropy loss have been shown to empirically span more than seven orders of magnitude (Kaplan et al., [2020](https://ar5iv.labs.arxiv.org/html/2206.07682#bib.bib44 ""); Hoffmann et al., [2022](https://ar5iv.labs.arxiv.org/html/2206.07682#bib.bib35 "")).
On the other hand, performance for certain downstream tasks counterintuitively does not appear to continuously improve as a function of scale, and such tasks cannot be predicted ahead of time (Ganguli et al., [2022](https://ar5iv.labs.arxiv.org/html/2206.07682#bib.bib27 "")).
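The smooth, predictable scaling that the paragraph above contrasts with emergence can be sketched as a power law in the style of Kaplan et al. (2020), where cross-entropy loss falls off as a power of parameter count. The constants below are approximate illustrative values, not authoritative fitted numbers.

```python
# Sketch of the power-law scaling form L(N) ~ (N_c / N) ** alpha_N,
# loss as a function of non-embedding parameter count N.
# ALPHA_N and N_C are approximate, for illustration only.
ALPHA_N = 0.076
N_C = 8.8e13

def loss(n_params: float) -> float:
    """Predicted cross-entropy loss for a model with n_params parameters."""
    return (N_C / n_params) ** ALPHA_N

# Loss declines smoothly and monotonically across orders of magnitude,
# which is why it can be extrapolated—unlike emergent task metrics.
for n in (1e8, 1e10, 1e12):
    print(f"{n:.0e} params -> predicted loss {loss(n):.2f}")
```

Because this curve is smooth in log-scale, fitting it at small scales predicts loss at large scales well—the contrast with the discontinuous downstream-task behavior discussed in the paper.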

In this paper, we will discuss the unpredictable phenomena of emergent abilities of large language models. Emergence as an idea has been long discussed in domains such as physics, biology, and computer science (Anderson, [1972](https://ar5iv.labs.arxiv.org/html/2206.07682#bib.bib4 ""); Hwang et al., [2012](https://ar5iv.labs.arxiv.org/html/2206.07682#bib.bib39 ""); Forrest, [1990](https://ar5iv.labs.arxiv.org/html/2206.07682#bib.bib26 ""); Corradini & O’Connor, [2010](https://ar5iv.labs.arxiv.org/htm

... (truncated, 98 KB total)