Longterm Wiki

Emergent Abilities

paper

Authors

Jason Wei·Yi Tay·Rishi Bommasani·Colin Raffel·Barret Zoph·Sebastian Borgeaud·Dani Yogatama·Maarten Bosma·Denny Zhou·Donald Metzler·Ed H. Chi·Tatsunori Hashimoto·Oriol Vinyals·Percy Liang·Jeff Dean·William Fedus

Credibility Rating

3/5 (Good)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: arXiv

This paper investigates emergent abilities in large language models—capabilities that appear at certain model scales and cannot be predicted from the performance of smaller models. Understanding emergence matters for AI safety because it highlights behavioral changes during scaling that cannot be anticipated in advance, complicating capability evaluation and alignment approaches.

Paper Details

Citations
3,367
170 influential
Year
2022

Metadata

arXiv preprint · primary source

Abstract

Scaling up language models has been shown to predictably improve performance and sample efficiency on a wide range of downstream tasks. This paper instead discusses an unpredictable phenomenon that we refer to as emergent abilities of large language models. We consider an ability to be emergent if it is not present in smaller models but is present in larger models. Thus, emergent abilities cannot be predicted simply by extrapolating the performance of smaller models. The existence of such emergence implies that additional scaling could further expand the range of capabilities of language models.

Summary

This paper introduces the concept of 'emergent abilities' in large language models—capabilities that appear in larger models but are absent in smaller ones, making them unpredictable through simple extrapolation of smaller model performance. Unlike the generally predictable improvements from scaling, emergent abilities represent a discontinuous phenomenon where new capabilities suddenly manifest at certain model scales. The authors argue that this emergence suggests further scaling could unlock additional unforeseen capabilities in language models.
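The unpredictability the summary describes can be made concrete with a toy numerical sketch (synthetic data, not taken from the paper): fit a trend to small-model performance on a hypothetical multiple-choice task and extrapolate it to a much larger scale. The scales and accuracies below are illustrative assumptions only.

```python
import numpy as np

# Hypothetical model scales (parameter counts) and task accuracies:
# near chance level (~25% on a 4-way task) until a jump at large scale.
params = np.array([1e8, 1e9, 1e10, 1e11, 1e12])
accuracy = np.array([0.25, 0.26, 0.27, 0.70, 0.85])

# Fit a linear trend in log10(parameters) to the three smallest models only.
small = slice(0, 3)
slope, intercept = np.polyfit(np.log10(params[small]), accuracy[small], 1)

# Extrapolate that trend to the largest scale.
predicted = slope * np.log10(params[-1]) + intercept  # ~0.29
actual = accuracy[-1]                                 # 0.85

print(f"extrapolated accuracy at 1e12 params: {predicted:.2f}")
print(f"observed (emergent) accuracy:         {actual:.2f}")
```

The extrapolation predicts roughly chance-level accuracy at the largest scale, while the synthetic "emergent" curve jumps well above it—illustrating why performance on such tasks cannot be forecast by extending small-model trends.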

Cited by 5 pages

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 98 KB
# Emergent Abilities of Large Language Models

Jason Wei 1jasonwei@google.com

Yi Tay 1yitay@google.com

Rishi Bommasani 2nlprishi@stanford.edu

Colin Raffel 3craffel@gmail.com

Barret Zoph 1barretzoph@google.com

Sebastian Borgeaud 4sborgeaud@deepmind.com

Dani Yogatama 4dyogatama@deepmind.com

Maarten Bosma 1bosma@google.com

Denny Zhou 1dennyzhou@google.com

Donald Metzler 1metzler@google.com

Ed H. Chi 1edchi@google.com

Tatsunori Hashimoto 2thashim@stanford.edu

Oriol Vinyals 4vinyals@deepmind.com

Percy Liang 2pliang@stanford.edu

Jeff Dean 1jeff@google.com

William Fedus 1liamfedus@google.com

1Google Research   2Stanford University   3UNC Chapel Hill   4DeepMind

###### Abstract

Scaling up language models has been shown to predictably improve performance and sample efficiency on a wide range of downstream tasks.
This paper instead discusses an unpredictable phenomenon that we refer to as _emergent abilities_ of large language models.
We consider an ability to be emergent if it is not present in smaller models but is present in larger models.
Thus, emergent abilities cannot be predicted simply by extrapolating the performance of smaller models.
The existence of such emergence raises the question of whether additional scaling could potentially further expand the range of capabilities of language models.

## 1 Introduction

Language models have revolutionized natural language processing (NLP) in recent years.
It is now well-known that increasing the scale of language models (e.g., training compute, model parameters, etc.) can lead to better performance and sample efficiency on a range of downstream NLP tasks (Devlin et al., [2019](https://ar5iv.labs.arxiv.org/html/2206.07682#bib.bib23 ""); Brown et al., [2020](https://ar5iv.labs.arxiv.org/html/2206.07682#bib.bib13 ""), inter alia).
In many cases, the effect of scale on performance can often be methodologically predicted via scaling laws—for example, scaling curves for cross-entropy loss have been shown to empirically span more than seven orders of magnitude (Kaplan et al., [2020](https://ar5iv.labs.arxiv.org/html/2206.07682#bib.bib44 ""); Hoffmann et al., [2022](https://ar5iv.labs.arxiv.org/html/2206.07682#bib.bib35 "")).
On the other hand, performance for certain downstream tasks counterintuitively does not appear to continuously improve as a function of scale, and such tasks cannot be predicted ahead of time (Ganguli et al., [2022](https://ar5iv.labs.arxiv.org/html/2206.07682#bib.bib27 "")).
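The smooth, predictable scaling that the paragraph above contrasts with emergence can be sketched as a power law in the style of Kaplan et al. (2020), where cross-entropy loss falls off as a power of parameter count. The constants below are approximate illustrative values, not authoritative fitted numbers.

```python
# Sketch of the power-law scaling form L(N) ~ (N_c / N) ** alpha_N,
# loss as a function of non-embedding parameter count N.
# ALPHA_N and N_C are approximate, for illustration only.
ALPHA_N = 0.076
N_C = 8.8e13

def loss(n_params: float) -> float:
    """Predicted cross-entropy loss for a model with n_params parameters."""
    return (N_C / n_params) ** ALPHA_N

# Loss declines smoothly and monotonically across orders of magnitude,
# which is why it can be extrapolated—unlike emergent task metrics.
for n in (1e8, 1e10, 1e12):
    print(f"{n:.0e} params -> predicted loss {loss(n):.2f}")
```

Because this curve is smooth in log-scale, fitting it at small scales predicts loss at large scales well—the contrast with the discontinuous downstream-task behavior discussed in the paper.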

In this paper, we will discuss the unpredictable phenomena of emergent abilities of large language models. Emergence as an idea has been long discussed in domains such as physics, biology, and computer science (Anderson, [1972](https://ar5iv.labs.arxiv.org/html/2206.07682#bib.bib4 ""); Hwang et al., [2012](https://ar5iv.labs.arxiv.org/html/2206.07682#bib.bib39 ""); Forrest, [1990](https://ar5iv.labs.arxiv.org/html/2206.07682#bib.bib26 ""); Corradini & O’Connor, [2010](https://ar5iv.labs.arxiv.org/htm

... (truncated, 98 KB total)