Emergent Abilities
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: arXiv
This paper investigates emergent abilities in large language models—capabilities that appear unexpectedly at certain model scales and cannot be predicted from smaller models. Understanding emergence matters for AI safety because it shows that scaling can produce unpredictable behavioral changes, complicating safety evaluation and alignment approaches.
Paper Details
Metadata
Abstract
Scaling up language models has been shown to predictably improve performance and sample efficiency on a wide range of downstream tasks. This paper instead discusses an unpredictable phenomenon that we refer to as emergent abilities of large language models. We consider an ability to be emergent if it is not present in smaller models but is present in larger models. Thus, emergent abilities cannot be predicted simply by extrapolating the performance of smaller models. The existence of such emergence raises the question of whether additional scaling could potentially further expand the range of capabilities of language models.
Summary
This paper introduces the concept of 'emergent abilities' in large language models—capabilities that appear in larger models but are absent in smaller ones, making them unpredictable through simple extrapolation of smaller model performance. Unlike the generally predictable improvements from scaling, emergent abilities represent a discontinuous phenomenon where new capabilities suddenly manifest at certain model scales. The authors argue that this emergence suggests further scaling could unlock additional unforeseen capabilities in language models.
Cited by 5 pages
| Page | Type | Quality |
|---|---|---|
| Large Language Models | Capability | 60.0 |
| Deceptive Alignment Decomposition Model | Analysis | 62.0 |
| AI Scaling Laws | Concept | 92.0 |
| Emergent Capabilities | Risk | 61.0 |
| Sharp Left Turn | Risk | 69.0 |
Cached Content Preview
# Emergent Abilities of Large Language Models
Jason Wei 1jasonwei@google.com
Yi Tay 1yitay@google.com
Rishi Bommasani 2nlprishi@stanford.edu
Colin Raffel 3craffel@gmail.com
Barret Zoph 1barretzoph@google.com
Sebastian Borgeaud 4sborgeaud@deepmind.com
Dani Yogatama 4dyogatama@deepmind.com
Maarten Bosma 1bosma@google.com
Denny Zhou 1dennyzhou@google.com
Donald Metzler 1metzler@google.com
Ed H. Chi 1edchi@google.com
Tatsunori Hashimoto 2thashim@stanford.edu
Oriol Vinyals 4vinyals@deepmind.com
Percy Liang 2pliang@stanford.edu
Jeff Dean 1jeff@google.com
William Fedus 1liamfedus@google.com
1Google Research 2Stanford University 3UNC Chapel Hill 4DeepMind
###### Abstract
Scaling up language models has been shown to predictably improve performance and sample efficiency on a wide range of downstream tasks.
This paper instead discusses an unpredictable phenomenon that we refer to as _emergent abilities_ of large language models.
We consider an ability to be emergent if it is not present in smaller models but is present in larger models.
Thus, emergent abilities cannot be predicted simply by extrapolating the performance of smaller models.
The existence of such emergence raises the question of whether additional scaling could potentially further expand the range of capabilities of language models.
## 1 Introduction
Language models have revolutionized natural language processing (NLP) in recent years.
It is now well-known that increasing the scale of language models (e.g., training compute, model parameters, etc.) can lead to better performance and sample efficiency on a range of downstream NLP tasks (Devlin et al., [2019](https://ar5iv.labs.arxiv.org/html/2206.07682#bib.bib23 ""); Brown et al., [2020](https://ar5iv.labs.arxiv.org/html/2206.07682#bib.bib13 ""), inter alia).
In many cases, the effect of scale on performance can often be methodologically predicted via scaling laws—for example, scaling curves for cross-entropy loss have been shown to empirically span more than seven orders of magnitude (Kaplan et al., [2020](https://ar5iv.labs.arxiv.org/html/2206.07682#bib.bib44 ""); Hoffmann et al., [2022](https://ar5iv.labs.arxiv.org/html/2206.07682#bib.bib35 "")).
On the other hand, performance for certain downstream tasks counterintuitively does not appear to continuously improve as a function of scale, and such tasks cannot be predicted ahead of time (Ganguli et al., [2022](https://ar5iv.labs.arxiv.org/html/2206.07682#bib.bib27 "")).
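The contrast between these two regimes can be sketched numerically. The snippet below is a hypothetical illustration (the data, thresholds, and functional forms are assumed, not taken from the paper): when loss follows a clean power law in compute, a log-log linear fit from smaller scales extrapolates accurately; for an emergent task that sits at chance accuracy until a threshold scale, the same extrapolation from small models fails.

```python
import numpy as np

# Assumed synthetic data: loss on a smooth task follows a power law in
# training compute, L(C) = a * C**(-alpha), so a linear fit in log-log
# space extrapolates well to larger scales.
compute = np.array([1e18, 1e19, 1e20, 1e21])  # training FLOPs (illustrative)
loss = 3.0 * compute ** -0.05                 # synthetic power-law loss

# Fit log(loss) as a linear function of log(compute), then extrapolate
# two orders of magnitude beyond the largest observed scale.
slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)
predicted = np.exp(intercept) * 1e23 ** slope
actual = 3.0 * 1e23 ** -0.05
# predicted ~= actual, because the underlying trend is a clean power law.

# By contrast, an "emergent" task (hypothetical step function) stays at
# chance accuracy until a threshold scale, then jumps.
def emergent_accuracy(c, threshold=1e22):
    return 0.5 if c < threshold else 0.9  # assumed chance vs. above-chance

small_accs = [emergent_accuracy(c) for c in compute]  # flat at 0.5
# Any fit to small_accs predicts ~0.5 at 1e23 FLOPs, yet the model at
# that scale scores 0.9 -- the ability is invisible below the threshold.
```

This is only a toy model of the distinction the paper draws: smooth metrics like cross-entropy loss admit scaling-law extrapolation, while the downstream-task accuracies discussed here do not.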
In this paper, we will discuss the unpredictable phenomena of emergent abilities of large language models. Emergence as an idea has been long discussed in domains such as physics, biology, and computer science (Anderson, [1972](https://ar5iv.labs.arxiv.org/html/2206.07682#bib.bib4 ""); Hwang et al., [2012](https://ar5iv.labs.arxiv.org/html/2206.07682#bib.bib39 ""); Forrest, [1990](https://ar5iv.labs.arxiv.org/html/2206.07682#bib.bib26 ""); Corradini & O’Connor, [2010](https://ar5iv.labs.arxiv.org/htm
... (truncated, 98 KB total)