"Are Emergent Abilities a Mirage?"
Rylan Schaeffer, Brando Miranda, Sanmi Koyejo
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: arXiv
Highly influential NeurIPS 2023 paper that directly challenges the 'emergent abilities' narrative central to many AI risk and forecasting arguments, suggesting unpredictable capability jumps may be a measurement artifact rather than a real scaling phenomenon.
Paper Details
Metadata
Abstract
Recent work claims that large language models display emergent abilities, abilities not present in smaller-scale models that are present in larger-scale models. What makes emergent abilities intriguing is two-fold: their sharpness, transitioning seemingly instantaneously from not present to present, and their unpredictability, appearing at seemingly unforeseeable model scales. Here, we present an alternative explanation for emergent abilities: that for a particular task and model family, when analyzing fixed model outputs, emergent abilities appear due to the researcher's choice of metric rather than due to fundamental changes in model behavior with scale. Specifically, nonlinear or discontinuous metrics produce apparent emergent abilities, whereas linear or continuous metrics produce smooth, continuous, predictable changes in model performance. We present our alternative explanation in a simple mathematical model, then test it in three complementary ways: we (1) make, test and confirm three predictions on the effect of metric choice using the InstructGPT/GPT-3 family on tasks with claimed emergent abilities; (2) make, test and confirm two predictions about metric choices in a meta-analysis of emergent abilities on BIG-Bench; and (3) show how to choose metrics to produce never-before-seen seemingly emergent abilities in multiple vision tasks across diverse deep networks. Via all three analyses, we provide evidence that alleged emergent abilities evaporate with different metrics or with better statistics, and may not be a fundamental property of scaling AI models.
Summary
This paper argues that apparent emergent abilities in large language models are artifacts of metric choice rather than genuine phase transitions in model behavior. Using mathematical modeling and empirical analysis across GPT-3, BIG-Bench, and vision models, the authors show that nonlinear metrics create illusory sharp transitions while linear metrics reveal smooth, predictable scaling. The findings suggest emergent abilities may not be a fundamental property of AI scaling.
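To make the mechanism concrete, here is a minimal numerical sketch of the paper's toy model: per-token cross-entropy is assumed to fall smoothly as a power law in parameter count, and the same smoothly improving per-token accuracy looks sharply "emergent" under exact match (which compounds per-token accuracy over a target sequence) but smooth under a per-token error metric. The constants `alpha`, `c`, and `seq_len` are illustrative choices, not values fitted by the authors.

```python
import numpy as np

# Illustrative constants (not the paper's fitted values):
alpha, c, seq_len = 1.0, 2e10, 5       # power-law exponent, scale constant, answer length
params = np.logspace(8, 12, num=9)     # model sizes from 1e8 to 1e12 parameters

# Per-token cross-entropy falls smoothly with scale, so per-token
# accuracy p improves smoothly as well.
cross_entropy = (c / params) ** alpha
p_token = np.exp(-cross_entropy)

# Nonlinear metric: exact match on an L-token answer (~ p^L).
exact_match = p_token ** seq_len
# Linear metric: expected number of wrong tokens (~ L * (1 - p)).
token_errors = seq_len * (1.0 - p_token)

for n, em, te in zip(params, exact_match, token_errors):
    print(f"N={n:8.1e}  exact-match={em:6.4f}  token-errors={te:5.3f}")
```

Under exact match the score sits near zero until roughly 1e10 to 1e11 parameters and then climbs steeply, while the per-token error metric shows the same outputs improving gradually at every scale.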
Key Points
- Apparent emergent abilities arise from nonlinear or discontinuous evaluation metrics, not from fundamental changes in model behavior at scale.
- Switching to linear or continuous metrics reveals smooth, predictable performance improvements across model sizes, eliminating apparent phase transitions.
- The authors demonstrate the effect empirically on the InstructGPT/GPT-3 family, BIG-Bench tasks, and vision models across diverse architectures.
- The paper shows researchers can artificially induce 'never-before-seen' emergent abilities in vision tasks simply by choosing a suitably nonlinear or discontinuous metric (see the sketch after this list).
- Challenges a widely cited property of LLM scaling, with significant implications for AI forecasting, risk modeling, and capability evaluation.
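The vision-task point can be reproduced with nothing more than a thresholded metric. The sketch below is hypothetical: it stands in for a model family whose per-image error shrinks smoothly with scale (the paper uses real networks), and the lognormal noise shape and cutoff value are assumptions for illustration, not the paper's numbers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model family: mean per-image error decays smoothly
# (a power law) with parameter count.
sizes = np.logspace(5, 9, num=9)           # parameter counts
mean_err = 10.0 / np.sqrt(sizes / 1e5)     # smooth power-law decay

cutoff = 0.5                               # threshold for the discontinuous metric
for n, mu in zip(sizes, mean_err):
    per_image = mu * rng.lognormal(mean=0.0, sigma=0.5, size=10_000)
    # Continuous metric: mean error, which declines smoothly with scale.
    # Discontinuous metric: fraction of images under the cutoff,
    # which looks "emergent" purely by construction.
    frac_ok = (per_image < cutoff).mean()
    print(f"N={n:8.1e}  mean-error={per_image.mean():7.3f}  frac<cutoff={frac_ok:5.3f}")
```

The continuous column improves at every scale, while the thresholded column jumps from near zero to near one across a narrow range of model sizes.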
Cited by 4 pages
| Page | Type | Quality |
|---|---|---|
| AI Accident Risk Cruxes | Crux | 67.0 |
| AI Scaling Laws | Concept | 92.0 |
| Emergent Capabilities | Risk | 61.0 |
| Sharp Left Turn | Risk | 69.0 |
Cached Content Preview
# Are Emergent Abilities of Large Language Models a Mirage?
Rylan Schaeffer
Brando Miranda
Sanmi Koyejo
###### Abstract
Recent work claims that large language models display emergent abilities, abilities not present in smaller-scale models that are present in larger-scale models.
What makes emergent abilities intriguing is two-fold: their sharpness, transitioning seemingly instantaneously from not present to present, and their unpredictability, appearing at seemingly unforeseeable model scales.
Here, we present an alternative explanation for emergent abilities: that for a particular task and model family, when analyzing fixed model outputs, emergent abilities appear due to the researcher’s choice of metric rather than due to fundamental changes in model behavior with scale. Specifically, nonlinear or discontinuous metrics produce apparent emergent abilities, whereas linear or continuous metrics produce smooth, continuous, predictable changes in model performance.
We present our alternative explanation in a simple mathematical model, then test it in three complementary ways: we (1) make, test and confirm three predictions on the effect of metric choice using the InstructGPT/GPT-3 family on tasks with claimed emergent abilities; (2) make, test and confirm two predictions about metric choices in a meta-analysis of emergent abilities on BIG-Bench; and (3) show how to choose metrics to produce never-before-seen seemingly emergent abilities in multiple vision tasks across diverse deep networks.
Via all three analyses, we provide evidence that alleged emergent abilities evaporate with different metrics or with better statistics, and may not be a fundamental property of scaling AI models.
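The "better statistics" clause can be made concrete with a minimal sketch: a model whose true success rate is 1e-3 will usually score exactly zero on a 100-item test set, so a smoothly improving model family appears to jump from zero to nonzero ability, while a larger test set makes the same abilities visible at every scale. The accuracy values and test-set sizes below are illustrative, not drawn from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Smoothly improving true success rates across a model family
# (illustrative values, not the paper's measurements).
true_acc = [1e-4, 1e-3, 1e-2, 1e-1, 0.5]

for acc in true_acc:
    # Measured accuracy = fraction of correct answers on n test items.
    small = rng.binomial(100, acc) / 100          # low-resolution eval
    large = rng.binomial(100_000, acc) / 100_000  # high-resolution eval
    print(f"true={acc:8.4f}  n=100: {small:5.3f}  n=100000: {large:8.5f}")
```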
## 1 Introduction
Emergent properties of complex systems have long been studied across disciplines, from physics to biology to mathematics.
The idea of emergence was popularized by Nobel Prize-winning physicist P.W. Anderson’s “More Is Different” ([anderson1972more](https://ar5iv.labs.arxiv.org/html/2304.15004#bib.bib1)), which argues that as the complexity of a system increases, new properties may materialize that cannot be predicted even from a precise quantitative understanding of the system’s microscopic details. Recently, the idea of emergence gained significant attention in machine learning due to observations that large language models (LLMs) such as GPT ([brown2020language](https://ar5iv.labs.arxiv.org/html/2304.15004#bib.bib3)), PaLM ([chowdhery2022palm](https://ar5iv.labs.arxiv.org/html/2304.15004#bib.bib6)) and LaMDA ([thoppilan2022lamda](https://ar5iv.labs.arxiv.org/html/2304.15004#bib.bib30)) exhibit so-called “emergent abilities” ([wei2022emergent](https://ar5iv.labs.arxiv.org/html/2304.15004#bib.bib33); [ganguli2022predictability](https://ar5iv.labs.arxiv.org/html/2304.15004#bib.bib8); [srivastava2022beyond](https://ar5iv.labs.arxiv.org/html/2304.15004#bib.bib28); [brown2020language](https://ar5iv.labs.arxiv.org/html/2304.15004#bib.bib3))
... (truncated, 62 KB total)