Longterm Wiki

"Are Emergent Abilities of Large Language Models a Mirage?"

paper

Authors

Rylan Schaeffer·Brando Miranda·Sanmi Koyejo

Credibility Rating

3/5 (Good)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: arXiv

Highly influential NeurIPS 2023 paper that directly challenges the 'emergent abilities' narrative central to many AI risk and forecasting arguments, suggesting unpredictable capability jumps may be a measurement artifact rather than a real scaling phenomenon.

Paper Details

Citations
2
22 influential
Year
2023
Methodology
peer-reviewed
Categories
Advances in Neural Information Processing Systems

Metadata

Importance: 82/100 · arXiv preprint · primary source

Abstract

Recent work claims that large language models display emergent abilities, abilities not present in smaller-scale models that are present in larger-scale models. What makes emergent abilities intriguing is two-fold: their sharpness, transitioning seemingly instantaneously from not present to present, and their unpredictability, appearing at seemingly unforeseeable model scales. Here, we present an alternative explanation for emergent abilities: that for a particular task and model family, when analyzing fixed model outputs, emergent abilities appear due to the researcher's choice of metric rather than due to fundamental changes in model behavior with scale. Specifically, nonlinear or discontinuous metrics produce apparent emergent abilities, whereas linear or continuous metrics produce smooth, continuous, predictable changes in model performance. We present our alternative explanation in a simple mathematical model, then test it in three complementary ways: we (1) make, test and confirm three predictions on the effect of metric choice using the InstructGPT/GPT-3 family on tasks with claimed emergent abilities; (2) make, test and confirm two predictions about metric choices in a meta-analysis of emergent abilities on BIG-Bench; and (3) show how to choose metrics to produce never-before-seen seemingly emergent abilities in multiple vision tasks across diverse deep networks. Via all three analyses, we provide evidence that alleged emergent abilities evaporate with different metrics or with better statistics, and may not be a fundamental property of scaling AI models.

Summary

This paper argues that apparent emergent abilities in large language models are artifacts of metric choice rather than genuine phase transitions in model behavior. Using mathematical modeling and empirical analysis across GPT-3, BIG-Bench, and vision models, the authors show that nonlinear metrics create illusory sharp transitions while linear metrics reveal smooth, predictable scaling. The findings suggest emergent abilities may not be a fundamental property of AI scaling.

Key Points

  • Apparent emergent abilities arise from nonlinear or discontinuous evaluation metrics, not from fundamental changes in model behavior at scale.
  • Switching to linear or continuous metrics reveals smooth, predictable performance improvements across model sizes, eliminating apparent phase transitions.
  • Authors demonstrate the effect empirically on InstructGPT/GPT-3 family, BIG-Bench tasks, and vision models across diverse architectures.
  • The paper shows researchers can artificially induce 'never-before-seen' emergent abilities in vision tasks simply by choosing appropriate metrics.
  • Challenges a widely cited property of LLM scaling with significant implications for AI forecasting, risk modeling, and capability evaluation.
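The mechanism described in the key points can be illustrated with a toy simulation in the spirit of the paper's mathematical model (all curve shapes and numbers below are illustrative assumptions, not the paper's data): if per-token accuracy improves smoothly with scale, an exact-match metric over a long target sequence (a nonlinear metric, roughly p^L) looks like a sharp "emergent" jump, while mean per-token accuracy (a linear metric) stays smooth and predictable.

```python
import numpy as np

# Hypothetical model family: nine models with log-spaced "scales"
# (e.g., parameter counts from 1e7 to 1e11; purely illustrative).
scales = np.logspace(7, 11, 9)

# Assume per-token accuracy improves smoothly and predictably with scale
# (a power-law-like curve chosen for illustration only).
per_token_acc = 1 - 0.9 * (scales / scales[0]) ** -0.37

L = 30  # length of the target sequence, in tokens

# Nonlinear metric: exact match on all L tokens ~ p^L.
# Near-zero for most scales, then rises sharply -- an apparent "emergence".
exact_match = per_token_acc ** L

# Linear metric: mean per-token accuracy is just p -- smooth throughout.
for s, p, em in zip(scales, per_token_acc, exact_match):
    print(f"scale={s:.0e}  token_acc={p:.3f}  exact_match={em:.3g}")
```

The underlying quantity (per-token accuracy) changes smoothly the whole time; only the exact-match view of the same outputs shows a discontinuous-looking transition, which is the paper's core point about metric choice.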

Cited by 4 pages

| Page | Type | Quality |
| --- | --- | --- |
| AI Accident Risk Cruxes | Crux | 67.0 |
| AI Scaling Laws | Concept | 92.0 |
| Emergent Capabilities | Risk | 61.0 |
| Sharp Left Turn | Risk | 69.0 |

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 62 KB
# Are Emergent Abilities of Large Language Models a Mirage?

Rylan Schaeffer
Brando Miranda
Sanmi Koyejo

###### Abstract

Recent work claims that large language models display emergent abilities, abilities not present in smaller-scale models that are present in larger-scale models.
What makes emergent abilities intriguing is two-fold: their sharpness, transitioning seemingly instantaneously from not present to present, and their unpredictability, appearing at seemingly unforeseeable model scales.
Here, we present an alternative explanation for emergent abilities: that for a particular task and model family, when analyzing fixed model outputs, emergent abilities appear due to the researcher’s choice of metric rather than due to fundamental changes in model behavior with scale. Specifically, nonlinear or discontinuous metrics produce apparent emergent abilities, whereas linear or continuous metrics produce smooth, continuous, predictable changes in model performance.
We present our alternative explanation in a simple mathematical model, then test it in three complementary ways: we (1) make, test and confirm three predictions on the effect of metric choice using the InstructGPT/GPT-3 family on tasks with claimed emergent abilities, (2) make, test and confirm two predictions about metric choices in a meta-analysis of emergent abilities on BIG-Bench; and (3) show how to choose metrics to produce never-before-seen seemingly emergent abilities in multiple vision tasks across diverse deep networks.
Via all three analyses, we provide evidence that alleged emergent abilities evaporate with different metrics or with better statistics, and may not be a fundamental property of scaling AI models.

## 1 Introduction

Emergent properties of complex systems have long been studied across disciplines, from physics to biology to mathematics.
The idea of emergence was popularized by Nobel Prize-winning physicist P.W. Anderson’s “More Is Different” ( [anderson1972more,](https://ar5iv.labs.arxiv.org/html/2304.15004#bib.bib1 "")), which argues that as the complexity of a system increases, new properties may materialize that cannot be predicted even from a precise quantitative understanding of the system’s microscopic details. Recently, the idea of emergence gained significant attention in machine learning due to observations that large language models (LLMs) such as GPT ( [brown2020language,](https://ar5iv.labs.arxiv.org/html/2304.15004#bib.bib3 "")), PaLM ( [chowdhery2022palm,](https://ar5iv.labs.arxiv.org/html/2304.15004#bib.bib6 "")) and LaMDA [thoppilan2022lamda](https://ar5iv.labs.arxiv.org/html/2304.15004#bib.bib30 "") exhibit so-called “emergent abilities” ( [wei2022emergent,](https://ar5iv.labs.arxiv.org/html/2304.15004#bib.bib33 ""); [ganguli2022predictability,](https://ar5iv.labs.arxiv.org/html/2304.15004#bib.bib8 ""); [srivastava2022beyond,](https://ar5iv.labs.arxiv.org/html/2304.15004#bib.bib28 ""); [brown2020language,](https://ar5iv.labs.arxiv.org/html/2304.1

... (truncated, 62 KB total)
Resource ID: 22db72cf2a806d3b | Stable ID: NDkxNjIwY2