Back
Jason Wei of Google Brain
quantamagazine.org/how-quickly-do-large-language-models-l...
Relevant to AI safety discussions about whether dangerous capabilities can emerge suddenly and without warning; the measurement-artifact hypothesis suggests better evaluation design could improve foresight into capability development.
Metadata
Importance: 62/100 · news article
Summary
A Quanta Magazine article covering a Stanford study arguing that so-called 'emergent' abilities in large language models are not sudden or unpredictable, but appear so due to measurement choices. When different metrics are used, the abilities develop gradually and smoothly with scale, suggesting the 'phase transition' framing may be a measurement artifact rather than a genuine phenomenon.
Key Points
- The BIG-bench benchmark found that some LLM abilities appeared to jump abruptly with scale rather than improving smoothly, leading to claims of 'emergent' behavior.
- A Stanford trio argues sudden emergence is an artifact of coarse metrics (e.g., exact-match accuracy) that only register success past a late threshold.
- Using continuous or finer-grained metrics reveals gradual, predictable improvement, undermining the phase-transition analogy.
- The debate has direct AI safety implications: if emergence is unpredictable, dangerous capabilities could appear without warning; if predictable, they can be anticipated.
- The article interviews Jason Wei (Google Brain/DeepMind), one of the original emergence paper authors, who acknowledges the measurement critique but defends the phenomenon's relevance.
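The measurement-artifact argument in the points above can be illustrated with a toy calculation. Assume (hypothetically) that a model's per-token accuracy improves smoothly and linearly with scale; then exact-match accuracy on a multi-token answer is roughly the per-token accuracy raised to the answer length, which sits near zero and then shoots up. All numbers below are invented for illustration, not taken from the study:

```python
import numpy as np

# Toy sketch of the coarse-metric effect (illustrative numbers only):
# a smooth per-token accuracy p, when scored by all-or-nothing
# exact match over an L-token answer, becomes roughly p**L —
# flat near zero, then an apparent sudden "jump".
per_token = np.linspace(0.5, 1.0, 6)   # smooth, gradual improvement with scale
L = 20                                 # hypothetical answer length in tokens
exact_match = per_token ** L           # coarse metric: all L tokens must be right

for p, em in zip(per_token, exact_match):
    print(f"per-token accuracy {p:.1f} -> exact-match accuracy {em:.4f}")
```

Under the smooth metric the model improves steadily; under the coarse one it looks as if the capability appears out of nowhere near the end of the scale range, which is the crux of the Stanford critique.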
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Emergent Capabilities | Risk | 61.0 |
Cached Content Preview
HTTP 200 · Fetched Mar 20, 2026 · 13 KB

How Quickly Do Large Language Models Learn Unexpected Skills?
[artificial intelligence](https://www.quantamagazine.org/tag/artificial-intelligence/)
# How Quickly Do Large Language Models Learn Unexpected Skills?
_By_ [Stephen Ornes](https://www.quantamagazine.org/authors/stephen-ornes/)
_February 13, 2024_
A new study suggests that so-called emergent abilities actually develop gradually and predictably, depending on how you measure them.

Kristina Armitage/ _Quanta Magazine_
## Introduction
Two years ago, in a project called the [Beyond the Imitation Game benchmark (opens a new tab)](https://arxiv.org/abs/2206.04615), or BIG-bench, 450 researchers compiled a list of 204 tasks designed to test the capabilities of large language models, which power chatbots like ChatGPT. On most tasks, performance improved predictably and smoothly as the models scaled up — the larger the model, the better it got. But on other tasks, the gain in ability wasn't smooth: performance remained near zero for a while, then jumped. Other studies found similar leaps in ability.
The authors described this as “breakthrough” behavior; other researchers have likened it to a phase transition in physics, like when liquid water freezes into
... (truncated, 13 KB total)
Resource ID:
38328f97c152d10f | Stable ID: MmI4ZjA3Yj