Longterm Wiki

Jason Wei of Google Brain

Relevant to AI safety discussions about whether dangerous capabilities can emerge suddenly and without warning; the measurement-artifact hypothesis suggests better evaluation design could improve foresight into capability development.

Metadata

Importance: 62/100 · news article · news

Summary

A Quanta Magazine article covering a Stanford study arguing that so-called 'emergent' abilities in large language models are not sudden or unpredictable, but appear so due to measurement choices. When different metrics are used, the abilities develop gradually and smoothly with scale, suggesting the 'phase transition' framing may be a measurement artifact rather than a genuine phenomenon.

Key Points

  • The BIG-bench benchmark found some LLM abilities appeared to jump abruptly with scale rather than improving smoothly, leading to claims of 'emergent' behavior.
  • A Stanford trio argues sudden emergence is an artifact of coarse metrics (e.g., exact match accuracy) that only register success at a late threshold.
  • Using continuous or finer-grained metrics reveals gradual, predictable improvement, undermining the phase-transition analogy (a toy illustration follows this list).
  • The debate has direct AI safety implications: if emergence is unpredictable, dangerous capabilities could appear without warning; if predictable, they can be anticipated.
  • The article interviews Jason Wei (Google Brain/DeepMind), one of the original emergence paper authors, who acknowledges the measurement critique but defends the phenomenon's relevance.
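
The metric-artifact argument can be made concrete with a toy model (an illustrative sketch, not the Stanford paper's actual analysis): assume per-token accuracy improves smoothly with scale, and score the task with exact match, which credits an answer only when every token is correct. The all-or-nothing metric then shows an apparent jump even though the underlying skill grows steadily. All numbers below are hypothetical.

```python
import numpy as np

# Toy model of the measurement-artifact argument (hypothetical numbers,
# not data from the Stanford paper): per-token accuracy rises smoothly
# with log model scale, but exact match requires every token to be right.
scales = np.logspace(7, 11, 9)          # hypothetical model sizes (parameters)
per_token = 1 / (1 + np.exp(-2 * (np.log10(scales) - 9)))  # smooth sigmoid

seq_len = 30                            # assumed answer length in tokens
exact_match = per_token ** seq_len      # all 30 tokens must be correct

for n, p, em in zip(scales, per_token, exact_match):
    print(f"{n:12.0e} params | per-token acc {p:.3f} | exact match {em:.6f}")
```

Plotted against scale, `exact_match` sits near zero and then leaps upward, reproducing the "phase transition" look, while `per_token` rises smoothly, the gradual trend the Stanford authors argue is the real underlying behavior.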

Cited by 1 page

| Page | Type | Quality |
|------|------|---------|
| Emergent Capabilities | Risk | 61.0 |

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 13 KB

[artificial intelligence](https://www.quantamagazine.org/tag/artificial-intelligence/)

# How Quickly Do Large Language Models Learn Unexpected Skills?

_By_ [Stephen Ornes](https://www.quantamagazine.org/authors/stephen-ornes/)

_February 13, 2024_

A new study suggests that so-called emergent abilities actually develop gradually and predictably, depending on how you measure them.


![Illustration by Kristina Armitage](https://www.quantamagazine.org/wp-content/uploads/2024/02/AbilitiesJumpMirage-byKristinaArmitage-Lede-scaled.webp)

Kristina Armitage/ _Quanta Magazine_

## Introduction

Two years ago, in a project called the [Beyond the Imitation Game benchmark](https://arxiv.org/abs/2206.04615), or BIG-bench, 450 researchers compiled a list of 204 tasks designed to test the capabilities of large language models, which power chatbots like ChatGPT. On most tasks, performance improved predictably and smoothly as the models scaled up: the larger the model, the better it got. But on other tasks, the improvement wasn't smooth. Performance remained near zero for a while, then jumped. Other studies found similar leaps in ability.

The authors described this as “breakthrough” behavior; other researchers have likened it to a phase transition in physics, like when liquid water freezes into ice.

... (truncated, 13 KB total)
Resource ID: 38328f97c152d10f | Stable ID: MmI4ZjA3Yj