Longterm Wiki
BIG-Bench evaluation suite

web

Credibility Rating

3/5
Good(3)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: GitHub

BIG-Bench is widely cited in AI safety research for evaluating emergent and unpredictable capabilities in large language models, making it relevant to capability forecasting and AI risk assessment.

Metadata

Importance: 72/100 · tool page · dataset

Summary

BIG-Bench is a collaborative benchmark of 204+ diverse tasks designed to probe large language model capabilities beyond what existing benchmarks measure. It focuses on tasks believed to be difficult for current models, covering reasoning, knowledge, and common sense, and includes analysis of scaling behavior and emergent capabilities. More than 400 researchers across 130+ institutions contributed to the benchmark.

Key Points

  • Contains 204+ tasks spanning diverse domains including language, mathematics, logic, social reasoning, and specialized knowledge
  • Designed to identify tasks where LLM performance is unpredictable or emergent as model scale increases
  • Collaborative open-source project with contributions from hundreds of researchers worldwide
  • Basis for BIG-Bench Hard (BBH), a subset of 23 tasks on which earlier language model evaluations did not outperform the average human rater
  • Key resource for studying capability elicitation, scaling laws, and identifying frontier model limitations

Cited by 1 page

| Page | Type | Quality |
| --- | --- | --- |
| Emergent Capabilities | Risk | 61.0 |

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 34 KB

[google](https://github.com/google)/ **[BIG-bench](https://github.com/google/BIG-bench)** Public

- Forks: 617
- Stars: 3.2k


main

[**14** Branches](https://github.com/google/BIG-bench/branches) [**0** Tags](https://github.com/google/BIG-bench/tags)

## Folders and files

Latest commit: [Update README.md](https://github.com/google/BIG-bench/commit/092b196c1f8f14a54bbc62f24759d43bde46dd3b) by [Sohl-Dickstein](https://github.com/google/BIG-bench/commits?author=Sohl-Dickstein), Jan 19, 2024 ([092b196](https://github.com/google/BIG-bench/commit/092b196c1f8f14a54bbc62f24759d43bde46dd3b)) · History: [5,893 Commits](https://github.com/google/BIG-bench/commits/main/)

| Name | Last commit message | Last commit date |
| --- | --- | --- |
| [.github/workflows](https://github.com/google/BIG-bench/tree/main/.github/workflows) | [Update generate\_task\_summaries.yml](https://github.com/google/BIG-bench/commit/80d4ea7e8c7ba2c51e8aa66945066013320fa792) | Jun 7, 2022 |
| [bigbench](https://github.com/google/BIG-bench/tree/main/bigbench) | [auto-generate task summary tables, analysis, SeqIO task catalog, and README.md headers](https://github.com/google/BIG-bench/commit/761845c22056c885429efd2cfcec345ae00c1de7) | Apr 3, 2023 |
| [bleurt](https://github.com/google/BIG-bench/tree/main/bleurt) | [Updated packaging behavior:](https://github.co

... (truncated, 34 KB total)
Resource ID: cbf6b1d02f9255db | Stable ID: ODMzNmNhMT