BIG-Bench evaluation suite
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: GitHub
BIG-Bench is widely cited in AI safety research for evaluating emergent and unpredictable capabilities in large language models, making it relevant to capability forecasting and AI risk assessment.
Summary
BIG-Bench is a collaborative benchmark of 204+ diverse tasks designed to probe large language model capabilities beyond what existing benchmarks cover. It focuses on tasks believed to be difficult for current models, spanning reasoning, knowledge, and common sense, and includes analysis of scaling behavior and emergent capabilities. Over 400 researchers across 130+ institutions contributed to it.
Key Points
- Contains 204+ tasks spanning diverse domains including language, mathematics, logic, social reasoning, and specialized knowledge
- Designed to identify tasks where LLM performance is unpredictable or emergent as model scale increases
- Collaborative open-source project with contributions from hundreds of researchers worldwide
- Includes BIG-Bench Hard (BBH) subset of 23 tasks where models underperform average human raters
- Key resource for studying capability elicitation, scaling laws, and identifying frontier model limitations
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Emergent Capabilities | Risk | 61.0 |
Cached Content Preview
[google](https://github.com/google)/ **[BIG-bench](https://github.com/google/BIG-bench)** Public
- Forks: 617
- Stars: 3.2k
main
[**14** Branches](https://github.com/google/BIG-bench/branches) [**0** Tags](https://github.com/google/BIG-bench/tags)
## Folders and files
Latest commit: [Update README.md](https://github.com/google/BIG-bench/commit/092b196c1f8f14a54bbc62f24759d43bde46dd3b) by [Sohl-Dickstein](https://github.com/Sohl-Dickstein), Jan 19, 2024 ([092b196](https://github.com/google/BIG-bench/commit/092b196c1f8f14a54bbc62f24759d43bde46dd3b)) · [5,893 Commits](https://github.com/google/BIG-bench/commits/main/)

| Name | Last commit message | Last commit date |
| --- | --- | --- |
| [.github/workflows](https://github.com/google/BIG-bench/tree/main/.github/workflows) | [Update generate\_task\_summaries.yml](https://github.com/google/BIG-bench/commit/80d4ea7e8c7ba2c51e8aa66945066013320fa792) | Jun 7, 2022 |
| [bigbench](https://github.com/google/BIG-bench/tree/main/bigbench) | [auto-generate task summary tables, analysis, SeqIO task catalog, and README.md headers](https://github.com/google/BIG-bench/commit/761845c22056c885429efd2cfcec345ae00c1de7) | Apr 3, 2023 |
| [bleurt](https://github.com/google/BIG-bench/tree/main/bleurt) | [Updated packaging behavior:](https://github.co
... (truncated, 34 KB total)

cbf6b1d02f9255db | Stable ID: ODMzNmNhMT