Longterm Wiki
Fact

Center for AI Safety — publication: Measuring Massive Multitask Language Understanding (MMLU) — widely-used benchmark for evaluating LLM capabilities across 57 academic subjects

partial · 85% confidence

1 evidence check

Last checked: 3/31/2026

The source (arxiv.org/abs/2009.03300) confirms the core facts: (1) MMLU measures multitask language understanding, (2) it covers 57 tasks/subjects across diverse domains, and (3) it was created by Hendrycks et al. However, the claim attributes the publication to 'Center for AI Safety (CAIS)', while the source lists the authors' affiliations as UC Berkeley, Columbia University, UChicago, and UIUC, not CAIS, so the attribution appears to be incorrect. The arXiv identifier 2009.03300 indicates a September 2020 submission, not the 2021-01 date given in the claim. The claim that MMLU became 'one of the most-cited AI benchmarks' is not addressed in the provided source excerpt.

Evidence — 1 source, 1 check

partial · 85% · primary · Haiku 4.5 · 3/31/2026
Found: The source confirms MMLU covers 57 tasks/subjects and was created by Hendrycks et al. However, the source does not identify CAIS as the publisher/creator, and does not confirm the 2021-01 date or that


Debug info

Record type: fact

Record ID: f_mGXpFffUh7
