Longterm Wiki

Center for AI Safety (CAIS) — publication: Measuring Massive Multitask Language Understanding (MMLU) — widely-used benchmark for evaluating LLM capabilities across 57 academic subjects

Verdict: partial · 85%
1 check · 5/18/2026

1 → partial

Our claim

entire record
Subject
Center for AI Safety (CAIS)
Value
Measuring Massive Multitask Language Understanding (MMLU) — widely-used benchmark for evaluating LLM capabilities across 57 academic subjects
As Of
January 2021
Notes
Created by Hendrycks et al.; became one of the most-cited AI benchmarks

Source evidence

1 src · 1 check
partial · 85% · primary · Haiku 4.5 · 5/18/2026

Note: The claim attributes MMLU to 'Center for AI Safety (CAIS)' as the publishing organization. The source paper lists author affiliations as UC Berkeley, Columbia University, UChicago, and UIUC, none of which is CAIS. The paper confirms the benchmark name, the 57 subjects, and Hendrycks et al. as creators, but it is about the MMLU benchmark itself, not about CAIS as an organization, and it does not confirm CAIS as the institutional source. This is a subject-identity mismatch: the benchmark details are confirmed, but the organizational attribution is not supported by the source.

Case № f_mGXpFffUh7 · Filed 5/18/2026 · Confidence 85%