Dan Hendrycks

Person

Comprehensive reference biography of Dan Hendrycks (CAIS director), covering his academic career (GELU, MMLU, OOD detection), CAIS founding and funding (including $6.5M FTX, Open Philanthropy/Coefficient Giving grants), policy work (SB 1047, NIST RMF input), and 2025 Superintelligence Strategy paper co-authored with Eric Schmidt and Alexandr Wang. The page is well-sourced and largely neutral but is purely descriptive reference material with no original synthesis or actionable guidance.

Role: Director
Known For: AI safety research, benchmark creation, CAIS leadership, catastrophic risk focus
Related:
Organizations: Center for AI Safety
Concepts: Compute Governance
People: Yoshua Bengio

Quick Assessment

| Dimension | Assessment |
| --- | --- |
| Primary Role | Executive Director, Center for AI Safety (CAIS); AI safety researcher |
| Key Contributions | Developed MMLU and ETHICS benchmarks for evaluating language models; proposed the GELU activation function (adopted in BERT and the GPT series); foundational work on out-of-distribution detection; co-authored papers on robustness and ML safety; coordinated the May 2023 statement on AI extinction risk |
| Key Publications | A Baseline for Detecting Misclassified and Out-of-Distribution Examples (ICLR 2017); Gaussian Error Linear Units (GELUs) (arXiv 2016); Measuring Massive Multitask Language Understanding (ICLR 2021); Aligning AI With Shared Human Values (ICLR 2021); Natural Adversarial Examples (CVPR 2021); Unsolved Problems in ML Safety (arXiv 2021); Introduction to AI Safety, Ethics, and Society (CRC Press, 2024); Superintelligence Strategy (arXiv 2025) |
| Institutional Affiliation | Center for AI Safety (CAIS), San Francisco; advisor to xAI and Scale AI |
| Education | B.S. with Honors, Computer Science, University of Chicago (2018); Ph.D., Computer Science, UC Berkeley (2022) |
| Influence on AI Safety | CAIS produces safety research, educational resources, and policy advocacy; Hendrycks co-authored NIST AI Risk Management Framework input (2022) and co-authored Superintelligence Strategy (2025) with Eric Schmidt and Alexandr Wang |

Overview

Dan Hendrycks (born 1994 or 1995) is a computer scientist and AI safety researcher who serves as executive director of the Center for AI Safety (CAIS), a San Francisco-based nonprofit he co-founded in 2022 with Oliver Zhang.[1] During his doctoral research at UC Berkeley — advised by Jacob Steinhardt and Dawn Song[2] — he developed several benchmarks that became widely used reference points for evaluating large language models, including MMLU and the ETHICS dataset, both published at ICLR 2021.[3][4] His dissertation, titled Machine Learning Safety, was completed in 2022.[5]

Prior to his benchmark work, Hendrycks co-authored two papers that became foundational in the deep learning literature: a 2016 arXiv preprint proposing the GELU activation function (later adopted in BERT, GPT-2, and subsequent transformer architectures),[6] and a 2017 ICLR paper establishing a simple baseline for out-of-distribution detection using maximum softmax probabilities, which accumulated over 3,800 citations on Semantic Scholar and is regarded as a foundational reference in the OOD detection literature.[7]

Through CAIS, Hendrycks has combined continued technical research with field-building and policy engagement. In May 2023 he coordinated a public statement asserting that AI extinction risk should be treated as a global priority, which drew over 350 initial signatories — a count that grew to more than 500 as the page remained open — including Turing Award winners and executives from major AI laboratories.[8][9] In 2024 he published an open-access textbook, Introduction to AI Safety, Ethics, and Society, through CRC Press (Taylor & Francis).[10] In March 2025 he co-authored Superintelligence Strategy with former Google CEO Eric Schmidt and Scale AI CEO Alexandr Wang.[11]

Background

Hendrycks grew up in Missouri, where he graduated as valedictorian from Marshfield High School in 2014.[12] He received a B.S. with Honors in Computer Science from the University of Chicago in 2018.[12] He then enrolled in the Computer Science doctoral program at UC Berkeley, completing his PhD in 2022 under advisors Jacob Steinhardt and Dawn Song.[2][5] His doctoral work was supported by an NSF Graduate Research Fellowship and a Coefficient Giving AI Fellowship.[2]

His PhD dissertation, Machine Learning Safety (UC Berkeley EECS Technical Report UCB/EECS-2022-253), covers work toward making systems perform reliably and act in accordance with human values, and addresses open problems in ML safety.[5] Following his doctorate, Hendrycks co-founded CAIS in 2022, transitioning from academic research to running an independent nonprofit organization.[1] As of 2024, his X profile lists him as director of CAIS and advisor to xAI and Scale AI.[13]

His research spans several areas within machine learning:

  • Out-of-distribution detection and uncertainty quantification
  • Robustness of neural networks to distribution shift
  • Development of benchmarks for evaluating language models
  • Adversarial robustness and natural adversarial examples

Early Research (2016–2019)

Before his dissertation and the MMLU benchmark, Hendrycks produced two papers that became widely cited in the machine learning literature, both co-authored with Kevin Gimpel at TTIC.

Gaussian Error Linear Units (GELUs) (arXiv, June 2016): This preprint proposed the GELU activation function, defined as x·Φ(x), where Φ(x) is the standard Gaussian CDF.[6] The paper was never published in conference proceedings and remains an arXiv preprint; it was nonetheless adopted in BERT, GPT-2, and the broader GPT series of models, accumulating tens of thousands of citations and making it one of the most cited deep learning preprints.[6] Hendrycks' Google Scholar profile lists a total citation count exceeding 62,000 across all of his works.[6]
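The GELU has a closed form in terms of the standard normal CDF, so it can be computed exactly with the error function; the tanh variant below is the common fast approximation used in several transformer codebases. A minimal sketch (function names here are illustrative, not from any particular library):

```python
import math

def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard Gaussian CDF,
    computed via Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))."""
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: float) -> float:
    """Widely used tanh approximation of GELU."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi)
                                      * (x + 0.044715 * x ** 3)))
```

Like ReLU, GELU passes large positive inputs through nearly unchanged and suppresses large negative ones, but it weights inputs by their probability under a standard normal rather than gating at a hard zero.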

A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks (ICLR 2017): This paper proposed a simple baseline using maximum softmax probabilities to detect when a neural network is presented with inputs outside its training distribution.[7] It accumulated over 3,800 citations on Semantic Scholar, including approximately 780 highly influential citations, and is considered a foundational reference in the OOD detection literature.[7]
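The maximum softmax probability (MSP) baseline is simple enough to sketch directly: take a classifier's logits, compute the softmax, and treat the maximum class probability as a confidence score; inputs scoring below a tuned threshold are flagged as out-of-distribution. A minimal illustration (the threshold value and function names are illustrative, not from the paper):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def msp_score(logits):
    """Maximum softmax probability: higher suggests in-distribution input."""
    return max(softmax(logits))

def flag_ood(logits, threshold=0.5):
    """Flag an input as OOD when its MSP falls below a threshold.
    In practice the threshold is tuned on held-out validation data."""
    return msp_score(logits) < threshold
```

A confidently classified input yields an MSP near 1, while an input the network is uncertain about yields an MSP near 1/num_classes, which is what the baseline exploits.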

Natural Adversarial Examples (CVPR 2021, based on work from 2019): Co-authored with Kevin Zhao, Steven Basart, Jacob Steinhardt, and Dawn Song, this paper introduced ImageNet-A (a harder ImageNet test set) and ImageNet-O (described as the first OOD detection dataset for ImageNet-scale models), demonstrating that ML models share weaknesses exploitable by natural, unmodified adversarial examples.[14] It was published in the Proceedings of CVPR 2021, pages 15262–15271.

These papers established Hendrycks' research trajectory in robustness and reliability before his subsequent work on evaluation benchmarks and AI safety.

Center for AI Safety

Hendrycks co-founded the Center for AI Safety in 2022 as a 501(c)(3) nonprofit organization based in San Francisco, alongside co-founder Oliver Zhang.[1] According to CAIS's mission statement, the organization aims to reduce societal-scale risks from artificial intelligence through research, field-building, and advocacy work.[15]

The organization has received general support grants from Coefficient Giving (formerly Open Philanthropy, rebranded November 2025)[16] in 2022 and 2023, as well as a grant of $1,433,000 from Coefficient Giving to support its Philosophy Fellowship program.[17][18][19] As of early 2025, the Survival and Flourishing Fund (SFF) has provided additional funding — $1.1 million to CAIS and $1.6 million to the affiliated CAIS Action Fund — while Coefficient Giving grants to CAIS were not continuing at that time.[20] A sister organization, the CAIS Action Fund (a 501(c)(4)), was formally launched in Washington D.C. in July 2024 and reported spending $270,000 on federal lobbying in 2024.[21]

CAIS also received $6.5 million from FTX between May and September 2022, before FTX declared bankruptcy in November 2022. The bankrupt FTX estate subsequently sought permission from a Delaware bankruptcy judge to issue subpoenas to CAIS, which had declined requests to voluntarily provide an accounting related to the transfers, according to a Bloomberg report from October 2023.[22]

By 2023, CAIS had grown to more than a dozen employees, according to a TIME profile.[9]

The organization's activities include:

  • Conducting technical safety research on topics such as robustness and evaluation methods
  • Educational programs, including the ML Safety course curriculum (announced on LessWrong in 2021)[23]
  • Policy-oriented work on compute governance and hardware-level interventions
  • Coordination efforts within the AI safety research community
  • A compute cluster supporting approximately 20 research labs, which had onboarded approximately 200 users working on 63 AI safety projects as of November 2023, and supported 77 papers in AI safety research in 2024[24][25]
  • A Philosophy Fellowship that hosted approximately 12 academic philosophers for a seven-month residency in San Francisco in 2023, producing 18 original research papers; fellows received $50,000 in funding and the program produced a special issue in Philosophical Studies (Springer), a peer-reviewed philosophy journal[26][27]

CAIS has served as an institutional platform for Hendrycks' work connecting technical researchers with policymakers and coordinating public statements on AI risk.

Statement on AI Risk (May 2023)

In May 2023, Hendrycks coordinated a public statement that read: "Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war."[8] CAIS built an email verification system to ensure signatories verified their institutional affiliations before being listed.[8]

At the time of initial publication, more than 350 researchers and executives had signed the statement.[8] The total count of signatories subsequently grew to more than 500 as the statement page remained open.[9] Signatories included Geoffrey Hinton, Yoshua Bengio, Sam Altman (OpenAI), Demis Hassabis (Google DeepMind), and Dario Amodei (Anthropic), along with executives from Microsoft and many other researchers.[8][9]

The statement received coverage in major media outlets and was cited in subsequent policy discussions. Some commentators noted that the statement's brevity meant it did not specify which risks or interventions were being prioritized, and that signatories held a range of differing views about what actions would follow from the shared concern.[8]

Technical Research Contributions

Benchmarks and Evaluation

Hendrycks has developed several benchmarks used in evaluating language models and AI systems.

MMLU (Measuring Massive Multitask Language Understanding): Published at ICLR 2021, co-authored with Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt.[3] The benchmark covers 57 tasks including elementary mathematics, U.S. history, computer science, and law, designed to test knowledge breadth in large language models. At the time of publication, the largest GPT-3 model improved over random chance by almost 20 percentage points on average, but had near-random accuracy on some subjects such as morality and law.[3] As of July 2024, the MMLU dataset had exceeded 100 million downloads, and by mid-2024 leading models including Claude 3.5 Sonnet, GPT-4o, and Llama 3.1 405B were consistently achieving approximately 88% accuracy on the benchmark.[28] The benchmark has also been adapted into localized versions including CMMLU (Chinese), KMMLU (Korean), ArabicMMLU, and TurkishMMLU.[28]
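MMLU is scored as plain multiple-choice accuracy, with per-subject breakdowns across its 57 tasks. A hypothetical scoring sketch (the record format and function name are illustrative, not the official evaluation harness):

```python
from collections import defaultdict

def mmlu_accuracy(records):
    """records: iterable of (subject, predicted_letter, answer_letter).
    Returns overall accuracy plus per-subject accuracy, mirroring how
    MMLU results are reported across its 57 subjects."""
    per_subject = defaultdict(lambda: [0, 0])  # subject -> [correct, total]
    for subject, pred, gold in records:
        per_subject[subject][1] += 1
        if pred == gold:
            per_subject[subject][0] += 1
    subject_acc = {s: c / t for s, (c, t) in per_subject.items()}
    overall = (sum(c for c, _ in per_subject.values())
               / sum(t for _, t in per_subject.values()))
    return overall, subject_acc
```

Because each question has four options, random guessing yields roughly 25% accuracy, which is the baseline against which the reported GPT-3 improvement of almost 20 percentage points was measured.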

MMLU has since encountered criticism regarding benchmark saturation and data quality. GPT-4 achieved 86.4% accuracy on MMLU by March 2023, after which differentiation between leading models became difficult.[29] Score variance of up to 10–13 percentage points depending on prompt methodology has been documented.[29] A 2024 reanalysis (MMLU-Redux, published at NAACL 2025) manually re-annotated 5,700 questions across all 57 subjects and found that approximately 6.49% of questions contain errors, with notably higher error rates in some subsets.[30] As of 2025, MMLU has been partially replaced in evaluations by more challenging alternatives, including MMLU-Pro (presented at NeurIPS 2024).[29][28]

ETHICS Dataset: Published at ICLR 2021 under the title Aligning AI With Shared Human Values, co-authored with Collin Burns, Steven Basart, Andrew Critch, Jerry Li, Dawn Song, and Jacob Steinhardt (arXiv: 2008.02275).[4] The benchmark spans five subsets covering justice, well-being, duties, virtues, and commonsense morality, designed to evaluate whether language models can perform tasks related to moral reasoning across different ethical frameworks.

These benchmarks have been adopted by research groups evaluating new language models, providing standardized metrics for comparing systems.

Robustness and Distribution Shift

Hendrycks' work on robustness has examined how neural networks perform when tested on data that differs from their training distribution. His research has included:

  • Methods for detecting when inputs are out-of-distribution relative to training data
  • Studies of how models fail when encountering natural variations in data
  • Development of datasets containing "natural adversarial examples" that cause model failures without artificial perturbations
  • Analysis of calibration in neural network predictions

Natural Adversarial Examples (CVPR 2021): Co-authored with Kevin Zhao, Steven Basart, Jacob Steinhardt, and Dawn Song.[14] The paper introduced two challenging datasets, including IMAGENET-O, described as the first out-of-distribution detection dataset for ImageNet-scale models. The paper appeared in the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 2021, pages 15262–15271 (arXiv: 1907.07174).[14]

This work connects to broader questions in technical AI safety research about how to ensure systems behave reliably in novel situations.

Unsolved Problems in ML Safety

Co-authored with Nicholas Carlini (Google Brain), John Schulman (OpenAI), and Jacob Steinhardt (arXiv: 2109.13916, September 2021).[31] The paper presents four key research problem areas: Robustness, Monitoring, Alignment, and Systemic Safety, and provides a structured overview of open research directions in ML safety.

Textbook: Introduction to AI Safety, Ethics, and Society

In 2024, Hendrycks published Introduction to AI Safety, Ethics, and Society through CRC Press (Taylor & Francis imprint), DOI: 10.1201/9781003530336.[10] The book is available as an open-access monograph on aisafetybook.com and in print, with an arXiv preprint posted November 2024 (arXiv: 2411.01042).[10] It is targeted at upper-level undergraduate and postgraduate students and covers technical safety concepts, ethics, and a governance chapter spanning AI policy variables including the distribution of benefits, access to AI, and roles of companies, governments, and international bodies.[10] CAIS reported that an associated online course launched in 2024 enrolled 240 participants.[25]

Policy Engagement and Advocacy

In February 2022, Hendrycks co-authored Actionable Guidance for High-Consequence AI Risk Management: Towards Standards Addressing AI Catastrophic Risks (arXiv: 2206.08966) with Anthony M. Barrett, Jessica Newman, and Brandie Nonnecke, submitted to NIST as input to inform the AI Risk Management Framework (AI RMF).[32]

The CAIS Action Fund, a related 501(c)(4) organization, co-sponsored California SB 1047 alongside Economic Security Action California and Encode Justice, organized a joint letter signed by more than 80 technology organizations asking Congress to fund NIST AI work, and advocated for $10 million in funding for the U.S. AI Safety Institute.[21][33] The Action Fund was formally launched at a Capitol Hill event in July 2024 at which Senator Brian Schatz (D-HI) and Representative French Hill (R-AR) appeared, attended by multiple members of Congress.[21]

California SB 1047: SB 1047 would have required AI developers to test frontier models for hazardous capabilities and take steps to mitigate catastrophic risks. Governor Gavin Newsom vetoed the bill in September 2024, stating in his veto message that the bill was "well-intentioned" but expressing concern that it did not take into account deployment context or whether systems involve high-risk environments.[34] Supporters of the bill included CAIS, Elon Musk, the Los Angeles Times editorial board, and Anthropic; opponents included Meta, OpenAI, and House Speaker Nancy Pelosi, who argued it would hinder innovation.[34] Nathan Calvin, Senior Policy Counsel at CAIS Action Fund, stated that the organization was "disappointed by Governor Newsom's decision to veto this urgent and common sense safety bill."[33] Hendrycks, writing on X following the veto, stated that although the outcome was "disappointing," the debate had "begun moving the conversation about AI safety into the mainstream, where it belongs," and added that the bill "has revealed that some industry calls for responsible AI are nothing more than PR aircover for their business and investment strategies."[35] Newsom indicated he would work with the legislature, federal partners, and technology experts to develop more empirically informed AI safety regulations.[34]

Hendrycks has also given public presentations and media interviews explaining AI risk concerns to non-technical audiences. CAIS has engaged with bodies involved in developing the EU AI Act and provided technical input to legislative discussions in the United States.

Stated Positions on AI Risk

Hendrycks has publicly positioned certain AI risks as warranting priority attention. The May 2023 statement he coordinated explicitly compared mitigating AI extinction risk to addressing pandemics and nuclear war in terms of global priority.[8]

In a 2025 interview with Lawfare Media, Hendrycks stated that AGI is "not something that's very far off, but potentially on the horizon," adding that "even in the next few years, you could get AGI, so to speak," and emphasized "the trajectory that we're on rather than the current capabilities."[36] Separately, on X in 2025, he proposed a testable definition of AGI based on Cattell-Horn-Carroll (CHC) theory of human intelligence — "an AI that can match or exceed the cognitive versatility and proficiency of a well-educated adult" — and assessed, using this self-constructed metric, that GPT-4 (2023) was approximately 27% of the way to that threshold and GPT-5 (2025) approximately 58%.[37] These percentage estimates are Hendrycks' own assessments using his proposed CHC-based definition; the framework has not been independently validated and the source is a social media post rather than a peer-reviewed publication.

In March 2025, he co-authored Superintelligence Strategy (arXiv: 2503.05628) with Eric Schmidt and Alexandr Wang.[11] The paper introduces a deterrence concept called Mutual Assured AI Malfunction (MAIM), under which any state's aggressive bid for unilateral AI dominance would be met with preventive sabotage by rivals. The paper argues against a Manhattan Project-style acceleration toward AGI and proposes a three-part framework: deterrence (MAIM), nonproliferation, and competitiveness. It defines superintelligence as AI that "would decisively surpass the world's best individual experts in nearly every intellectual domain" and identifies hacking, virology, and autonomous AI R&D as the safety-relevant cognitive domains of greatest concern.[11] The paper drew responses from multiple commentators; RAND Corporation described it as "an important new report" and "a critical contribution" to AI policy debate, while also raising questions about whether the MAIM framework is consistent with private-sector-driven AI development.[38] Critics on LessWrong argued that the MAIM analogy to Mutually Assured Destruction (MAD) fails because the potentially enormous payoff of achieving superintelligence first may destabilize rather than stabilize deterrence dynamics.[39] In a September 2025 response piece, Hendrycks and co-author Adam Khoja clarified that the paper draws a "pedagogical parallel" between MAIM and MAD but acknowledged the structures differ significantly, noting that MAD is built around nuclear retaliation whereas MAIM involves AI preemption.[40]

His activities through CAIS reflect a combination of intervention approaches:

  • Technical research aimed at improving system safety and evaluation
  • Policy engagement with government bodies and international organizations
  • Public communication intended to broaden awareness of AI risk concerns, according to CAIS's stated goals[15]
  • Compute governance work exploring hardware-level interventions

CAIS Programs and Activities

Research

CAIS conducts and supports research in several areas:

Technical Safety: Work on robustness, alignment techniques, and evaluation methodologies for AI systems. Research includes both empirical studies of current systems and development of new safety methods.

Compute Governance: Investigation of interventions based on the hardware supply chain for AI systems, including tracking of compute resources and potential international coordination mechanisms.

ML Safety Education: CAIS developed curriculum materials for teaching machine learning safety concepts, announced publicly in 2021.[23] The course is described on course.mlsafety.org as "an advanced course covering empirical directions to reduce AI x-risk," with a full syllabus, lectures, readings, and assignments available publicly.[41] Hendrycks worked on the course for approximately eight months before its launch and intended it to serve as a default resource for ML researchers interested in AI safety as well as undergraduates beginning safety research.[23] A 2024 online course associated with CAIS's textbook enrolled 240 participants.[25]

Advocacy and Field-Building

CAIS has organized workshops, maintained networks of researchers working on safety-related topics, and engaged with policymakers. As noted above, the organization has co-authored NIST framework input, co-sponsored California SB 1047 through the CAIS Action Fund (which was subsequently vetoed by Governor Newsom in September 2024), and organized the 2024 Capitol Hill launch event with members of Congress.[21][32][42]

Perspectives and Debates

Hendrycks' approach to AI safety places catastrophic and existential risk scenarios among its top concerns. This framing is part of ongoing debates within the AI research community about how to allocate attention and resources across different categories of AI-related concerns.

Critiques of Risk Framing

Critics, including researchers focused on near-term AI harms, have argued that emphasis on long-term or extinction-level risks may draw resources and attention away from documented harms related to bias, fairness, and misuse of existing AI systems. Researchers at organizations such as the AI Now Institute and the Distributed AI Research Institute (DAIR) have argued in published work that speculative long-term risk framing can function to divert regulatory attention from present harms; these critiques are directed at the broader x-risk research community rather than specifically at CAIS.[8] Others have questioned whether the extinction risk framing is supported by available evidence, and have noted that the brief May 2023 statement was signed by researchers with substantially differing views on what the appropriate response to AI risk should be.[8]

Allegations Regarding Conflicts of Interest

In 2024, Pirate Wires, an online publication with a stated skepticism toward the effective altruism ecosystem, reported that Hendrycks had co-founded Gray Swan — an AI safety compliance startup — while CAIS Action Fund was co-sponsoring SB 1047, which would create AI compliance requirements. The report alleged the two activities created a conflict of interest.[43] Gray Swan's official statement said the company "explicitly does not hold any position on SB 1047" and that its products and services "are not and will not be designed to satisfy any of the auditing requirements proposed in the bill."[44] Hendrycks described the Pirate Wires article as "bad-faith gotcha journalism" on X, stating that Gray Swan was not designed to offer the type of audits SB 1047 would require, and subsequently said he divested his equity stake in the company "in order to send a clear signal."[45]

Critiques of MMLU

The MMLU benchmark, one of Hendrycks' most cited technical contributions, has attracted specific methodological criticism: saturation (leading models reaching the 86–89% accuracy range with limited differentiation), prompt sensitivity (score variance of up to 10–13 percentage points depending on methodology), and a documented error rate of approximately 6.49% in ground-truth answers identified by MMLU-Redux (2024).[29][30]

Broader Context

Hendrycks has maintained that catastrophic risks warrant prioritization while not dismissing other AI safety concerns. The design of CAIS programs reflects this prioritization, with research and advocacy efforts concentrated on scenarios involving large-scale harm. The field of AI safety research contains diverse perspectives on which problems are most important, what methods are most promising, and how technical research should relate to policy and governance questions; Hendrycks' work represents one approach within this broader landscape.

Selected Publications

| Title | Authors | Venue | Year | Identifier |
| --- | --- | --- | --- | --- |
| Gaussian Error Linear Units (GELUs) | Hendrycks, Gimpel | arXiv preprint | 2016 | arXiv:1606.08415 |
| A Baseline for Detecting Misclassified and Out-of-Distribution Examples | Hendrycks, Gimpel | ICLR 2017 | 2017 | arXiv:1610.02136 |
| Measuring Massive Multitask Language Understanding | Hendrycks, Burns, Basart, Zou, Mazeika, Song, Steinhardt | ICLR 2021 | 2021 | arXiv:2009.03300 |
| Aligning AI With Shared Human Values | Hendrycks, Burns, Basart, Critch, Li, Song, Steinhardt | ICLR 2021 | 2021 | arXiv:2008.02275 |
| Natural Adversarial Examples | Hendrycks, Zhao, Basart, Steinhardt, Song | CVPR 2021 | 2021 | arXiv:1907.07174 |
| Unsolved Problems in ML Safety | Hendrycks, Carlini, Schulman, Steinhardt | arXiv | 2021 | arXiv:2109.13916 |
| Actionable Guidance for High-Consequence AI Risk Management | Barrett, Hendrycks, Newman, Nonnecke | NIST AI RMF submission | 2022 | arXiv:2206.08966 |
| Introduction to AI Safety, Ethics, and Society | Hendrycks | CRC Press (Taylor & Francis) | 2024 | DOI:10.1201/9781003530336 |
| Superintelligence Strategy | Hendrycks, Schmidt, Wang | arXiv | 2025 | arXiv:2503.05628 |

Footnotes

  1. Center for AI Safety – Wikipedia. https://en.wikipedia.org/wiki/Center_for_AI_Safety 2 3

  2. Dan Hendrycks – Schmidt Sciences Grantee Profile. https://www.schmidtsciences.org/grantee/dan-hendrycks/ 2 3

  3. Hendrycks, D., Burns, C., Basart, S., Zou, A., Mazeika, M., Song, D., & Steinhardt, J. (2021). Measuring Massive Multitask Language Understanding. ICLR 2021. arXiv:2009.03300. https://arxiv.org/abs/2009.03300 2 3

  4. Hendrycks, D., Burns, C., Basart, S., Critch, A., Li, J., Song, D., & Steinhardt, J. (2021). Aligning AI With Shared Human Values. ICLR 2021. arXiv:2008.02275. https://arxiv.org/abs/2008.02275 2

  5. Machine Learning Safety — UC Berkeley EECS Dissertation. UCB/EECS-2022-253. https://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-253.html 2 3

  6. Dan Hendrycks and Kevin Gimpel, "Gaussian Error Linear Units (GELUs)," arXiv:1606.08415, submitted June 2016, revised June 2023. https://arxiv.org/abs/1606.08415 2 3 4

  7. Dan Hendrycks and Kevin Gimpel, "A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks," ICLR 2017. arXiv:1610.02136. Semantic Scholar citation count: approximately 3,887 citations including ~781 highly influential citations. https://arxiv.org/abs/1610.02136 2 3

  8. CBC News. "Artificial intelligence poses 'risk of extinction,' tech execs and experts warn." May 30–31, 2023. https://www.cbc.ca/news/world/artificial-intelligence-extinction-risk-1.6859118; CAIS AI Risk Statement press release: https://safe.ai/work/press-release-ai-risk 2 3 4 5 6 7 8 9

  9. TIME. "Dan Hendrycks: The 100 Most Influential People in AI 2023." 2023. https://time.com/collection/time100-ai/6309050/dan-hendrycks/ 2 3 4

  10. Hendrycks, D. (2024). Introduction to AI Safety, Ethics, and Society. CRC Press (Taylor & Francis). DOI:10.1201/9781003530336. https://www.aisafetybook.com; arXiv:2411.01042. 2 3 4

  11. Hendrycks, D., Schmidt, E., & Wang, A. (2025). Superintelligence Strategy: Expert Version. arXiv:2503.05628. https://arxiv.org/abs/2503.05628

  12. Dan Hendrycks – Personal Academic Page / CV, UC Berkeley. https://people.eecs.berkeley.edu/~hendrycks/; CV: https://people.eecs.berkeley.edu/~hendrycks/CV.pdf

  13. Dan Hendrycks, X profile (@hendrycks / @DanHendrycks), accessed 2024. https://x.com/hendrycks

  14. Hendrycks, D., Zhao, K., Basart, S., Steinhardt, J., & Song, D. (2021). Natural Adversarial Examples. CVPR 2021, pp. 15262–15271. arXiv:1907.07174. https://openaccess.thecvf.com/content/CVPR2021/html/Hendrycks_Natural_Adversarial_Examples_CVPR_2021_paper.html

  15. Citation rc-27f3

  16. Inside Philanthropy, "Open Philanthropy Is Now Coefficient Giving. Here's What Has (and Hasn't) Changed," November 2025. https://www.insidephilanthropy.com/home/open-philanthropy-is-now-coefficient-giving-heres-what-has-and-hasnt-changed

  17. Open Philanthropy. Center for AI Safety – General Support (2022). https://www.openphilanthropy.org/grants/center-for-ai-safety-general-support/

  18. Open Philanthropy. Center for AI Safety – General Support (2023). https://www.openphilanthropy.org/grants/center-for-ai-safety-general-support-2023/

  19. Open Philanthropy. Center for AI Safety – Philosophy Fellowship and NeurIPS Prizes. https://www.openphilanthropy.org/grants/center-for-ai-safety-philosophy-fellowship/

  20. 80,000 Hours. "It looks like there are some good funding opportunities in AI safety right now." January 2025. https://80000hours.org/2025/01/it-looks-like-there-are-some-good-funding-opportunities-in-ai-safety-right-now/

  21. Center for AI Safety Action Fund – 2024 Year in Review. CAIS Newsletter. https://newsletter.safe.ai/p/aisn-45-center-for-ai-safety-2024; Washington AI Network, July 29, 2024. https://washingtonainetwork.com/2024/07/29/center-for-ai-safety-hosts-dc-launch-event-featuring-cais-dan-hendrycks-jaan-tallinn-sen-brian-schatz-rep-french-hill-and-cnns-pamela-brown/

  22. Bloomberg, "FTX Is Probing $6.5 Million Paid to Leading Nonprofit Group on AI Safety," October 25, 2023. https://www.bloomberg.com/news/articles/2023-10-25/ftx-probing-6-5-million-paid-to-leading-ai-safety-nonprofit

  23. Hendrycks, D. / CAIS. "Announcing the Introduction to ML Safety Course." LessWrong, 2021. https://www.lesswrong.com/posts/4F8Bg8Z5cePTBofzo/announcing-the-introduction-to-ml-safety-course

  24. Center for AI Safety 2023 Year in Review. CAIS Newsletter, December 21, 2023. https://newsletter.safe.ai/p/aisn-28-center-for-ai-safety-2023

  25. Center for AI Safety, "AISN #45: Center for AI Safety 2024 Year in Review," December 19, 2024. https://newsletter.safe.ai/p/aisn-45-center-for-ai-safety-2024

  26. Center for AI Safety, "Philosophy Fellowship 2023," CAIS official page. https://safe.ai/work/philosophy-fellowship

  27. Springer/Philosophical Studies, "AI Safety Special Issue," edited by Cameron Kirk-Giannini and Dan Hendrycks. https://link.springer.com/collections/cadgidecih

  28. Wikipedia contributors, "MMLU," Wikipedia, updated December 2024/early 2025. https://en.wikipedia.org/wiki/MMLU

  29. MMLU – Wikipedia. https://en.wikipedia.org/wiki/MMLU; MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark. arXiv:2406.01574. https://arxiv.org/html/2406.01574v1

  30. Gema et al. "Are We Done with MMLU?" NAACL 2025. arXiv:2406.04127. https://arxiv.org/abs/2406.04127

  31. Hendrycks, D., Carlini, N., Schulman, J., & Steinhardt, J. (2021). Unsolved Problems in ML Safety. arXiv:2109.13916. https://arxiv.org/abs/2109.13916

  32. Barrett, A.M., Hendrycks, D., Newman, J., & Nonnecke, B. (2022). Actionable Guidance for High-Consequence AI Risk Management: Towards Standards Addressing AI Catastrophic Risks. Submitted to NIST AI RMF. arXiv:2206.08966. https://arxiv.org/abs/2206.08966

  33. California State Senate — Sen. Scott Wiener's office, "Senator Wiener Responds to Governor Newsom Vetoing Landmark AI Bill," September 2024. https://sd11.senate.ca.gov/news/senator-wiener-responds-governor-newsom-vetoing-landmark-ai-bill

  34. Crowell & Moring LLP, "Gov. Newsom Vetoes AI Bill but Leaves the Door Open to Future CA Regulation," October 2024. https://www.crowell.com/en/insights/client-alerts/gov-newsom-vetoes-ai-bill-but-leaves-the-door-open-to-future-ca-regulation

  35. Space Daily (wire report), "California governor vetoes AI safety bill," September/October 2024. https://www.spacedaily.com/reports/California_governor_vetoes_AI_safety_bill_999.html

  36. Lawfare Media. "Lawfare Daily: Dan Hendrycks on National Security in the Age of Superintelligent AI." 2025. https://www.lawfaremedia.org/article/lawfare-daily--dan-hendrycks-on-national-security-in-the-age-of-superintelligent-ai

  37. Dan Hendrycks on X, 2025. https://x.com/DanHendrycks/status/1978828377269117007

  38. RAND Corporation, "Seeking Stability in the Competition for AI Advantage," March 13, 2025. https://www.rand.org/pubs/commentary/2025/03/seeking-stability-in-the-competition-for-ai-advantage.html

  39. Eli Blee-Goldman, "The Jackpot Jinx (or why 'Superintelligence Strategy' is wrong)," LessWrong, March 10, 2025. https://www.lesswrong.com/posts/JF2JABmeM9opszwZA/the-jackpot-jinx-or-why-superintelligence-strategy-is-wrong

  40. Dan Hendrycks and Adam Khoja, "AI Deterrence Is Our Best Option," AI Frontiers, September 18, 2025. https://ai-frontiers.org/articles/ai-deterrence-is-our-best-option

  41. CAIS ML Safety Course. https://course.mlsafety.org/

  42. Center for AI Safety, "AI Safety Newsletter #42: Newsom Vetoes SB 1047," October 1, 2024. https://newsletter.safe.ai/p/ai-safety-newsletter-42-newsom-vetoes

  43. Pirate Wires, "The Conflict of Interest at the Heart of CA's AI Bill," 2024. https://www.piratewires.com/p/sb-1047-dan-hendrycks-conflict-of-interest

  44. Gray Swan AI, "Statement on SB-1047 and Founders," 2024. https://www.grayswan.ai/blog/sb1047

  45. Dan Hendrycks, post on X, July 2024. https://x.com/DanHendrycks/status/1814006259571667045

References

Wikipedia's overview of the Center for AI Safety (CAIS), a nonprofit organization focused on reducing societal-scale risks from advanced AI systems. CAIS is known for publishing the 2023 statement on AI extinction risk signed by hundreds of leading AI researchers and for conducting technical safety research. The article covers the organization's founding, mission, key initiatives, and notable figures involved.

★★★☆☆
2. [2406.04127] Are We Done with MMLU? — arXiv · Aryo Pradipta Gema et al. · 2024 · Paper

This paper systematically identifies and categorizes errors in the MMLU benchmark, finding that a substantial fraction of questions (e.g., 57% of Virology questions) contain ground truth errors. The authors introduce MMLU-Redux, a manually re-annotated subset of 3,000 questions, and show that these errors meaningfully distort LLM performance metrics and model rankings.

★★★☆☆
3. Superintelligence Strategy: Expert Version — arXiv · Dan Hendrycks, Eric Schmidt & Alexandr Wang · 2025 · Paper

This paper by Hendrycks, Schmidt, and Wang proposes a comprehensive national security strategy for superintelligence—AI systems vastly superior to humans across cognitive tasks. The authors argue that rapid AI advances pose destabilizing geopolitical risks, including lowered barriers for catastrophic misuse by rogue actors and potential great-power conflict over AI dominance. They introduce Mutual Assured AI Malfunction (MAIM), a deterrence framework analogous to nuclear MAD where states prevent rivals' unilateral AI dominance through preventive sabotage. The paper outlines a three-part strategy combining deterrence, nonproliferation to hostile actors, and competitive strengthening through AI development.

★★★☆☆
4. Actionable Guidance for High-Consequence AI Risk Management — arXiv · Anthony M. Barrett, Dan Hendrycks, Jessica Newman & Brandie Nonnecke · 2022 · Paper

This paper translates high-level AI safety principles into concrete risk management practices to support NIST's AI Risk Management Framework (AI RMF). It provides recommendations for identifying unintended uses and misuses, incorporating catastrophic-risk factors into assessments, addressing human rights harms, and improving risk reporting. The work aims to establish practical standards for managing existential and societal-scale AI risks within governance frameworks.

★★★☆☆
5. Intro to ML Safety Course — course.mlsafety.org

A structured university-level course on machine learning safety developed by the Center for AI Safety, covering topics from robustness and anomaly detection to alignment and systemic safety. The course includes lecture recordings, slides, notes, and coding assignments across modules on safety engineering, robustness, monitoring, alignment, and emerging risks.

6. A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks — arXiv · Dan Hendrycks & Kevin Gimpel · 2016 · Paper

Hendrycks and Gimpel (2017) propose a simple but effective baseline for detecting misclassified and out-of-distribution (OOD) inputs using maximum softmax probabilities. The core finding is that correctly classified in-distribution examples tend to produce higher maximum softmax probabilities than errors or OOD inputs. The paper establishes benchmark tasks across vision, NLP, and speech, and shows the baseline can be improved, motivating further research.

★★★☆☆
7. [2009.03300] Measuring Massive Multitask Language Understanding — arXiv · Dan Hendrycks et al. · 2020 · Paper

Introduces the MMLU benchmark, a comprehensive evaluation suite covering 57 subjects across STEM, humanities, social sciences, and more, designed to measure breadth and depth of language model knowledge. The benchmark tests models from elementary to professional level and reveals significant gaps between human expert performance and state-of-the-art models at the time of publication. It became a standard benchmark for tracking LLM capability progress.

★★★☆☆
8. Aligning AI With Shared Human Values (ICLR 2021) — arXiv · Dan Hendrycks et al. · 2020 · Paper

This paper introduces ETHICS, a benchmark dataset for evaluating language models' understanding of moral concepts across justice, well-being, duties, virtues, and commonsense morality. The dataset requires models to predict human moral judgments about diverse text scenarios, combining physical and social world knowledge with value judgments. The authors find that current language models demonstrate promising but incomplete ability to predict ethical judgments, suggesting that progress on machine ethics is achievable and could help align AI systems with human values.

★★★☆☆
9. Unsolved Problems in ML Safety — arXiv · Dan Hendrycks, Nicholas Carlini, John Schulman & Jacob Steinhardt · 2021 · Paper

This paper presents a comprehensive roadmap for ML safety research, identifying four critical problem areas that the field must address as machine learning systems grow larger and are deployed in high-stakes applications. The authors categorize safety challenges into Robustness (withstanding hazards), Monitoring (identifying hazards), Alignment (reducing inherent model hazards), and Systemic Safety (reducing systemic hazards). By clarifying the motivation behind each problem and providing concrete research directions, the paper aims to guide the ML safety research community toward addressing emerging safety challenges posed by large-scale models.

★★★☆☆

Structured Data

7 facts · 2 records
Employed By: Center for AI Safety (as of Mar 2026)
Role / Title: Director (as of Mar 2026)

All Facts

People
| Property | Value | As Of | Source |
|---|---|---|---|
| Role / Title | Director | Mar 2026 | |
| Employed By | Center for AI Safety | Mar 2026 | |
Biographical
| Property | Value | As Of | Source |
|---|---|---|---|
| Notable For | AI safety research; benchmark creation; CAIS leadership; catastrophic risk focus | Mar 2026 | |
| Education | University of California, Berkeley | | |
| Wikipedia | https://en.wikipedia.org/wiki/Dan_Hendrycks | | |
General
| Property | Value | As Of | Source |
|---|---|---|---|
| Website | https://hendrycks.com | | |

Career History

| Organization | Title | Start | End |
|---|---|---|---|
| Center for AI Safety | Executive Director | 2022 | |
| University of California, Berkeley | PhD Student | 2018 | 2022 |

Related Wiki Pages

Top Related Pages

Other

AI Evaluations · Geoffrey Hinton · GPT-4 · GPT-4o · MMLU · MMLU-Pro

Organizations

Coefficient Giving · Anthropic

Concepts

Superintelligence

Key Debates

Technical AI Safety Research · Is AI Existential Risk Real?

Approaches

AI Safety Training Programs

Analysis

AI Risk Warning Signs Model · Alignment Robustness Trajectory Model

Risks

Emergent Capabilities · AI Distributional Shift

Policy

Safe and Secure Innovation for Frontier Artificial Intelligence Models Act · EU AI Act

Historical

Deep Learning Revolution Era · The MIRI Era