Dan Hendrycks
Comprehensive reference biography of Dan Hendrycks (CAIS director), covering his academic career (GELU, MMLU, OOD detection), CAIS founding and funding (including $6.5M FTX, Open Philanthropy/Coefficient Giving grants), policy work (SB 1047, NIST RMF input), and 2025 Superintelligence Strategy paper co-authored with Eric Schmidt and Alexandr Wang. The page is well-sourced and largely neutral but is purely descriptive reference material with no original synthesis or actionable guidance.
Quick Assessment
| Dimension | Assessment |
|---|---|
| Primary Role | Executive Director, Center for AI Safety (CAIS); AI safety researcher |
| Key Contributions | Developed MMLU and ETHICS benchmarks for evaluating language models; proposed the GELU activation function (adopted in BERT and GPT-4 series); foundational work on out-of-distribution detection; co-authored papers on robustness and ML safety; coordinated the May 2023 statement on AI extinction risk |
| Key Publications | A Baseline for Detecting Misclassified and Out-of-Distribution Examples (ICLR 2017); Gaussian Error Linear Units (GELUs) (arXiv 2016); Measuring Massive Multitask Language Understanding (ICLR 2021); Aligning AI With Shared Human Values (ICLR 2021); Natural Adversarial Examples (CVPR 2021); Unsolved Problems in ML Safety (arXiv 2021); Introduction to AI Safety, Ethics, and Society (CRC Press, 2024); Superintelligence Strategy (arXiv 2025) |
| Institutional Affiliation | Center for AI Safety (CAIS), San Francisco; advisor to xAI and Scale AI |
| Education | B.S. with Honors, Computer Science, University of Chicago (2018); Ph.D., Computer Science, UC Berkeley (2022) |
| Influence on AI Safety | CAIS produces safety research, educational resources, and policy advocacy; Hendrycks co-authored input to the NIST AI Risk Management Framework (2022) and, with Eric Schmidt and Alexandr Wang, Superintelligence Strategy (2025) |
Overview
Dan Hendrycks (born 1994 or 1995) is a computer scientist and AI safety researcher who serves as executive director of the Center for AI Safety (CAIS), a San Francisco-based nonprofit he co-founded in 2022 with Oliver Zhang.1 During his doctoral research at UC Berkeley — advised by Jacob Steinhardt and Dawn Song2 — he developed several benchmarks that became widely used reference points for evaluating large language models, including MMLU and the ETHICS dataset, both published at ICLR 2021.34 His dissertation, titled Machine Learning Safety, was completed in 2022.5
Prior to his benchmark work, Hendrycks co-authored two papers that became foundational in the deep learning literature: a 2016 arXiv preprint proposing the GELU activation function (later adopted in BERT, GPT-2, and subsequent transformer architectures),6 and a 2017 ICLR paper establishing a simple baseline for out-of-distribution detection using maximum softmax probabilities, which accumulated over 3,800 citations on Semantic Scholar and is regarded as a foundational reference in the OOD detection literature.7
Through CAIS, Hendrycks has combined continued technical research with field-building and policy engagement. In May 2023 he coordinated a public statement asserting that AI extinction risk should be treated as a global priority, which drew over 350 initial signatories — a count that grew to more than 500 as the page remained open — including Turing Award winners and executives from major AI laboratories.89 In 2024 he published an open-access textbook, Introduction to AI Safety, Ethics, and Society, through CRC Press (Taylor & Francis).10 In March 2025 he co-authored Superintelligence Strategy with former Google CEO Eric Schmidt and Scale AI CEO Alexandr Wang.11
Background
Hendrycks grew up in Missouri, where he graduated as valedictorian from Marshfield High School in 2014.12 He received a B.S. with Honors in Computer Science from the University of Chicago in 2018.12 He then enrolled in the Computer Science doctoral program at UC Berkeley, completing his PhD in 2022 under advisors Jacob Steinhardt and Dawn Song.25 His doctoral work was supported by an NSF Graduate Research Fellowship and an Open Philanthropy AI Fellowship (Open Philanthropy has since been renamed Coefficient Giving).2
His PhD dissertation, Machine Learning Safety (UC Berkeley EECS Technical Report UCB/EECS-2022-253), covers work on making systems perform reliably and act in accordance with human values, and surveys open problems in ML safety.5 Following his doctorate, Hendrycks co-founded CAIS in 2022, transitioning from academic research to running an independent nonprofit organization.1 As of 2024, his X profile lists him as director of CAIS and advisor to xAI and Scale AI.13
His research spans several areas within machine learning:
- Out-of-distribution detection and uncertainty quantification
- Robustness of neural networks to distribution shift
- Development of benchmarks for evaluating language models
- Adversarial robustness and natural adversarial examples
Early Research (2016–2019)
Before his dissertation and the MMLU benchmark, Hendrycks produced two papers that became widely cited in the machine learning literature, both co-authored with Kevin Gimpel of the Toyota Technological Institute at Chicago (TTIC).
Gaussian Error Linear Units (GELUs) (arXiv, June 2016): This preprint proposed the GELU activation function, defined as x·Φ(x), where Φ(x) is the standard Gaussian CDF.6 The paper was never published in conference proceedings and remains an arXiv preprint, yet the activation was adopted in BERT, GPT-2, and the broader GPT series of models; it has accumulated tens of thousands of citations, making it one of the most cited deep learning preprints.6 Hendrycks' Google Scholar page lists a total citation count exceeding 62,000 across all his works as of the most recent available data.6
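As a concrete illustration of the definition above (a minimal sketch, not code from the paper), the exact GELU and the tanh-based approximation given in the preprint can be written in a few lines of NumPy:

```python
import numpy as np
from scipy.special import erf  # vectorized error function

def gelu_exact(x):
    # GELU(x) = x * Phi(x), where Phi is the standard Gaussian CDF,
    # and Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    return x * 0.5 * (1.0 + erf(x / np.sqrt(2.0)))

def gelu_tanh(x):
    # Tanh-based approximation given in the original preprint
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

x = np.linspace(-3.0, 3.0, 7)
print(gelu_exact(x))
print(gelu_tanh(x))  # closely tracks the exact form
```

Unlike ReLU, which gates inputs by their sign, GELU weights each input by the Gaussian probability mass below it, which makes it smooth and slightly non-monotonic for small negative values.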
A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks (ICLR 2017): This paper proposed a simple baseline that uses the maximum softmax probability to detect when a neural network is presented with inputs outside its training distribution.7 It has accumulated over 3,800 citations on Semantic Scholar, approximately 780 of them highly influential, and is considered a foundational reference in the OOD detection literature.7
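To make the method concrete, here is a minimal sketch (with made-up logits, not the paper's benchmarks) of the maximum-softmax-probability score the baseline uses: a threshold chosen on held-out data flags low-confidence inputs as likely out-of-distribution.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def msp_score(logits):
    # Maximum softmax probability: higher values suggest in-distribution inputs
    return softmax(logits).max(axis=-1)

# Hypothetical logits for two inputs: one confidently classified,
# one with a diffuse (low-confidence) prediction.
confident_logits = np.array([6.0, 1.0, 0.5, 0.2])
diffuse_logits = np.array([1.2, 1.0, 0.9, 1.1])

threshold = 0.7  # in practice chosen from validation data
for name, logits in [("confident", confident_logits), ("diffuse", diffuse_logits)]:
    score = msp_score(logits)
    label = "likely out-of-distribution" if score < threshold else "likely in-distribution"
    print(f"{name}: MSP = {score:.3f} -> {label}")
```

The paper's contribution was to show that this very simple score is a surprisingly strong baseline across vision, NLP, and speech tasks, and to establish evaluation protocols against which later OOD detection methods are compared.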
Natural Adversarial Examples (CVPR 2021, based on work from 2019): Co-authored with Kevin Zhao, Steven Basart, Jacob Steinhardt, and Dawn Song, this paper introduced ImageNet-A (a harder ImageNet test set) and ImageNet-O (described as the first OOD detection dataset for ImageNet-scale models), demonstrating that ML models share weaknesses exploitable by natural, unmodified adversarial examples.14 It was published in the Proceedings of CVPR 2021, pages 15262–15271.
These papers established Hendrycks' research trajectory in robustness and reliability before his subsequent work on evaluation benchmarks and AI safety.
Center for AI Safety
Hendrycks co-founded the Center for AI Safety in 2022 as a 501(c)(3) nonprofit organization based in San Francisco, alongside co-founder Oliver Zhang.1 According to CAIS's mission statement, the organization aims to reduce societal-scale risks from artificial intelligence through research, field-building, and advocacy work.15
The organization has received general support grants from Open Philanthropy (rebranded as Coefficient Giving in November 2025)16 in 2022 and 2023, as well as a grant of $1,433,000 to support its Philosophy Fellowship program.171819 As of early 2025, the Survival and Flourishing Fund (SFF) had provided additional funding — $1.1 million to CAIS and $1.6 million to the affiliated CAIS Action Fund — while Open Philanthropy/Coefficient Giving was not making new grants to CAIS at that time.20 A sister organization, the CAIS Action Fund (a 501(c)(4)), was formally launched in Washington D.C. in July 2024 and reported spending $270,000 on federal lobbying in 2024.21
CAIS also received $6.5 million from FTX between May and September 2022, before FTX declared bankruptcy in November 2022. The bankrupt FTX estate subsequently sought permission from a Delaware bankruptcy judge to issue subpoenas to CAIS, which had declined requests to voluntarily provide an accounting related to the transfers, according to a Bloomberg report from October 2023.22
By 2023, CAIS had grown to more than a dozen employees, according to a TIME profile.9
The organization's activities include:
- Conducting technical safety research on topics such as robustness and evaluation methods
- Educational programs, including the ML Safety course curriculum (announced on LessWrong in 2021)23
- Policy-oriented work on compute governance and hardware-level interventions
- Coordination efforts within the AI safety research community
- A compute cluster supporting approximately 20 research labs; as of November 2023 the cluster had onboarded approximately 200 users working on 63 AI safety projects, and in 2024 it supported 77 AI safety papers2425
- A Philosophy Fellowship that hosted approximately 12 academic philosophers for a seven-month residency in San Francisco in 2023; fellows received $50,000 in funding, the cohort produced 18 original research papers, and the program led to a special issue of Philosophical Studies (Springer), a peer-reviewed philosophy journal2627
CAIS has served as an institutional platform for Hendrycks' work connecting technical researchers with policymakers and coordinating public statements on AI risk.
Statement on AI Risk (May 2023)
In May 2023, Hendrycks coordinated a public statement that read: "Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war."8 CAIS built an email verification system to confirm signatories' institutional affiliations before listing them.8
At the time of initial publication, more than 350 researchers and executives had signed the statement.8 The total count of signatories subsequently grew to more than 500 as the statement page remained open.9 Signatories included Geoffrey Hinton, Yoshua Bengio, Sam Altman (OpenAI), Demis Hassabis (Google DeepMind), and Dario Amodei (Anthropic), along with executives from Microsoft and many other researchers.89
The statement received coverage in major media outlets and was cited in subsequent policy discussions. Some commentators noted that the statement's brevity meant it did not specify which risks or interventions were being prioritized, and that signatories held a range of differing views about what actions would follow from the shared concern.8
Technical Research Contributions
Benchmarks and Evaluation
Hendrycks has developed several benchmarks used in evaluating language models and AI systems.
MMLU (Measuring Massive Multitask Language Understanding): Published at ICLR 2021, co-authored with Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt.3 The benchmark covers 57 tasks including elementary mathematics, U.S. history, computer science, and law, designed to test knowledge breadth in large language models. At the time of publication, the largest GPT-3 model improved over random chance by almost 20 percentage points on average, but had near-random accuracy on some subjects such as morality and law.3 As of July 2024, the MMLU dataset had exceeded 100 million downloads, and by mid-2024 leading models including Claude 3.5 Sonnet, GPT-4o, and Llama 3.1 405B were consistently achieving approximately 88% accuracy on the benchmark.28 The benchmark has also been adapted into localized versions including CMMLU (Chinese), KMMLU (Korean), ArabicMMLU, and TurkishMMLU.28
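Schematically (a simplified sketch, not the official evaluation harness), MMLU scoring reduces to asking a model to pick one of four option letters per question and reporting the fraction answered correctly:

```python
# Simplified MMLU-style scoring. Each item has a question, options A-D, and a
# gold answer letter; `answer_fn` is a hypothetical stand-in for a model call.
def mmlu_style_accuracy(items, answer_fn):
    correct = sum(
        answer_fn(item["question"], item["options"]).strip().upper() == item["answer"]
        for item in items
    )
    return correct / len(items)

items = [
    {"question": "What is 7 * 8?",
     "options": {"A": "54", "B": "56", "C": "64", "D": "48"},
     "answer": "B"},
    {"question": "Which amendment to the U.S. Constitution abolished slavery?",
     "options": {"A": "13th", "B": "19th", "C": "1st", "D": "5th"},
     "answer": "A"},
]

def always_a(question, options):
    # Trivial stub "model" that always answers A
    return "A"

print(mmlu_style_accuracy(items, always_a))  # 0.5 with the stub model
```

Real evaluations differ in prompting details (few-shot examples, answer extraction, log-likelihood scoring of options), which is one source of the prompt-sensitivity issues discussed below.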
MMLU has since drawn criticism regarding benchmark saturation and data quality. GPT-4 achieved 86.4% accuracy on MMLU by March 2023, after which differentiation between leading models became difficult.29 Score variance of up to 10–13 percentage points depending on prompt methodology has been documented.29 A 2024 reanalysis (MMLU-Redux, published at NAACL 2025) manually re-annotated 5,700 questions across all 57 subjects and found that approximately 6.49% of questions contain errors, with notably higher error rates in some subsets.30 As of 2025, MMLU has been partly superseded in evaluations by more challenging alternatives, including MMLU-Pro (presented at NeurIPS 2024).2928
ETHICS Dataset: Published at ICLR 2021 under the title Aligning AI With Shared Human Values, co-authored with Collin Burns, Steven Basart, Andrew Critch, Jerry Li, Dawn Song, and Jacob Steinhardt (arXiv: 2008.02275).4 The benchmark spans five subsets covering justice, well-being, duties, virtues, and commonsense morality, designed to evaluate whether language models can perform tasks related to moral reasoning across different ethical frameworks.
These benchmarks have been adopted by research groups evaluating new language models, providing standardized metrics for comparing systems.
Robustness and Distribution Shift
Hendrycks' work on robustness has examined how neural networks perform when tested on data that differs from their training distribution. His research has included:
- Methods for detecting when inputs are out-of-distribution relative to training data
- Studies of how models fail when encountering natural variations in data
- Development of datasets containing "natural adversarial examples" that cause model failures without artificial perturbations
- Analysis of calibration in neural network predictions
Natural Adversarial Examples (CVPR 2021): Co-authored with Kevin Zhao, Steven Basart, Jacob Steinhardt, and Dawn Song.14 The paper introduced two challenging datasets, including IMAGENET-O, described as the first out-of-distribution detection dataset for ImageNet-scale models. The paper appeared in the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 2021, pages 15262–15271 (arXiv: 1907.07174).14
This work connects to broader questions in technical AI safety research about how to ensure systems behave reliably in novel situations.
Unsolved Problems in ML Safety
Co-authored with Nicholas Carlini (Google Brain), John Schulman (OpenAI), and Jacob Steinhardt (arXiv: 2109.13916, September 2021).31 The paper presents four key research problem areas: Robustness, Monitoring, Alignment, and Systemic Safety, and provides a structured overview of open research directions in ML safety.
Textbook: Introduction to AI Safety, Ethics, and Society
In 2024, Hendrycks published Introduction to AI Safety, Ethics, and Society through CRC Press (Taylor & Francis imprint), DOI: 10.1201/9781003530336.10 The book is available as an open-access monograph on aisafetybook.com and in print, with an arXiv preprint posted November 2024 (arXiv: 2411.01042).10 It is targeted at upper-level undergraduate and postgraduate students, covering technical safety concepts and ethics, and includes a governance chapter spanning AI policy variables such as the distribution of benefits, access to AI, and the roles of companies, governments, and international bodies.10 CAIS reported that an associated online course launched in 2024 enrolled 240 participants.25
Policy Engagement and Advocacy
In February 2022, Hendrycks co-authored Actionable Guidance for High-Consequence AI Risk Management: Towards Standards Addressing AI Catastrophic Risks (arXiv: 2206.08966) with Anthony M. Barrett, Jessica Newman, and Brandie Nonnecke, submitted to NIST as input to inform the AI Risk Management Framework (AI RMF).32
The CAIS Action Fund, a related 501(c)(4) organization, co-sponsored California SB 1047 alongside Economic Security Action California and Encode Justice, organized a joint letter signed by more than 80 technology organizations asking Congress to fund NIST AI work, and advocated for $10 million in funding for the U.S. AI Safety Institute.2133 The Action Fund was formally launched at a Capitol Hill event in July 2024 attended by multiple members of Congress, including Senator Brian Schatz (D-HI) and Representative French Hill (R-AR).21
California SB 1047: SB 1047 would have required AI developers to test frontier models for hazardous capabilities and take steps to mitigate catastrophic risks. Governor Gavin Newsom vetoed the bill in September 2024; his veto message called the bill "well-intentioned" but expressed concern that it did not take into account deployment context or whether systems are used in high-risk environments.34 Supporters of the bill included CAIS, Elon Musk, the Los Angeles Times editorial board, and Anthropic; opponents included Meta, OpenAI, and former House Speaker Nancy Pelosi, who argued it would hinder innovation.34 Nathan Calvin, Senior Policy Counsel at the CAIS Action Fund, stated that the organization was "disappointed by Governor Newsom's decision to veto this urgent and common sense safety bill."33 Hendrycks, writing on X following the veto, stated that although the outcome was "disappointing," the debate had "begun moving the conversation about AI safety into the mainstream, where it belongs," and added that the bill "has revealed that some industry calls for responsible AI are nothing more than PR aircover for their business and investment strategies."35 Newsom indicated he would work with the legislature, federal partners, and technology experts to develop more empirically informed AI safety regulations.34
Hendrycks has also given public presentations and media interviews explaining AI risk concerns to non-technical audiences. CAIS has engaged with bodies involved in developing the EU AI Act and provided technical input to legislative discussions in the United States.
Stated Positions on AI Risk
Hendrycks has publicly positioned certain AI risks as warranting priority attention. The May 2023 statement he coordinated explicitly compared mitigating AI extinction risk to addressing pandemics and nuclear war in terms of global priority.8
In a 2025 interview with Lawfare Media, Hendrycks stated that AGI is "not something that's very far off, but potentially on the horizon," adding that "even in the next few years, you could get AGI, so to speak," and emphasized "the trajectory that we're on rather than the current capabilities."36 Separately, on X in 2025, he proposed a testable definition of AGI based on Cattell-Horn-Carroll (CHC) theory of human intelligence — "an AI that can match or exceed the cognitive versatility and proficiency of a well-educated adult" — and assessed, using this self-constructed metric, that GPT-4 (2023) was approximately 27% of the way to that threshold and GPT-5 (2025) approximately 58%.37 These percentage estimates are Hendrycks' own assessments using his proposed CHC-based definition; the framework has not been independently validated and the source is a social media post rather than a peer-reviewed publication.
In March 2025, he co-authored Superintelligence Strategy (arXiv: 2503.05628) with Eric Schmidt and Alexandr Wang.11 The paper introduces a deterrence concept called Mutual Assured AI Malfunction (MAIM), under which any state's aggressive bid for unilateral AI dominance would be met with preventive sabotage by rivals. The paper argues against a Manhattan Project-style acceleration toward AGI and proposes a three-part framework: deterrence (MAIM), nonproliferation, and competitiveness. It defines superintelligence as AI that "would decisively surpass the world's best individual experts in nearly every intellectual domain" and identifies hacking, virology, and autonomous AI R&D as the safety-relevant cognitive domains of greatest concern.11 The paper drew responses from multiple commentators; RAND Corporation described it as "an important new report" and "a critical contribution" to AI policy debate, while also raising questions about whether the MAIM framework is consistent with private-sector-driven AI development.38 Critics on LessWrong argued that the MAIM analogy to Mutually Assured Destruction (MAD) fails because the potentially enormous payoff of achieving superintelligence first may destabilize rather than stabilize deterrence dynamics.39 In a September 2025 response piece, Hendrycks and co-author Adam Khoja clarified that the paper draws a "pedagogical parallel" between MAIM and MAD but acknowledged the structures differ significantly, noting that MAD is built around nuclear retaliation whereas MAIM involves AI preemption.40
His activities through CAIS reflect a combination of intervention approaches:
- Technical research aimed at improving system safety and evaluation
- Policy engagement with government bodies and international organizations
- Public communication intended to broaden awareness of AI risk concerns, according to CAIS's stated goals15
- Compute governance work exploring hardware-level interventions
CAIS Programs and Activities
Research
CAIS conducts and supports research in several areas:
Technical Safety: Work on robustness, alignment techniques, and evaluation methodologies for AI systems. Research includes both empirical studies of current systems and development of new safety methods.
Compute Governance: Investigation of interventions based on the hardware supply chain for AI systems, including tracking of compute resources and potential international coordination mechanisms.
ML Safety Education: CAIS developed curriculum materials for teaching machine learning safety concepts, announced publicly in 2021.23 The course is described on course.mlsafety.org as "an advanced course covering empirical directions to reduce AI x-risk," with a full syllabus, lectures, readings, and assignments available publicly.41 Hendrycks worked on the course for approximately eight months before its launch and intended it to serve as a default resource for ML researchers interested in AI safety as well as undergraduates beginning safety research.23 A 2024 online course associated with CAIS's textbook enrolled 240 participants.25
Advocacy and Field-Building
CAIS has organized workshops, maintained networks of researchers working on safety-related topics, and engaged with policymakers. As noted above, the organization has co-authored NIST framework input, co-sponsored California SB 1047 (subsequently vetoed by Governor Newsom in September 2024) through the CAIS Action Fund, and organized the 2024 Capitol Hill launch event attended by members of Congress.213242
Perspectives and Debates
Hendrycks' approach to AI safety places catastrophic and existential risk scenarios among its top concerns. This framing is part of ongoing debates within the AI research community about how to allocate attention and resources across different categories of AI-related concerns.
Critiques of Risk Framing
Critics, including researchers focused on near-term AI harms, have argued that emphasis on long-term or extinction-level risks may draw resources and attention away from documented harms related to bias, fairness, and misuse of existing AI systems. Researchers at organizations such as the AI Now Institute and the Distributed AI Research Institute (DAIR) have argued in published work that speculative long-term risk framing can function to divert regulatory attention from present harms; these critiques are directed at the broader x-risk research community rather than specifically at CAIS.8 Others have questioned whether the extinction risk framing is supported by available evidence, and have noted that the brief May 2023 statement was signed by researchers with substantially differing views on what the appropriate response to AI risk should be.8
Allegations Regarding Conflicts of Interest
In 2024, Pirate Wires, an online publication with a stated skepticism toward the effective altruism ecosystem, reported that Hendrycks had co-founded Gray Swan — an AI safety compliance startup — while the CAIS Action Fund was co-sponsoring SB 1047, which would create AI compliance requirements. The report alleged the two activities created a conflict of interest.43 Gray Swan's official statement said the company "explicitly does not hold any position on SB 1047" and that its products and services "are not and will not be designed to satisfy any of the auditing requirements proposed in the bill."44 Hendrycks described the Pirate Wires article as "bad-faith gotcha journalism" on X, stating that Gray Swan was not designed to offer the type of audits SB 1047 would require, and subsequently said he divested his equity stake in the company "in order to send a clear signal."45
Critiques of MMLU
The MMLU benchmark, one of Hendrycks' most cited technical contributions, has attracted specific methodological criticism: saturation (leading models reaching the 86–89% accuracy range with limited differentiation), prompt sensitivity (score variance of up to 10–13 percentage points depending on methodology), and a documented error rate of approximately 6.49% in ground-truth answers identified by MMLU-Redux (2024).2930
Broader Context
Hendrycks has maintained that catastrophic risks warrant prioritization while not dismissing other AI safety concerns. The design of CAIS programs reflects this prioritization, with research and advocacy efforts concentrated on scenarios involving large-scale harm. The field of AI safety research contains diverse perspectives on which problems are most important, what methods are most promising, and how technical research should relate to policy and governance questions; Hendrycks' work represents one approach within this broader landscape.
Selected Publications
| Title | Authors | Venue | Year | Identifier |
|---|---|---|---|---|
| Gaussian Error Linear Units (GELUs) | Hendrycks, Gimpel | arXiv preprint | 2016 | arXiv:1606.08415 |
| A Baseline for Detecting Misclassified and Out-of-Distribution Examples | Hendrycks, Gimpel | ICLR 2017 | 2017 | arXiv:1610.02136 |
| Measuring Massive Multitask Language Understanding | Hendrycks, Burns, Basart, Zou, Mazeika, Song, Steinhardt | ICLR 2021 | 2021 | arXiv:2009.03300 |
| Aligning AI With Shared Human Values | Hendrycks, Burns, Basart, Critch, Li, Song, Steinhardt | ICLR 2021 | 2021 | arXiv:2008.02275 |
| Natural Adversarial Examples | Hendrycks, Zhao, Basart, Steinhardt, Song | CVPR 2021 | 2021 | arXiv:1907.07174 |
| Unsolved Problems in ML Safety | Hendrycks, Carlini, Schulman, Steinhardt | arXiv | 2021 | arXiv:2109.13916 |
| Actionable Guidance for High-Consequence AI Risk Management | Barrett, Hendrycks, Newman, Nonnecke | NIST AI RMF submission | 2022 | arXiv:2206.08966 |
| Introduction to AI Safety, Ethics, and Society | Hendrycks | CRC Press (Taylor & Francis) | 2024 | DOI:10.1201/9781003530336 |
| Superintelligence Strategy | Hendrycks, Schmidt, Wang | arXiv | 2025 | arXiv:2503.05628 |
Footnotes
- Center for AI Safety – Wikipedia. https://en.wikipedia.org/wiki/Center_for_AI_Safety
- Dan Hendrycks – Schmidt Sciences Grantee Profile. https://www.schmidtsciences.org/grantee/dan-hendrycks/
- Hendrycks, D., Burns, C., Basart, S., Zou, A., Mazeika, M., Song, D., & Steinhardt, J. (2021). Measuring Massive Multitask Language Understanding. ICLR 2021. arXiv:2009.03300. https://arxiv.org/abs/2009.03300
- Hendrycks, D., Burns, C., Basart, S., Critch, A., Li, J., Song, D., & Steinhardt, J. (2021). Aligning AI With Shared Human Values. ICLR 2021. arXiv:2008.02275. https://arxiv.org/abs/2008.02275
- Machine Learning Safety — UC Berkeley EECS Dissertation. UCB/EECS-2022-253. https://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-253.html
- Dan Hendrycks and Kevin Gimpel, "Gaussian Error Linear Units (GELUs)," arXiv:1606.08415, submitted June 2016, revised June 2023. https://arxiv.org/abs/1606.08415
- Dan Hendrycks and Kevin Gimpel, "A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks," ICLR 2017. arXiv:1610.02136. Semantic Scholar citation count: approximately 3,887 citations including ~781 highly influential citations. https://arxiv.org/abs/1610.02136
- CBC News. "Artificial intelligence poses 'risk of extinction,' tech execs and experts warn." May 30–31, 2023. https://www.cbc.ca/news/world/artificial-intelligence-extinction-risk-1.6859118; CAIS AI Risk Statement press release: https://safe.ai/work/press-release-ai-risk
- TIME. "Dan Hendrycks: The 100 Most Influential People in AI 2023." 2023. https://time.com/collection/time100-ai/6309050/dan-hendrycks/
- Hendrycks, D. (2024). Introduction to AI Safety, Ethics, and Society. CRC Press (Taylor & Francis). DOI:10.1201/9781003530336. https://www.aisafetybook.com; arXiv:2411.01042.
- Hendrycks, D., Schmidt, E., & Wang, A. (2025). Superintelligence Strategy: Expert Version. arXiv:2503.05628. https://arxiv.org/abs/2503.05628
- Dan Hendrycks – Personal Academic Page / CV, UC Berkeley. https://people.eecs.berkeley.edu/~hendrycks/; CV: https://people.eecs.berkeley.edu/~hendrycks/CV.pdf
- Dan Hendrycks, X profile (@hendrycks / @DanHendrycks), accessed 2024. https://x.com/hendrycks
- Hendrycks, D., Zhao, K., Basart, S., Steinhardt, J., & Song, D. (2021). Natural Adversarial Examples. CVPR 2021, pp. 15262–15271. arXiv:1907.07174. https://openaccess.thecvf.com/content/CVPR2021/html/Hendrycks_Natural_Adversarial_Examples_CVPR_2021_paper.html
- Inside Philanthropy, "Open Philanthropy Is Now Coefficient Giving. Here's What Has (and Hasn't) Changed," November 2025. https://www.insidephilanthropy.com/home/open-philanthropy-is-now-coefficient-giving-heres-what-has-and-hasnt-changed
- Open Philanthropy. Center for AI Safety – General Support (2022). https://www.openphilanthropy.org/grants/center-for-ai-safety-general-support/
- Open Philanthropy. Center for AI Safety – General Support (2023). https://www.openphilanthropy.org/grants/center-for-ai-safety-general-support-2023/
- Open Philanthropy. Center for AI Safety – Philosophy Fellowship and NeurIPS Prizes. https://www.openphilanthropy.org/grants/center-for-ai-safety-philosophy-fellowship/
- 80,000 Hours. "It looks like there are some good funding opportunities in AI safety right now." January 2025. https://80000hours.org/2025/01/it-looks-like-there-are-some-good-funding-opportunities-in-ai-safety-right-now/
- Center for AI Safety Action Fund – 2024 Year in Review. CAIS Newsletter. https://newsletter.safe.ai/p/aisn-45-center-for-ai-safety-2024; Washington AI Network, July 29, 2024. https://washingtonainetwork.com/2024/07/29/center-for-ai-safety-hosts-dc-launch-event-featuring-cais-dan-hendrycks-jaan-tallinn-sen-brian-schatz-rep-french-hill-and-cnns-pamela-brown/
- Bloomberg, "FTX Is Probing $6.5 Million Paid to Leading Nonprofit Group on AI Safety," October 25, 2023. https://www.bloomberg.com/news/articles/2023-10-25/ftx-probing-6-5-million-paid-to-leading-ai-safety-nonprofit
- Hendrycks, D. / CAIS. "Announcing the Introduction to ML Safety Course." LessWrong, 2021. https://www.lesswrong.com/posts/4F8Bg8Z5cePTBofzo/announcing-the-introduction-to-ml-safety-course
- Center for AI Safety 2023 Year in Review. CAIS Newsletter, December 21, 2023. https://newsletter.safe.ai/p/aisn-28-center-for-ai-safety-2023
- Center for AI Safety, "AISN #45: Center for AI Safety 2024 Year in Review," December 19, 2024. https://newsletter.safe.ai/p/aisn-45-center-for-ai-safety-2024
- Center for AI Safety, "Philosophy Fellowship 2023," CAIS official page. https://safe.ai/work/philosophy-fellowship
- Springer/Philosophical Studies, "AI Safety Special Issue," edited by Cameron Kirk-Giannini and Dan Hendrycks. https://link.springer.com/collections/cadgidecih
- Wikipedia contributors, "MMLU," Wikipedia, updated December 2024/early 2025. https://en.wikipedia.org/wiki/MMLU
- MMLU – Wikipedia. https://en.wikipedia.org/wiki/MMLU; MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark. arXiv:2406.01574. https://arxiv.org/html/2406.01574v1
- Gema et al. "Are We Done with MMLU?" NAACL 2025. arXiv:2406.04127. https://arxiv.org/abs/2406.04127
- Hendrycks, D., Carlini, N., Schulman, J., & Steinhardt, J. (2021). Unsolved Problems in ML Safety. arXiv:2109.13916. https://arxiv.org/abs/2109.13916
- Barrett, A.M., Hendrycks, D., Newman, J., & Nonnecke, B. (2022). Actionable Guidance for High-Consequence AI Risk Management: Towards Standards Addressing AI Catastrophic Risks. Submitted to NIST AI RMF. arXiv:2206.08966. https://arxiv.org/abs/2206.08966
- California State Senate — Sen. Scott Wiener's office, "Senator Wiener Responds to Governor Newsom Vetoing Landmark AI Bill," September 2024. https://sd11.senate.ca.gov/news/senator-wiener-responds-governor-newsom-vetoing-landmark-ai-bill
- Crowell & Moring LLP, "Gov. Newsom Vetoes AI Bill but Leaves the Door Open to Future CA Regulation," October 2024. https://www.crowell.com/en/insights/client-alerts/gov-newsom-vetoes-ai-bill-but-leaves-the-door-open-to-future-ca-regulation
- Space Daily (wire report), "California governor vetoes AI safety bill," September/October 2024. https://www.spacedaily.com/reports/California_governor_vetoes_AI_safety_bill_999.html
- Lawfare Media. "Lawfare Daily: Dan Hendrycks on National Security in the Age of Superintelligent AI." 2025. https://www.lawfaremedia.org/article/lawfare-daily--dan-hendrycks-on-national-security-in-the-age-of-superintelligent-ai
- Dan Hendrycks on X, 2025. https://x.com/DanHendrycks/status/1978828377269117007
- RAND Corporation, "Seeking Stability in the Competition for AI Advantage," March 13, 2025. https://www.rand.org/pubs/commentary/2025/03/seeking-stability-in-the-competition-for-ai-advantage.html
- Eli Blee-Goldman, "The Jackpot Jinx (or why 'Superintelligence Strategy' is wrong)," LessWrong, March 10, 2025. https://www.lesswrong.com/posts/JF2JABmeM9opszwZA/the-jackpot-jinx-or-why-superintelligence-strategy-is-wrong
- Dan Hendrycks and Adam Khoja, "AI Deterrence Is Our Best Option," AI Frontiers, September 18, 2025. https://ai-frontiers.org/articles/ai-deterrence-is-our-best-option
- Pirate Wires, "The Conflict of Interest at the Heart of CA's AI Bill," 2024. https://www.piratewires.com/p/sb-1047-dan-hendrycks-conflict-of-interest
- Gray Swan AI, "Statement on SB-1047 and Founders," 2024. https://www.grayswan.ai/blog/sb1047
- Dan Hendrycks, post on X, July 2024. https://x.com/DanHendrycks/status/1814006259571667045