Center for AI Safety
Academic
Also known as: CAIS
The Center for AI Safety (CAIS) is a nonprofit organization that works to reduce societal-scale risks from AI. CAIS combines research, field-building, and public communication to advance AI safety. It was co-founded in 2022 by Dan Hendrycks (Executive Director) and Oliver Zhang (Managing Director).
Publications
| Title | Type | Authors | URL | Published | Flagship |
|---|---|---|---|---|---|
| Humanity's Last Exam | paper | Long Phan, Alice Gatti, Ziwen Han, Nathaniel Li et al. | arxiv.org | 2025-01 | ✓ |
| Introduction to AI Safety, Ethics, and Society | book | Dan Hendrycks | aisafetybook.com | 2024-06 | ✓ |
| The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning | paper | Nathaniel Li, Alexander Pan, Anjali Gopal et al. | wmdp.ai | 2024 | ✓ |
| Superintelligence Strategy | report | Dan Hendrycks, Eric Schmidt, Alexandr Wang | nationalsecurity.ai | 2025-03 | ✓ |
| Improving Alignment and Robustness with Circuit Breakers | paper | Andy Zou, Long Phan, Justin Wang et al. | arxiv.org | 2024 | — |
| HarmBench: A Standardized Evaluation Framework for Automated Red Teaming | paper | Mantas Mazeika, Long Phan, Xuwang Yin et al. | harmbench.org | 2024 | ✓ |
| Representation Engineering: A Top-Down Approach to AI Transparency | paper | Andy Zou, Long Phan, Sarah Chen et al. | arxiv.org | 2023-10 | ✓ |
| An Overview of Catastrophic AI Risks | paper | Dan Hendrycks, Mantas Mazeika, Thomas Woodside | arxiv.org | 2023-06 | — |
| Statement on AI Risk | policy-brief | CAIS | aistatement.com | 2023-05 | ✓ |
| Universal and Transferable Adversarial Attacks on Aligned Language Models | paper | Andy Zou, Zifan Wang, Nicholas Carlini et al. | llm-attacks.org | 2023 | ✓ |
| Unsolved Problems in ML Safety | paper | Dan Hendrycks, Nicholas Carlini, John Schulman, Jacob Steinhardt | arxiv.org | 2021-09 | ✓ |
| Measuring Massive Multitask Language Understanding (MMLU) | paper | Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, Jacob Steinhardt | arxiv.org | 2020-09 | ✓ |
Top Related Pages
Representation Engineering
A top-down approach to understanding and controlling AI behavior by reading and modifying concept-level representations in neural networks, enabling both monitoring and steering of model behavior (see the illustrative sketch at the end of this page).
Power-Seeking AI
Formal theoretical analysis demonstrates why optimal AI policies tend to acquire power (resources, influence, capabilities) as an instrumental goal.
Existential Risk from AI
Hypotheses concerning risks from advanced AI systems that some researchers believe could result in human extinction or permanent global catastrophe.
Dan Hendrycks
Executive Director of CAIS; focuses on catastrophic AI risk reduction through research, education, and policy advocacy.
Pause Advocacy
Advocacy for slowing or halting frontier AI development until adequate safety measures are in place. Analysis suggests 15-40% probability of meanin...
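The Representation Engineering entry above describes reading and steering concept-level representations. The sketch below illustrates the general idea on a toy PyTorch model: a concept direction is estimated as the difference of mean hidden activations between two contrasting input sets, then added back during the forward pass via a hook. The model, layer choice, inputs, and scaling factor `alpha` are all hypothetical; this is a minimal illustration of the read-then-steer pattern, not CAIS's implementation.

```python
# Minimal sketch of representation reading and steering, in the spirit of
# representation engineering. Toy model and data; all names are hypothetical.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy 2-layer MLP standing in for a network whose hidden layer we probe.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
hidden_layer = model[1]  # probe activations after the ReLU

# 1) "Reading": record hidden activations for two contrasting input sets
#    and take the difference of their means as a concept direction.
acts = {}

def record(_module, _inputs, output):
    acts["h"] = output.detach()

handle = hidden_layer.register_forward_hook(record)

pos_inputs = torch.randn(64, 16) + 1.0   # inputs assumed to express the concept
neg_inputs = torch.randn(64, 16) - 1.0   # inputs assumed to lack it
model(pos_inputs)
pos_mean = acts["h"].mean(dim=0)
model(neg_inputs)
neg_mean = acts["h"].mean(dim=0)
direction = pos_mean - neg_mean
direction = direction / direction.norm()
handle.remove()

# 2) "Steering": add a scaled copy of the direction to the hidden state on
#    future forward passes, nudging the model along the concept axis.
alpha = 2.0

def steer(_module, _inputs, output):
    return output + alpha * direction  # returned value replaces the output

steer_handle = hidden_layer.register_forward_hook(steer)
out_steered = model(torch.randn(8, 16))
steer_handle.remove()
print(out_steered.shape)  # torch.Size([8, 4])
```

In practice this kind of difference-of-means probing and activation addition is applied to hidden states of large language models at selected layers; the toy MLP here only demonstrates the pattern of extracting a direction and injecting it with a forward hook.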