Longterm Wiki

Center for AI Safety

Academic

Also known as: CAIS

Founded: 2022 · HQ: San Francisco · Website: safe.ai

The Center for AI Safety (CAIS) is a nonprofit organization that works to reduce societal-scale risks from AI. CAIS combines research, field-building, and public communication to advance AI safety. It was co-founded in 2022 by Dan Hendrycks (Executive Director) and Oliver Zhang (Managing Director).

Revenue: $10.2 million (as of 2024)
Total Funding Raised: $33 million (as of 2025)

Key Metrics

Revenue (ARR)

Annual run rate grew from $6.7M in 2022 to $10M in 2024.
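As a quick check on the chart's figures, the implied compound annual growth rate can be computed from the page's own numbers ($6.7M in 2022 to $10M in 2024). This is a back-of-envelope sketch, not data from CAIS itself:

```python
# Revenue figures taken from the chart above, in millions of USD.
start, end = 6.7, 10.0
years = 2024 - 2022

# Compound annual growth rate: (end / start)^(1 / years) - 1
cagr = (end / start) ** (1 / years) - 1
print(f"Implied CAGR 2022-2024: {cagr:.1%}")  # roughly 22% per year
```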

Facts (15 entries)
Financial
Grant Received: $1.1 million
Total Funding Raised: $33 million
Net Assets: $11.6 million
Annual Expenses: $7.2 million
Revenue: $10.2 million
General
Website: https://www.safe.ai/
Organization
Headquarters: San Francisco
Founded Date: 2022
Other
Key Person: Dan Hendrycks
Compensation: Dan Hendrycks takes a $1 annual salary as Executive Director
Publication: The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning — a benchmark for evaluating dual-use AI capabilities in biosecurity, cybersecurity, and chemical weapons
Infrastructure: Compute cluster with 80+ NVIDIA A100 GPUs available to AI safety researchers
Program: ML Safety Scholars — educational program training hundreds of students in AI safety fundamentals. Includes an online course, reading groups, and mentorship.
Board Member: Jaan Tallinn
Campaign: Statement on AI Risk (May 2023) — a one-sentence statement: 'Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.' Signed by 350+ AI leaders, including Geoffrey Hinton, Demis Hassabis, Sam Altman, and Dario Amodei.

Other Data

Publications
12 entries

Title | Type | Authors | URL | Published
Humanity's Last Exam | paper | Long Phan, Alice Gatti, Ziwen Han, Nathaniel Li et al. | arxiv.org | 2025-01
Introduction to AI Safety, Ethics, and Society | book | Dan Hendrycks | aisafetybook.com | 2024-06
The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning | paper | Nathaniel Li, Alexander Pan, Anjali Gopal et al. | wmdp.ai | 2024
Superintelligence Strategy | report | Dan Hendrycks, Eric Schmidt, Alexandr Wang | nationalsecurity.ai | 2024
Improving Alignment and Robustness with Circuit Breakers | paper | Andy Zou, Long Phan, Justin Wang et al. | arxiv.org | 2024
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming | paper | Mantas Mazeika, Long Phan, Xuwang Yin et al. | harmbench.org | 2024
Representation Engineering: A Top-Down Approach to AI Transparency | paper | Andy Zou, Long Phan, Sarah Chen et al. | arxiv.org | 2023-10
An Overview of Catastrophic AI Risks | paper | Dan Hendrycks, Mantas Mazeika, Thomas Woodside | arxiv.org | 2023-06
Statement on AI Risk | policy-brief | CAIS | aistatement.com | 2023-05
Universal and Transferable Adversarial Attacks on Aligned Language Models | paper | Andy Zou, Zifan Wang, Nicholas Carlini et al. | llm-attacks.org | 2023
Unsolved Problems in ML Safety | paper | Dan Hendrycks, Nicholas Carlini, John Schulman, Jacob Steinhardt | arxiv.org | 2021-09
Measuring Massive Multitask Language Understanding (MMLU) | paper | Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, Jacob Steinhardt | arxiv.org | 2020-09

Divisions: 6

Related Wiki Pages

Top Related Pages

Approaches

Capability Unlearning / Removal
MAIM (Mutually Assured AI Malfunction)
AI Alignment
Corporate AI Safety Responses

Analysis

AI Compute Scaling Metrics
AI Safety Intervention Effectiveness Matrix
AI Uplift Assessment Model

Policy

Safe and Secure Innovation for Frontier Artificial Intelligence Models Act

Organizations

Anthropic
Center for Human-Compatible AI
Center for AI Safety Action Fund
Google DeepMind
US AI Safety Institute
Redwood Research

Other

Geoffrey Hinton
Stuart Russell

Concepts

AGI Timeline

Key Debates

Is AI Existential Risk Real?

Risks

AI-Induced Irreversibility

Historical

The MIRI Era