Longterm Wiki
Updated 2026-04-02
Center for AI Safety (CAIS)


CAIS is a nonprofit research organization founded by Dan Hendrycks that has distributed compute grants to researchers, published technical AI safety papers including the representation engineering and MACHIAVELLI benchmark papers, and organized the May 2023 Statement on AI Risk signed by over 350 AI researchers and industry leaders. The organization focuses on technical safety research, field-building, and policy communication.

Type: Academic
Founded: 2022
Location: San Francisco, CA
Website: safe.ai
Related:
Concepts: Existential Risk from AI
Risks: Power-Seeking AI
Organizations: Anthropic
People: Dan Hendrycks, Oliver Zhang

Overview

The Center for AI Safety (CAIS) is a nonprofit research organization that works to reduce societal-scale risks from artificial intelligence through technical research, field-building initiatives, and public communication. Founded by Dan Hendrycks, CAIS received substantial public attention in May 2023 when it organized a one-sentence statement on AI extinction risk that attracted signatures from over 350 AI researchers and industry figures, including several Turing Award recipients and heads of major AI laboratories.

CAIS operates across three areas: technical research on AI alignment and robustness, grant and fellowship programs intended to grow the AI safety research community, and communication efforts aimed at policymakers and the public. Its technical output includes work on Representation Engineering and the MACHIAVELLI benchmark for evaluating goal-directed behavior in AI systems. The organization has received substantial funding from EA-aligned sources including Coefficient Giving (formerly Open Philanthropy), a funding relationship that is relevant context for assessing its research priorities and institutional positioning.

CAIS occupies a distinct niche in the AI safety ecosystem: unlike academic centers such as CHAI or research-focused organizations like MIRI, it combines original technical research with explicit field-building and public communication goals. Critics have questioned whether its emphasis on long-run extinction risk is appropriately calibrated relative to near-term AI harms, and whether EA-concentrated funding in this space creates ideological homogeneity in safety research priorities. These debates are discussed in the Critiques and Limitations section below.

Organizational Background

CAIS was established as a nonprofit research organization with the goal of filling a perceived gap between technical AI safety research and broader scientific and public awareness of AI risks. Dan Hendrycks, who completed his PhD at UC Berkeley, co-founded CAIS with Oliver Zhang to provide infrastructure — compute grants, fellowships, educational resources, and policy engagement — that individual academic researchers lacked access to.

The organization's theory of change rests on several linked assumptions: that AI systems pose meaningful risks of societal-scale harm, including possible catastrophic outcomes; that the current period is important for establishing safety-relevant research norms and technical methods; and that field-building activities (funding researchers, running educational programs, facilitating policy engagement) will increase the probability of good outcomes by growing and coordinating the safety research community. Whether these assumptions are well-founded is contested, and the organization's critics have argued that the extinction-risk framing in particular overstates speculative long-run risks relative to observable near-term harms.

CAIS is legally structured as a nonprofit (EIN: 88-1751310). Its primary disclosed funders include Coefficient Giving and the Survival and Flourishing Fund; revenue figures from its IRS Form 990 filings, totaling approximately $33M since founding, are detailed in the Funding section below.

Funding

CAIS's primary disclosed funders have included Coefficient Giving (formerly Open Philanthropy), a philanthropic organization closely associated with the effective altruism movement. This funding relationship is material context for interpreting the organization's research agenda: Coefficient Giving has historically prioritized long-run catastrophic and extinction-level AI risk over near-term AI harms, and CAIS's framing broadly reflects this prioritization.

Per IRS Form 990 filings (ProPublica), CAIS reported total revenue of $6.7M (2022), $16.1M (2023), and $10.2M (2024). Major known grant sources include Open Philanthropy (≈$10.6M across 2022-2023 general support grants), SFF (≈$3.8M in 2024-2025), Good Ventures Foundation ($1.9M in 2024), and Founders Pledge ($0.9M in 2024). Total expenses were $7.2M in 2024, with total assets of $12.6M.
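As a quick arithmetic check, the per-year Form 990 revenue figures cited above do sum to the cumulative total this page reports:

```python
# Sanity check: the 2022-2024 revenue figures (Form 990, via ProPublica)
# should sum to the ~$33M cumulative-funding figure cited on this page.
revenue_by_year_musd = {2022: 6.7, 2023: 16.1, 2024: 10.2}  # in $ millions
cumulative_musd = round(sum(revenue_by_year_musd.values()), 1)
print(f"Cumulative 2022-2024 revenue: ${cumulative_musd}M")  # -> $33.0M
```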

The concentration of AI safety funding through EA-aligned funders including Coefficient Giving (formerly Open Philanthropy) has been noted by critics as a potential source of ideological constraint on safety research priorities — organizations dependent on this funding may face implicit pressure to prioritize framings and research directions consistent with funder worldviews. CAIS has not publicly addressed this critique directly.

Key Research Areas

Technical Safety Research

| Research Domain | Key Contributions | Notes |
|---|---|---|
| Representation Engineering | Methods for reading and steering model internal representations | Published 2023; independent replication and scalability to frontier models remains an open research question |
| Safety Benchmarks | MACHIAVELLI benchmark for evaluating goal-directed and deceptive behavior | Cited in subsequent research; the extent to which it has been formally integrated into evaluation pipelines at Anthropic or OpenAI is not publicly documented |
| Adversarial Robustness | Evaluation protocols and defense mechanisms | Part of the broader Adversarial Robustness research agenda |
| Alignment Foundations | Conceptual frameworks and problem taxonomies for AI safety | Including the "Unsolved Problems in ML Safety" paper (2022) |

Major Publications & Tools

  • Representation Engineering: A Top-Down Approach to AI Transparency (2023) — Methods for understanding and influencing AI decision-making by working with internal representations rather than input-output behavior alone
  • Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior (2023) — Introduces the MACHIAVELLI benchmark for evaluating whether AI agents pursue goals through unethical means in text-based game environments
  • Unsolved Problems in ML Safety (2022) — A taxonomy of open technical challenges in machine learning safety, intended partly as a research agenda for the field
  • Measuring Mathematical Problem Solving With the MATH Dataset (2021) — A benchmark for evaluating AI mathematical reasoning, authored by Dan Hendrycks and collaborators during his PhD at UC Berkeley; this paper predates CAIS's founding and is a product of Hendrycks's academic research rather than an organizational output of CAIS

Citation counts for these papers (figures such as "200+", "50+", "30+") previously appeared on this page without sourced methodology. Readers seeking current citation data should consult Google Scholar or Semantic Scholar directly.
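To make the representation-engineering idea concrete, the following toy sketch derives a "reading" direction from contrastive activations, then uses it to score and steer new activations. The data here is synthetic, and the difference-of-means direction is a deliberate simplification; the actual paper works with real model hidden states and derives directions with techniques such as PCA over contrastive prompt pairs.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # hidden dimension (arbitrary for this toy example)

# Ground-truth concept direction used only to generate synthetic data.
true_dir = rng.normal(size=d)
true_dir /= np.linalg.norm(true_dir)

# Synthetic activations standing in for hidden states on contrastive
# prompts (e.g. "honest" vs "dishonest" completions).
honest = rng.normal(size=(100, d)) + 2.0 * true_dir
dishonest = rng.normal(size=(100, d)) - 2.0 * true_dir

# Reading direction: normalized difference of class means.
direction = honest.mean(axis=0) - dishonest.mean(axis=0)
direction /= np.linalg.norm(direction)

# "Reading": project a new activation onto the direction; the sign and
# magnitude indicate which cluster the activation is closer to.
new_act = rng.normal(size=d) + 2.0 * true_dir
score = float(new_act @ direction)

# "Steering": shift the activation along the direction before it is fed
# to the next layer, moving it toward the opposite cluster.
steered = new_act - 4.0 * direction
print(round(float(direction @ true_dir), 3))  # recovered vs true direction
```

With enough contrastive samples, the recovered direction aligns closely with the generating one; on real models, whether such directions are causal rather than merely correlated is one of the open questions noted in the table above.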

Field-Building Programs

CAIS runs several programs intended to grow the population of researchers working on AI safety. The term "field-building" refers to activities designed to increase the size, diversity, and coordination of a research community — in this case, researchers focused on technical and governance aspects of AI safety.

Grant Programs

| Program | Reported Scale | Description | Timeline |
|---|---|---|---|
| Compute Grants | $2M+ distributed; number of recipients reported variously as 100+ and 200+ in different CAIS materials — figure unverified | Provides compute resources to researchers working on safety-relevant projects | 2022–present |
| ML Safety Scholars | 63 graduates in the Summer 2022 cohort | Structured program for early-career researchers entering the AI safety field | 2021–present (pre-dates CAIS's 2022 founding; originated as an independent initiative) |
| Research Fellowships | Amount not publicly disclosed | Fellowships placing researchers at academic and research institutions | 2022–present |
| AI Safety Camp | Participant count not publicly disclosed | Collaborative program supporting international research teams | 2020–present (pre-dates CAIS's 2022 founding; originated as an independent initiative) |

Note: Quantitative figures in this table are drawn from CAIS's own communications and have not been independently verified. The ML Safety Scholars program was introduced in 2021 as an initiative led by Dan Hendrycks and collaborators during his time at UC Berkeley, and was later absorbed into CAIS's organizational umbrella.

Institutional Partnerships

  • Academic Collaborations: CAIS’s compute cluster supports researchers from UC Berkeley, Stanford, University of Cambridge, ETH Zurich, and other institutions. Collaborative research has included work with Carnegie Mellon University on adversarial attacks on large language models.
  • Industry Engagement: Research interactions with Anthropic and Google DeepMind have been reported in CAIS communications, though specific partnership details are not publicly documented.
  • Policy Connections: CAIS’s Action Fund engages in AI policy advocacy, including sponsoring California SB 1047. Specific briefings with individual legislative bodies are not independently documented.

Statement on AI Risk (2023)

In May 2023, CAIS published and circulated the Statement on AI Risk, a single sentence co-signed by over 350 AI researchers and industry figures:

"Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war."

The statement was covered widely in major news outlets and was cited in subsequent policy discussions, including in the context of UK and US government AI strategies. The official signatory list is available at safe.ai; the figure of 350+ is drawn from that list, though the precise count at any given time may vary as signatories are added.

Signatory Groups

| Category | Notable Signatories | Description |
|---|---|---|
| Turing Award Recipients | Geoffrey Hinton, Yoshua Bengio, Stuart Russell | Recipients of computing's highest recognition who signed the statement |
| Industry Executives | Sam Altman (OpenAI), Dario Amodei (Anthropic), Demis Hassabis (DeepMind) | CEOs of major AI laboratories |
| Policy and Governance Researchers | Helen Toner, Allan Dafoe, Gillian Hadfield | Researchers working on AI governance and policy |
| ML/AI Researchers | 300+ researchers across academia and industry | Researchers who signed as individuals, not representing institutional positions |

The statement's reception was not uniformly positive within the AI research community. A number of prominent ML researchers declined to sign or publicly criticized the statement's framing. Critics raised several concerns: that the one-sentence format was too vague to convey meaningful technical content; that equating AI risk with nuclear war risk was unsupported by available evidence; that the extinction framing could distract attention and resources from observable near-term harms from AI systems (such as bias, surveillance, and labor displacement); and that the statement's signatories were not uniformly working on extinction-risk problems, making it a weak signal of scientific consensus. Timnit Gebru criticized it for elevating speculative extinction risks while being promoted by "the same people who have poured billions of dollars into these companies." Human Rights Watch argued that scientists should focus on the known risks of AI instead of speculative future dangers. Emile Torres and Gebru argued the statement may be motivated by TESCREAL ideologies.

Proponents argued that the statement served a legitimate coordination function: making it socially acceptable for researchers to discuss catastrophic risk publicly, signaling to policymakers that risk concerns were not fringe views, and creating a reference point for subsequent regulatory discussions. Whether the statement's net effect on AI policy and research prioritization was positive is a matter of ongoing debate.

The statement's impact on specific policy documents — including mentions in UK AI Safety Institute and US AI Safety Institute contexts — has been cited by CAIS, though the causal relationship between the statement and any particular policy outcome is difficult to establish.

Critiques and Limitations

Criticism of Extinction-Risk Framing

The most substantive criticism of CAIS's work concerns its central framing of AI extinction risk as a near-term policy priority. Critics from several directions have argued:

  • Near-term displacement effect: Emphasizing speculative long-run extinction risk may draw funding, talent, and policy attention away from near-term AI harms — discrimination in algorithmic decision-making, AI-enabled surveillance, labor market disruption, and misinformation — that are currently affecting people. Researchers associated with the AI ethics and fairness communities, including Timnit Gebru and the DAIR Institute, have made this argument most consistently.
  • Epistemic status of extinction claims: The probability of AI-caused human extinction within policy-relevant timeframes is highly uncertain, and critics have argued that treating it as a "global priority alongside pandemics and nuclear war" involves large unjustified inferential steps. Some ML researchers have noted that the mechanisms by which current or near-term AI systems could pose extinction-level risks are not specified with sufficient precision to evaluate.
  • Ideological concentration: CAIS's alignment with EA-associated funders and the broader longtermist intellectual tradition has led critics to argue that its research agenda reflects a particular philosophical worldview rather than a neutral assessment of AI risk. This critique is not unique to CAIS — it applies to several EA-funded AI safety organizations — but it is relevant to assessing how to interpret CAIS's outputs.

Limitations of Specific Research

  • Representation Engineering scalability: The representation engineering paper introduced methods that work on models of a given scale; whether these methods generalize to frontier-scale models is an open question. A survey of representation engineering identifies challenges including performance degradation at scale, computational overhead, and reliability concerns regarding whether the correlations identified are causal.
  • Benchmark validity: A general concern in AI safety evaluation is whether constructed benchmarks (including MACHIAVELLI) capture risks that manifest in real deployment contexts. The MACHIAVELLI benchmark uses text-based game environments, and the extent to which performance on these environments predicts behavior in consequential real-world settings is not established.
  • Field-building outcome measurement: CAIS reports counts of researchers supported and grant dollars distributed, but does not publicly report outcome data for its programs — for example, where ML Safety Scholars alumni work subsequently, what research they produce, or whether compute grant recipients remain in safety research. Without outcome data, the field-building impact claims are difficult to evaluate independently.
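For readers unfamiliar with how game-based safety benchmarks score agents, here is a schematic illustration of the reward-versus-harm trade-off that MACHIAVELLI-style evaluations measure. The class and field names are hypothetical, not the benchmark's actual schema:

```python
from dataclasses import dataclass

# Each step of an agent's trajectory carries both a game reward and
# annotated harm labels; the agent is scored on both axes.
@dataclass
class Step:
    reward: float
    harms: tuple  # e.g. ("deception",), or () if none

def score_trajectory(steps):
    """Return (total reward, total harm count) for a trajectory."""
    total_reward = sum(s.reward for s in steps)
    harm_count = sum(len(s.harms) for s in steps)
    return total_reward, harm_count

# Two hypothetical agents: one maximizes reward via harmful actions,
# one accepts lower reward to avoid them.
greedy = [Step(5.0, ("deception",)), Step(4.0, ("power_seeking",)), Step(3.0, ())]
careful = [Step(2.0, ()), Step(2.0, ()), Step(3.0, ())]

print(score_trajectory(greedy))   # (12.0, 2): higher reward, more harms
print(score_trajectory(careful))  # (7.0, 0): lower reward, zero harms
```

The benchmark-validity question above is whether rankings produced by this kind of in-game scoring predict anything about agent behavior in consequential real-world settings.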

Critiques of the 2023 Statement

Beyond the framing critiques noted above, several researchers argued that the statement's format — a single declarative sentence without methodology, evidence, or mechanism — made it unsuitable as a scientific communication and more akin to a public advocacy document. Others noted that some signatories are not primarily working on extinction-risk problems, which complicated interpretation of the statement as a signal of expert consensus on the technical merits of the extinction-risk hypothesis. See the Wikipedia article on the Statement on AI Risk for a summary of these critiques and specific critics.

Current Trajectory & Timeline

Research Roadmap

The following research priorities were described by CAIS as goals for 2024–2026. Actual outcomes against these goals have not been independently verified and are not currently documented on this page.

| Priority Area | Stated Goals | Status |
|---|---|---|
| Representation Engineering | Scale methods to frontier models; pursue industry adoption for safety evaluation | Outcome unverified |
| Evaluation Frameworks | Develop comprehensive benchmark suite; establish standard evaluation protocols | Outcome unverified |
| Alignment Methods | Proof-of-concept demonstrations; practical implementation work | Outcome unverified |
| Policy Research | Technical governance recommendations; regulatory framework development | Outcome unverified |

A previously cited projection of "2x expansion by 2025" appeared in earlier versions of this page without a cited source. Whether this projection materialized has not been verified.

Organizational Scale

  • Staff: CAIS is organized into four functional teams (Research, Cloud and DevOps, Projects, Operations); total headcount is not publicly disclosed
  • Affiliates: The compute cluster supports 150+ active researchers across approximately 20 research labs
  • Budget: $10.2M revenue and $7.2M expenses in 2024 per IRS Form 990

Key Uncertainties & Research Cruxes

Technical Challenges

These represent genuine open questions in CAIS's research agenda, not settled conclusions:

  • Representation Engineering Scalability: Whether methods developed on mid-scale models transfer reliably to frontier-scale systems remains unclear. The gap between controlled research settings and deployment conditions is a known limitation.
  • Benchmark Validity: Whether evaluations like MACHIAVELLI capture risks that manifest in real deployment — rather than behavior specific to text-game environments — is unresolved. This is a field-wide challenge, not unique to CAIS.
  • Alignment Verification: There is no established consensus on how to verify that an AI system is successfully aligned with intended goals rather than passing evaluations through surface-level pattern matching.

Strategic Questions

  • Research vs. Policy Balance: CAIS allocates resources across technical research, field-building, and policy communication. The optimal allocation is not obvious, and different observers weight these activities differently based on their models of how AI safety progress happens.
  • Open vs. Closed Research: Publishing safety research openly makes it available to the broader community but may also inform adversarial actors. CAIS has not publicly articulated a detailed position on this tradeoff.
  • Timeline Assumptions: Appropriate research priorities depend substantially on assumptions about AGI timelines and the nature of AI risk. Researchers with shorter timelines and those focused on long-run speculative risk reach different conclusions about what work is most valuable now.
  • Near-term vs. Long-term Risk Balance: Whether resources spent on extinction-risk scenarios are appropriately calibrated relative to near-term AI harms is a live debate both within and outside the AI safety community, and CAIS's position at the long-run end of this spectrum is contested.

Leadership & Key Personnel

Key People

  • Dan Hendrycks, Executive Director: UC Berkeley PhD; previously at Google Brain
  • Mantas Mazeika, Research Director: University of Chicago; research focus on adversarial machine learning
  • Thomas Woodside, Policy Director: former congressional staffer with technology policy background
  • Andy Zou, Research Scientist: CMU affiliation; research on jailbreaking and red-teaming

Note: Staff roles and affiliations reflect information available at time of last edit and may not reflect current positions. Andy Zou holds joint affiliation with CMU and CAIS; his primary institutional role should be verified against current sources.

Positioning Within the AI Safety Ecosystem

CAIS occupies a specific position within the broader AI safety research landscape that distinguishes it from peer organizations:

  • vs. MIRI: MIRI focuses almost exclusively on foundational theoretical alignment research and does not run field-building or public communication programs. CAIS's research is more empirical and its scope is broader institutionally.
  • vs. CHAI: CHAI (Center for Human-Compatible AI, UC Berkeley) is an academic center with a narrower research agenda centered on value alignment. CAIS has a more explicit field-building and policy communication mandate.
  • vs. Redwood Research: Redwood focuses on specific empirical safety problems with a small team; CAIS has a larger scope including grant programs and public communication.
  • vs. METR and ARC Evaluations: These organizations focus specifically on model evaluations and dangerous capability assessments. CAIS's evaluation work (MACHIAVELLI) is one component of a broader agenda.
  • vs. GovAI: GovAI focuses on AI governance and policy research. CAIS does policy communication but its primary identity is as a technical research organization.

The common thread across CAIS-adjacent organizations is EA-aligned funding, primarily from Coefficient Giving, which has led to criticisms that the AI safety field as constituted reflects the priorities of a relatively narrow philanthropic and ideological community rather than a broad scientific consensus.

Sources & Resources

Official Resources

| Type | Resource | Description |
|---|---|---|
| Website | safe.ai | Main organization hub |
| Research | CAIS Publications | Technical papers and reports |
| Blog | CAIS Blog | Research updates and commentary |
| Courses | ML Safety Course | Educational materials on machine learning safety |

Key Research Papers

| Paper | Year | Description |
|---|---|---|
| Unsolved Problems in ML Safety | 2022 | Research agenda taxonomy; citation counts should be verified via Google Scholar or Semantic Scholar |
| MACHIAVELLI Benchmark | 2023 | Evaluation framework for goal-directed AI behavior in game environments |
| Representation Engineering | 2023 | Methods for reading and steering AI model internal representations |
Related Organizations

  • Technical Safety Research: MIRI, CHAI, Redwood Research
  • Evaluations: ARC Evaluations, METR
  • Policy Focus: GovAI, RAND Corporation
  • Industry Labs: Anthropic, OpenAI, Google DeepMind
  • Funders: Coefficient Giving

References

1. OpenAI

OpenAI is a leading AI research and deployment company focused on building advanced AI systems, including GPT and o-series models, with a stated mission of ensuring artificial general intelligence (AGI) benefits all of humanity. The homepage serves as a gateway to their research, products, and policy work spanning capabilities and safety.

★★★★☆
2. Representation Engineering (Center for AI Safety)

This resource appears to be a blog post from the Center for AI Safety (CAIS) about Representation Engineering, a technique for understanding and controlling AI model internals. However, the page is currently unavailable (404 error), so the specific content cannot be assessed.

★★★★☆

3. Statement on AI Risk

A concise open letter coordinated by the Center for AI Safety stating that mitigating extinction-level risk from AI should be a global priority alongside pandemics and nuclear war. The statement has been signed by hundreds of leading AI researchers, executives, and public figures including Geoffrey Hinton, Yoshua Bengio, Sam Altman, and Demis Hassabis, lending significant institutional credibility to existential AI risk concerns.

★★★★☆

4. CAIS Research

The Center for AI Safety (CAIS) publishes both technical and conceptual research aimed at mitigating high-consequence, societal-scale risks from AI. Their technical work focuses on safety benchmarks, robustness, machine ethics, and biosecurity, while their conceptual research draws on philosophy, safety engineering, and international relations to understand AI risk.

★★★★☆

5. Representation Engineering: A Top-Down Approach to AI Transparency (paper)

This paper introduces representation engineering (RepE), a top-down approach to AI transparency that analyzes population-level representations in deep neural networks rather than individual neurons. Drawing from cognitive neuroscience, RepE provides methods for monitoring and manipulating high-level cognitive phenomena in large language models. The authors demonstrate that RepE techniques can effectively address safety-relevant problems including honesty, harmlessness, and power-seeking behavior, offering a promising direction for improving AI system transparency and control.

★★★☆☆
6. Intro to ML Safety Course (course.mlsafety.org)

A structured university-level course on machine learning safety developed by the Center for AI Safety, covering topics from robustness and anomaly detection to alignment and systemic safety. The course includes lecture recordings, slides, notes, and coding assignments across modules on safety engineering, robustness, monitoring, alignment, and emerging risks.

7. MACHIAVELLI dataset (arXiv · Alexander Pan et al. · 2023 · Paper)

MACHIAVELLI is a benchmark dataset of 134 Choose-Your-Own-Adventure games with over 500,000 scenarios designed to evaluate whether AI agents naturally learn Machiavellian behaviors like power-seeking, deception, and ethical violations when trained to maximize reward. The authors use language models for automated scenario labeling and mathematize dozens of harmful behaviors to evaluate agents' tendencies. Their findings reveal a tension between reward maximization and ethical behavior, but demonstrate that agents can be steered toward less harmful actions through LM-based methods, suggesting that designing agents that are simultaneously safe and capable is achievable.

★★★☆☆

8. Measuring Mathematical Problem Solving With the MATH Dataset (paper)

This paper introduces MATH, a benchmark of 12,500 competition mathematics problems with step-by-step solutions, revealing that large Transformer models achieve surprisingly low accuracy and that scaling alone is insufficient for mathematical reasoning. The authors also release an auxiliary pretraining dataset to aid mathematical learning. The work highlights a fundamental gap between current scaling trends and genuine mathematical reasoning ability.

★★★☆☆
9. Center for AI Safety (CAIS) Blog

The official blog of the Center for AI Safety (CAIS), a leading AI safety research organization focused on reducing societal-scale risks from advanced AI systems. The blog publishes research updates, policy commentary, and educational content on AI safety topics including existential risk, alignment, and governance.

★★★★☆

10. Center for AI Safety

The Center for AI Safety (CAIS) is a research organization focused on mitigating catastrophic and existential risks from advanced AI systems. It conducts technical research, publishes surveys and statements, and supports field-building efforts across academia and industry. CAIS is notable for its broad coalition-building, including its widely-cited statement on AI extinction risk signed by leading researchers.

★★★★☆

11. Anthropic

Anthropic is an AI safety company focused on building reliable, interpretable, and steerable AI systems. The company conducts frontier AI research and develops Claude, its family of AI assistants, with a stated mission of responsible development and maintenance of advanced AI for long-term human benefit.

★★★★☆

12. RAND Corporation AI Research

RAND Corporation's AI research hub covers policy, national security, and governance implications of artificial intelligence. It aggregates reports, analyses, and commentary on AI risks, military applications, and regulatory frameworks from one of the leading U.S. defense and policy think tanks.

★★★★☆

13. Stuart Russell (Google Scholar profile)

Google Scholar profile for Stuart Russell, professor at UC Berkeley and one of the most influential figures in AI safety research. Russell is co-author of the leading AI textbook 'Artificial Intelligence: A Modern Approach' and author of 'Human Compatible,' which argues for a fundamental redesign of AI around human preferences and uncertainty. His research spans AI alignment, inverse reward design, and the long-term risks of advanced AI systems.

★★★★☆
14. Unsolved Problems in ML Safety (arXiv · Dan Hendrycks, Nicholas Carlini, John Schulman & Jacob Steinhardt · 2021 · Paper)

This paper presents a comprehensive roadmap for ML safety research, identifying four critical problem areas that the field must address as machine learning systems grow larger and are deployed in high-stakes applications. The authors categorize safety challenges into Robustness (withstanding hazards), Monitoring (identifying hazards), Alignment (reducing inherent model hazards), and Systemic Safety (reducing systemic hazards). By clarifying the motivation behind each problem and providing concrete research directions, the paper aims to guide the ML safety research community toward addressing emerging safety challenges posed by large-scale models.

★★★☆☆

15. Center for AI Safety (Wikipedia)

Wikipedia's overview of the Center for AI Safety (CAIS), a nonprofit organization focused on reducing societal-scale risks from advanced AI systems. CAIS is known for publishing the 2023 statement on AI extinction risk signed by hundreds of leading AI researchers and for conducting technical safety research. The article covers the organization's founding, mission, key initiatives, and notable figures involved.

★★★☆☆
16. About Us | CAIS (Center for AI Safety)

The Center for AI Safety (CAIS) is a nonprofit organization focused on reducing societal-scale risks from advanced AI systems. The about page outlines their mission, team, and core research and advocacy activities aimed at ensuring AI development benefits humanity. They work across technical safety research, policy engagement, and public education.

★★★★☆

Structured Data

26 facts · 22 records

Revenue: $10.2 million (as of 2024)
Total Funding Raised: $33 million (as of 2025)
Founded Date: 2022

Key People

  • Andy Zou, Research Scientist
  • Thomas Woodside, Policy Director
  • Scott Wiener, Key Legislator (SB 1047 sponsor) · Feb 2024–present
  • Josué Estrada, Chief Operating Officer

All Facts (26)

Organization

| Property | Value | As Of |
|---|---|---|
| Headquarters | San Francisco | |
| Founded Date | 2022 | |

Financial

| Property | Value | As Of |
|---|---|---|
| Grant Received | $1.1 million | 2025 |
| Grant Received | $2.8 million | 2024 |
| Grant Received | $5.5 million | 2023 |
| Grant Received | $5.2 million | 2022 |
| Total Funding Raised | $33 million | 2025 |
| Net Assets | $11.6 million | 2024 |
| Net Assets | $8.5 million | 2023 |
| Net Assets | $5.8 million | 2022 |
| Annual Expenses | $7.2 million | 2024 |
| Annual Expenses | $8.1 million | 2023 |
| Annual Expenses | $816,760 | 2022 |
| Revenue | $10.2 million | 2024 |
| Revenue | $16.1 million | 2023 |
| Revenue | $6.7 million | 2022 |

General

| Property | Value |
|---|---|
| Website | https://www.safe.ai/ |

Other

| Property | Value | As Of |
|---|---|---|
| Key Person | Dan Hendrycks | 2025 |
| Compensation | Dan Hendrycks takes $1 annual salary as Executive Director | 2025 |
| Publication | The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning — benchmark for evaluating dual-use AI capabilities in biosecurity, cybersecurity, and chemical weapons | Mar 2024 |
| Publication | Representation Engineering: A Top-Down Approach to AI Transparency — proposes methods to read and control LLM internal representations for safety | Oct 2023 |
| Publication | Measuring Massive Multitask Language Understanding (MMLU) — widely-used benchmark for evaluating LLM capabilities across 57 academic subjects | Jan 2021 |
| Infrastructure | Compute cluster with 80+ NVIDIA A100 GPUs available for AI safety researchers | 2024 |
| Program | ML Safety Scholars — educational program training hundreds of students in AI safety fundamentals. Includes online course, reading groups, and mentorship. | 2024 |
| Board Member | Jaan Tallinn | 2024 |
| Campaign | Statement on AI Risk (May 2023): one-sentence statement 'Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.' Signed by 350+ AI leaders including Geoffrey Hinton, Demis Hassabis, Sam Altman, and Dario Amodei. | May 2023 |

Divisions (6)

| Name | Type | Status | Slug | Lead | Start Date |
|---|---|---|---|---|---|
| Field-Building | program-area | active | | | |
| Compute Cluster | program-area | active | | | |
| Research | team | active | | | |
| AI and Society Fellowship | program-area | active | cais-fellowship | | |
| CAIS Compute Cluster | lab | active | cais-compute | | |
| CAIS Action Fund | program-area | active | Center for AI Safety Action Fund | Varun Krovi | 2023-07 |

Publications (12)

| Title | Type | Authors | URL | Published |
|---|---|---|---|---|
| Humanity's Last Exam | paper | Long Phan, Alice Gatti, Ziwen Han, Nathaniel Li et al. | arxiv.org | 2025-01 |
| Introduction to AI Safety, Ethics, and Society | book | Dan Hendrycks | aisafetybook.com | 2024-06 |
| The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning | paper | Nathaniel Li, Alexander Pan, Anjali Gopal et al. | wmdp.ai | 2024 |
| Superintelligence Strategy | report | Dan Hendrycks, Eric Schmidt, Alexandr Wang | nationalsecurity.ai | 2024 |
| Improving Alignment and Robustness with Circuit Breakers | paper | Andy Zou, Long Phan, Justin Wang et al. | arxiv.org | 2024 |
| HarmBench: A Standardized Evaluation Framework for Automated Red Teaming | paper | Mantas Mazeika, Long Phan, Xuwang Yin et al. | harmbench.org | 2024 |
| Representation Engineering: A Top-Down Approach to AI Transparency | paper | Andy Zou, Long Phan, Sarah Chen et al. | arxiv.org | 2023-10 |
| An Overview of Catastrophic AI Risks | paper | Dan Hendrycks, Mantas Mazeika, Thomas Woodside | arxiv.org | 2023-06 |
| Statement on AI Risk | policy-brief | CAIS | aistatement.com | 2023-05 |
| Universal and Transferable Adversarial Attacks on Aligned Language Models | paper | Andy Zou, Zifan Wang, Nicholas Carlini et al. | llm-attacks.org | 2023 |
| Unsolved Problems in ML Safety | paper | Dan Hendrycks, Nicholas Carlini, John Schulman, Jacob Steinhardt | arxiv.org | 2021-09 |
| Measuring Massive Multitask Language Understanding (MMLU) | paper | Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, Jacob Steinhardt | arxiv.org | 2020-09 |

Related Wiki Pages

Top Related Pages

Approaches

Capability Unlearning / Removal · MAIM (Mutually Assured AI Malfunction) · AI Alignment · Corporate AI Safety Responses

Analysis

AI Compute Scaling Metrics · AI Safety Intervention Effectiveness Matrix · AI Uplift Assessment Model

Policy

Safe and Secure Innovation for Frontier Artificial Intelligence Models Act

Organizations

Anthropic · Center for Human-Compatible AI · Center for AI Safety Action Fund · Google DeepMind · US AI Safety Institute · Redwood Research

Other

Geoffrey Hinton · Stuart Russell

Concepts

AGI Timeline

Key Debates

Is AI Existential Risk Real?

Risks

AI-Induced Irreversibility

Historical

The MIRI Era