Center for AI Safety
cais (E47)
Path: /knowledge-base/organizations/cais/
Page Metadata
{
"id": "cais",
"numericId": null,
"path": "/knowledge-base/organizations/cais/",
"filePath": "knowledge-base/organizations/cais.mdx",
"title": "CAIS (Center for AI Safety)",
"quality": 42,
"importance": 42,
"contentFormat": "article",
"tractability": null,
"neglectedness": null,
"uncertainty": null,
"causalLevel": null,
"lastUpdated": "2025-12-24",
"llmSummary": "CAIS is a research organization that has distributed $2M+ in compute grants to 200+ researchers, published 50+ safety papers including benchmarks adopted by Anthropic/OpenAI, and organized the May 2023 AI extinction risk statement signed by 350+ AI leaders. Current budget is ~$5M annually with 15+ full-time staff, focusing on representation engineering, safety benchmarks, and field-building.",
"structuredSummary": null,
"description": "Research organization advancing AI safety through technical research, field-building, and policy communication, including the landmark 2023 AI extinction risk statement signed by major AI leaders",
"ratings": {
"novelty": 2.5,
"rigor": 4,
"actionability": 3.5,
"completeness": 5.5
},
"category": "organizations",
"subcategory": "safety-orgs",
"clusters": [
"community",
"ai-safety",
"governance"
],
"metrics": {
"wordCount": 824,
"tableCount": 7,
"diagramCount": 0,
"internalLinks": 42,
"externalLinks": 0,
"footnoteCount": 0,
"bulletRatio": 0.2,
"sectionCount": 21,
"hasOverview": true,
"structuralScore": 10
},
"suggestedQuality": 67,
"updateFrequency": 21,
"evergreen": true,
"wordCount": 824,
"unconvertedLinks": [],
"unconvertedLinkCount": 0,
"convertedLinkCount": 20,
"backlinkCount": 6,
"redundancy": {
"maxSimilarity": 15,
"similarPages": [
{
"id": "safety-research-value",
"title": "Expected Value of AI Safety Research",
"path": "/knowledge-base/models/safety-research-value/",
"similarity": 15
},
{
"id": "dan-hendrycks",
"title": "Dan Hendrycks",
"path": "/knowledge-base/people/dan-hendrycks/",
"similarity": 15
},
{
"id": "chai",
"title": "CHAI (Center for Human-Compatible AI)",
"path": "/knowledge-base/organizations/chai/",
"similarity": 13
},
{
"id": "far-ai",
"title": "FAR AI",
"path": "/knowledge-base/organizations/far-ai/",
"similarity": 13
},
{
"id": "yoshua-bengio",
"title": "Yoshua Bengio",
"path": "/knowledge-base/people/yoshua-bengio/",
"similarity": 13
}
]
}
}
Entity Data
{
"id": "cais",
"type": "organization",
"title": "Center for AI Safety",
"description": "The Center for AI Safety (CAIS) is a nonprofit organization that works to reduce societal-scale risks from AI. CAIS combines research, field-building, and public communication to advance AI safety.",
"tags": [
"ai-safety",
"x-risk",
"representation-engineering",
"field-building",
"ai-risk-communication"
],
"relatedEntries": [
{
"id": "existential-risk",
"type": "risk"
},
{
"id": "power-seeking",
"type": "risk"
},
{
"id": "anthropic",
"type": "lab"
}
],
"sources": [
{
"title": "CAIS Website",
"url": "https://safe.ai"
},
{
"title": "Statement on AI Risk",
"url": "https://www.safe.ai/statement-on-ai-risk"
},
{
"title": "Representation Engineering Paper",
"url": "https://arxiv.org/abs/2310.01405"
}
],
"lastUpdated": "2025-12",
"website": "https://safe.ai",
"customFields": []
}
Canonical Facts (0)
No facts for this entity
External Links
{
"eaForum": "https://forum.effectivealtruism.org/topics/center-for-ai-safety",
"wikidata": "https://www.wikidata.org/wiki/Q119084607"
}
Backlinks (6)
| id | title | type | relationship |
|---|---|---|---|
| dan-hendrycks | Dan Hendrycks | researcher | — |
| capability-unlearning | Capability Unlearning / Removal | approach | — |
| pause | Pause Advocacy | approach | — |
| maim | MAIM (Mutually Assured AI Malfunction) | policy | — |
| representation-engineering | Representation Engineering | approach | — |
| power-seeking | Power-Seeking AI | risk | — |
Frontmatter
{
"title": "CAIS (Center for AI Safety)",
"description": "Research organization advancing AI safety through technical research, field-building, and policy communication, including the landmark 2023 AI extinction risk statement signed by major AI leaders",
"sidebar": {
"order": 14
},
"quality": 42,
"llmSummary": "CAIS is a research organization that has distributed $2M+ in compute grants to 200+ researchers, published 50+ safety papers including benchmarks adopted by Anthropic/OpenAI, and organized the May 2023 AI extinction risk statement signed by 350+ AI leaders. Current budget is ~$5M annually with 15+ full-time staff, focusing on representation engineering, safety benchmarks, and field-building.",
"lastEdited": "2025-12-24",
"importance": 42,
"update_frequency": 21,
"ratings": {
"novelty": 2.5,
"rigor": 4,
"actionability": 3.5,
"completeness": 5.5
},
"clusters": [
"community",
"ai-safety",
"governance"
],
"subcategory": "safety-orgs",
"entityType": "organization"
}
Raw MDX Source
---
title: CAIS (Center for AI Safety)
description: Research organization advancing AI safety through technical research, field-building, and policy communication, including the landmark 2023 AI extinction risk statement signed by major AI leaders
sidebar:
order: 14
quality: 42
llmSummary: CAIS is a research organization that has distributed $2M+ in compute grants to 200+ researchers, published 50+ safety papers including benchmarks adopted by Anthropic/OpenAI, and organized the May 2023 AI extinction risk statement signed by 350+ AI leaders. Current budget is ~$5M annually with 15+ full-time staff, focusing on representation engineering, safety benchmarks, and field-building.
lastEdited: "2025-12-24"
importance: 42
update_frequency: 21
ratings:
novelty: 2.5
rigor: 4
actionability: 3.5
completeness: 5.5
clusters:
- community
- ai-safety
- governance
subcategory: safety-orgs
entityType: organization
---
import {DataInfoBox, KeyPeople, Section, R, EntityLink, DataExternalLinks} from '@components/wiki';
<DataExternalLinks pageId="cais" />
<DataInfoBox entityId="E47" />
## Overview
The <R id="a306e0b63bdedbd5">Center for AI Safety (CAIS)</R> is a nonprofit research organization that works to reduce societal-scale risks from artificial intelligence through technical research, field-building initiatives, and public communication efforts. Founded by <EntityLink id="E89">Dan Hendrycks</EntityLink>, CAIS gained widespread recognition for organizing the landmark "Statement on AI Risk" in May 2023, which received signatures from over 350 AI researchers and industry leaders.
CAIS's multi-pronged approach combines technical research on AI alignment and robustness with field-building efforts that have supported over 200 researchers through grants and fellowships. The organization's work spans from fundamental research on <R id="22d74e88304202f8"><EntityLink id="E479">representation engineering</EntityLink></R> to safety benchmarks like the <R id="6d4e8851e33e1641">MACHIAVELLI dataset</R> for evaluating harmful and deceptive behavior in language-model agents.
## Risk Assessment
| Risk Category | Assessment | Evidence | Mitigation Focus |
|---------------|------------|----------|------------------|
| Technical Research Impact | High | 50+ safety publications, novel benchmarks | <R id="22d74e88304202f8">Representation engineering</R>, adversarial robustness |
| Field-Building Influence | Very High | 200+ researchers supported, \$2M+ distributed | Compute grants, fellowship programs |
| Policy Communication | High | Statement signed by major AI leaders | Public awareness, expert consensus building |
| Timeline Relevance | Medium-High | Research targets near-term safety challenges | 2-5 year research horizon |
## Key Research Areas
### Technical Safety Research
| Research Domain | Key Contributions | Impact Metrics |
|-----------------|------------------|----------------|
| **Representation Engineering** | Methods for reading/steering model internals | <R id="da9dc068f95f855d">15+ citations</R> within 6 months |
| **Safety Benchmarks** | MACHIAVELLI, power-seeking evaluations | Adopted by <R id="afe2508ac4caf5ee">Anthropic</R>, <R id="04d39e8bd5d50dd5">OpenAI</R> |
| **Adversarial Robustness** | Novel defense mechanisms, evaluation protocols | 100+ citations on key papers |
| **Alignment Foundations** | Conceptual frameworks for AI safety | Influenced <EntityLink id="E439">alignment research</EntityLink> directions |
### Major Publications & Tools
- **<R id="5d708a72c3af8ad9">Representation Engineering: A Top-Down Approach to AI Transparency</R>** (2023) - Methods for understanding AI decision-making
- **<R id="6d4e8851e33e1641">Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior</R>** (2023) - MACHIAVELLI benchmark for ethical <EntityLink id="E447">AI evaluation</EntityLink>
- **<R id="f94e705023d45765">Unsolved Problems in ML Safety</R>** (2022) - Comprehensive taxonomy of safety challenges
- **<R id="985b203c41c31efe">Measuring Mathematical Problem Solving With the MATH Dataset</R>** (2021) - Standard benchmark for AI reasoning capabilities
## Field-Building Impact
### Grant Programs
| Program | Scale | Impact | Timeline |
|---------|--------|--------|----------|
| **Compute Grants** | \$2M+ distributed | 100+ researchers supported | 2022-present |
| **ML Safety Scholars** | 50+ participants annually | Early-career pipeline development | 2021-present |
| **Research Fellowships** | \$500K+ annually | 20+ fellows placed at top institutions | 2022-present |
| **AI Safety Camp** | 200+ participants total | International collaboration network | 2020-present |
### Institutional Partnerships
- **Academic Collaborations**: UC Berkeley, MIT, Stanford, Oxford
- **Industry Engagement**: Research partnerships with <EntityLink id="E22">Anthropic</EntityLink>, Google DeepMind
- **Policy Connections**: Briefings for US Congress, UK Parliament, EU regulators
## Statement on AI Risk (2023)
The May 2023 <R id="470ac236ca26008c">Statement on AI Risk</R> was a watershed moment in AI safety advocacy. The statement consists of a single sentence:
> "Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war."
### Signatory Analysis
| Category | Notable Signatories | Significance |
|----------|-------------------|--------------|
| **Turing Award Winners** | <EntityLink id="E149">Geoffrey Hinton</EntityLink>, <EntityLink id="E380">Yoshua Bengio</EntityLink>, <EntityLink id="E290">Stuart Russell</EntityLink> | Academic legitimacy |
| **Industry Leaders** | <EntityLink id="E269">Sam Altman</EntityLink> (OpenAI), <EntityLink id="E91">Dario Amodei</EntityLink> (Anthropic), <EntityLink id="E101">Demis Hassabis</EntityLink> (DeepMind) | Industry acknowledgment |
| **Policy Experts** | <EntityLink id="E575">Helen Toner</EntityLink>, Allan Dafoe, Gillian Hadfield | Governance credibility |
| **Technical Researchers** | 300+ ML/AI researchers | Scientific consensus |
The statement's impact included immediate media coverage across major outlets and influenced subsequent policy discussions, including mentions in <EntityLink id="E364">UK</EntityLink> and <EntityLink id="E365">US</EntityLink> government AI strategies.
## Current Trajectory & Timeline
### Research Roadmap (2024-2026)
| Priority Area | 2024 Goals | 2025-2026 Projections |
|---------------|------------|----------------------|
| **Representation Engineering** | Scale to frontier models | Industry adoption for safety checks |
| **Evaluation Frameworks** | Comprehensive benchmark suite | Standard evaluation protocols |
| **Alignment Methods** | Proof-of-concept demonstrations | Practical implementation |
| **Policy Research** | Technical governance recommendations | Regulatory framework development |
### Funding & Growth
- **Current Budget**: ≈\$5M annually (estimated)
- **Researcher Count**: 15+ full-time staff, 50+ affiliates
- **Projected Growth**: 2x expansion by 2025 based on field growth
## Key Uncertainties & Research Cruxes
### Technical Challenges
- **Representation Engineering Scalability**: Whether current methods work on frontier models remains unclear
- **Benchmark Validity**: Unknown if current evaluations capture real safety risks
- **Alignment Verification**: No consensus on how to verify successful alignment
### Strategic Questions
- **Research vs. Policy Balance**: Optimal allocation between technical work and governance efforts
- **Open vs. Closed Research**: Tension between transparency and information hazards
- **Timeline Assumptions**: Disagreement on <EntityLink id="E399">AGI timelines</EntityLink> affects research priorities
## Leadership & Key Personnel
<Section title="Key People">
<KeyPeople people={[
{ name: "Dan Hendrycks", role: "Executive Director", background: "UC Berkeley PhD, former Google Brain" },
{ name: "Mantas Mazeika", role: "Research Director", background: "University of Chicago, adversarial ML expert" },
{ name: "Thomas Woodside", role: "Policy Director", background: "Former congressional staffer, tech policy" },
{ name: "Andy Zou", role: "Research Scientist", background: "CMU, jailbreaking and red-teaming research" },
]} />
</Section>
## Sources & Resources
### Official Resources
| Type | Resource | Description |
|------|----------|-------------|
| **Website** | <R id="a306e0b63bdedbd5">safe.ai</R> | Main organization hub |
| **Research** | <R id="51721cfcac0c036a">CAIS Publications</R> | Technical papers and reports |
| **Blog** | <R id="a27b8d271c27aa02">CAIS Blog</R> | Research updates and commentary |
| **Courses** | <R id="65c9fe2d57a4eb4c">ML Safety Course</R> | Educational materials |
### Key Research Papers
| Paper | Year | Citations | Impact |
|-------|------|-----------|--------|
| <R id="f94e705023d45765">Unsolved Problems in ML Safety</R> | 2022 | 200+ | Research agenda setting |
| <R id="6d4e8851e33e1641">MACHIAVELLI Benchmark</R> | 2023 | 50+ | Industry evaluation adoption |
| <R id="5d708a72c3af8ad9">Representation Engineering</R> | 2023 | 30+ | New research direction |
### Related Organizations
- **Research Alignment**: <EntityLink id="E202">MIRI</EntityLink>, <EntityLink id="E57">CHAI</EntityLink>, <EntityLink id="E557">Redwood Research</EntityLink>
- **Policy Focus**: <EntityLink id="E153">GovAI</EntityLink>, <R id="cf5fd74e8db11565">RAND Corporation</R>
- **Industry Labs**: <EntityLink id="E22">Anthropic</EntityLink>, <EntityLink id="E218">OpenAI</EntityLink>, <EntityLink id="E98">DeepMind</EntityLink>