Longterm Wiki

Center for AI Safety

cais (E47)
Path: /knowledge-base/organizations/cais/
Page Metadata
{
  "id": "cais",
  "numericId": null,
  "path": "/knowledge-base/organizations/cais/",
  "filePath": "knowledge-base/organizations/cais.mdx",
  "title": "CAIS (Center for AI Safety)",
  "quality": 42,
  "importance": 42,
  "contentFormat": "article",
  "tractability": null,
  "neglectedness": null,
  "uncertainty": null,
  "causalLevel": null,
  "lastUpdated": "2025-12-24",
  "llmSummary": "CAIS is a research organization that has distributed $2M+ in compute grants to 200+ researchers, published 50+ safety papers including benchmarks adopted by Anthropic/OpenAI, and organized the May 2023 AI extinction risk statement signed by 350+ AI leaders. Current budget is ~$5M annually with 15+ full-time staff, focusing on representation engineering, safety benchmarks, and field-building.",
  "structuredSummary": null,
  "description": "Research organization advancing AI safety through technical research, field-building, and policy communication, including the landmark 2023 AI extinction risk statement signed by major AI leaders",
  "ratings": {
    "novelty": 2.5,
    "rigor": 4,
    "actionability": 3.5,
    "completeness": 5.5
  },
  "category": "organizations",
  "subcategory": "safety-orgs",
  "clusters": [
    "community",
    "ai-safety",
    "governance"
  ],
  "metrics": {
    "wordCount": 824,
    "tableCount": 7,
    "diagramCount": 0,
    "internalLinks": 42,
    "externalLinks": 0,
    "footnoteCount": 0,
    "bulletRatio": 0.2,
    "sectionCount": 21,
    "hasOverview": true,
    "structuralScore": 10
  },
  "suggestedQuality": 67,
  "updateFrequency": 21,
  "evergreen": true,
  "wordCount": 824,
  "unconvertedLinks": [],
  "unconvertedLinkCount": 0,
  "convertedLinkCount": 20,
  "backlinkCount": 6,
  "redundancy": {
    "maxSimilarity": 15,
    "similarPages": [
      {
        "id": "safety-research-value",
        "title": "Expected Value of AI Safety Research",
        "path": "/knowledge-base/models/safety-research-value/",
        "similarity": 15
      },
      {
        "id": "dan-hendrycks",
        "title": "Dan Hendrycks",
        "path": "/knowledge-base/people/dan-hendrycks/",
        "similarity": 15
      },
      {
        "id": "chai",
        "title": "CHAI (Center for Human-Compatible AI)",
        "path": "/knowledge-base/organizations/chai/",
        "similarity": 13
      },
      {
        "id": "far-ai",
        "title": "FAR AI",
        "path": "/knowledge-base/organizations/far-ai/",
        "similarity": 13
      },
      {
        "id": "yoshua-bengio",
        "title": "Yoshua Bengio",
        "path": "/knowledge-base/people/yoshua-bengio/",
        "similarity": 13
      }
    ]
  }
}
Entity Data
{
  "id": "cais",
  "type": "organization",
  "title": "Center for AI Safety",
  "description": "The Center for AI Safety (CAIS) is a nonprofit organization that works to reduce societal-scale risks from AI. CAIS combines research, field-building, and public communication to advance AI safety.",
  "tags": [
    "ai-safety",
    "x-risk",
    "representation-engineering",
    "field-building",
    "ai-risk-communication"
  ],
  "relatedEntries": [
    {
      "id": "existential-risk",
      "type": "risk"
    },
    {
      "id": "power-seeking",
      "type": "risk"
    },
    {
      "id": "anthropic",
      "type": "lab"
    }
  ],
  "sources": [
    {
      "title": "CAIS Website",
      "url": "https://safe.ai"
    },
    {
      "title": "Statement on AI Risk",
      "url": "https://www.safe.ai/statement-on-ai-risk"
    },
    {
      "title": "Representation Engineering Paper",
      "url": "https://arxiv.org/abs/2310.01405"
    }
  ],
  "lastUpdated": "2025-12",
  "website": "https://safe.ai",
  "customFields": []
}
Canonical Facts (0)

No facts for this entity

External Links
{
  "eaForum": "https://forum.effectivealtruism.org/topics/center-for-ai-safety",
  "wikidata": "https://www.wikidata.org/wiki/Q119084607"
}
Backlinks (6)
| id | title | type/relationship |
|----|-------|--------------------|
| dan-hendrycks | Dan Hendrycks | researcher |
| capability-unlearning | Capability Unlearning / Removal | approach |
| pause | Pause Advocacy | approach |
| maim | MAIM (Mutually Assured AI Malfunction) | policy |
| representation-engineering | Representation Engineering | approach |
| power-seeking | Power-Seeking AI | risk |
Frontmatter
{
  "title": "CAIS (Center for AI Safety)",
  "description": "Research organization advancing AI safety through technical research, field-building, and policy communication, including the landmark 2023 AI extinction risk statement signed by major AI leaders",
  "sidebar": {
    "order": 14
  },
  "quality": 42,
  "llmSummary": "CAIS is a research organization that has distributed $2M+ in compute grants to 200+ researchers, published 50+ safety papers including benchmarks adopted by Anthropic/OpenAI, and organized the May 2023 AI extinction risk statement signed by 350+ AI leaders. Current budget is ~$5M annually with 15+ full-time staff, focusing on representation engineering, safety benchmarks, and field-building.",
  "lastEdited": "2025-12-24",
  "importance": 42,
  "update_frequency": 21,
  "ratings": {
    "novelty": 2.5,
    "rigor": 4,
    "actionability": 3.5,
    "completeness": 5.5
  },
  "clusters": [
    "community",
    "ai-safety",
    "governance"
  ],
  "subcategory": "safety-orgs",
  "entityType": "organization"
}
Raw MDX Source
---
title: CAIS (Center for AI Safety)
description: Research organization advancing AI safety through technical research, field-building, and policy communication, including the landmark 2023 AI extinction risk statement signed by major AI leaders
sidebar:
  order: 14
quality: 42
llmSummary: CAIS is a research organization that has distributed $2M+ in compute grants to 200+ researchers, published 50+ safety papers including benchmarks adopted by Anthropic/OpenAI, and organized the May 2023 AI extinction risk statement signed by 350+ AI leaders. Current budget is ~$5M annually with 15+ full-time staff, focusing on representation engineering, safety benchmarks, and field-building.
lastEdited: "2025-12-24"
importance: 42
update_frequency: 21
ratings:
  novelty: 2.5
  rigor: 4
  actionability: 3.5
  completeness: 5.5
clusters:
  - community
  - ai-safety
  - governance
subcategory: safety-orgs
entityType: organization
---
import {DataInfoBox, KeyPeople, Section, R, EntityLink, DataExternalLinks} from '@components/wiki';

<DataExternalLinks pageId="cais" />

<DataInfoBox entityId="E47" />

## Overview

The <R id="a306e0b63bdedbd5">Center for AI Safety (CAIS)</R> is a nonprofit research organization that works to reduce societal-scale risks from artificial intelligence through technical research, field-building initiatives, and public communication efforts. Founded by <EntityLink id="E89">Dan Hendrycks</EntityLink>, CAIS gained widespread recognition for organizing the landmark "Statement on AI Risk" in May 2023, which received signatures from over 350 AI researchers and industry leaders.

CAIS's multi-pronged approach combines technical research on AI alignment and robustness with field-building efforts that have supported over 200 researchers through grants and fellowships. The organization's work ranges from fundamental research on <R id="22d74e88304202f8"><EntityLink id="E479">representation engineering</EntityLink></R> to safety benchmarks such as the <R id="6d4e8851e33e1641">MACHIAVELLI benchmark</R>, which measures deceptive and power-seeking behavior in language-model agents.

## Risk Assessment

| Risk Category | Assessment | Evidence | Mitigation Focus |
|---------------|------------|----------|------------------|
| Technical Research Impact | High | 50+ safety publications, novel benchmarks | <R id="22d74e88304202f8">Representation engineering</R>, adversarial robustness |
| Field-Building Influence | Very High | 200+ researchers supported, \$2M+ distributed | Compute grants, fellowship programs |
| Policy Communication | High | Statement signed by major AI leaders | Public awareness, expert consensus building |
| Timeline Relevance | Medium-High | Research targets near-term safety challenges | 2-5 year research horizon |

## Key Research Areas

### Technical Safety Research

| Research Domain | Key Contributions | Impact Metrics |
|-----------------|------------------|----------------|
| **Representation Engineering** | Methods for reading/steering model internals (see sketch below) | <R id="da9dc068f95f855d">15+ citations</R> within 6 months |
| **Safety Benchmarks** | MACHIAVELLI, power-seeking evaluations | Adopted by <R id="afe2508ac4caf5ee">Anthropic</R>, <R id="04d39e8bd5d50dd5">OpenAI</R> |
| **Adversarial Robustness** | Novel defense mechanisms, evaluation protocols | 100+ citations on key papers |
| **Alignment Foundations** | Conceptual frameworks for AI safety | Influenced <EntityLink id="E439">alignment research</EntityLink> directions |
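
The core recipe behind representation engineering can be sketched briefly: collect hidden activations for contrasting prompt pairs (e.g. prompts that do vs. do not express a concept like honesty), extract a direction that separates them, then read the concept by projecting onto that direction or steer the model by adding the direction back into the activations. The sketch below is a minimal illustration with synthetic activations standing in for real hidden states, and it uses a simple mean-difference direction rather than the PCA-based extraction described in the CAIS paper.

```python
# Minimal activation-reading/steering sketch in the spirit of representation
# engineering. Synthetic random vectors stand in for a transformer layer's hidden
# states, and a simple difference-in-means direction is used where the paper
# applies PCA over activation differences. Illustration only.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_pairs = 64, 100

# Hidden states for prompt pairs that do vs. do not express a target concept;
# in practice these would come from a real model's forward passes.
concept = rng.normal(size=d_model)
concept /= np.linalg.norm(concept)
h_pos = rng.normal(size=(n_pairs, d_model)) + 3.0 * concept
h_neg = rng.normal(size=(n_pairs, d_model))

# "Reading vector": normalized mean difference between the two activation sets.
reading_vector = (h_pos - h_neg).mean(axis=0)
reading_vector /= np.linalg.norm(reading_vector)

def concept_score(h: np.ndarray) -> float:
    """Read: project an activation onto the concept direction."""
    return float(h @ reading_vector)

def steer(h: np.ndarray, alpha: float = 4.0) -> np.ndarray:
    """Steer: nudge an activation along the concept direction."""
    return h + alpha * reading_vector

h = rng.normal(size=d_model)
print(concept_score(h), concept_score(steer(h)))  # steering raises the score by alpha
```

In practice the activations come from specific layers of a real model and the steering vector is injected during the forward pass (e.g. via hooks), with the scale tuned per layer and concept.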

### Major Publications & Tools

- **<R id="5d708a72c3af8ad9">Representation Engineering: A Top-Down Approach to AI Transparency</R>** (2023) - Methods for understanding AI decision-making
- **<R id="6d4e8851e33e1641">Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior</R>** (2023) - MACHIAVELLI benchmark for ethical <EntityLink id="E447">AI evaluation</EntityLink>  
- **<R id="f94e705023d45765">Unsolved Problems in ML Safety</R>** (2022) - Comprehensive taxonomy of safety challenges
- **<R id="985b203c41c31efe">Measuring Mathematical Problem Solving With the MATH Dataset</R>** (2021) - Standard benchmark for AI reasoning capabilities

## Field-Building Impact

### Grant Programs

| Program | Scale | Impact | Timeline |
|---------|--------|--------|----------|
| **Compute Grants** | \$2M+ distributed | 100+ researchers supported | 2022-present |
| **ML Safety Scholars** | 50+ participants annually | Early-career pipeline development | 2021-present |
| **Research Fellowships** | \$500K+ annually | 20+ fellows placed at top institutions | 2022-present |
| **AI Safety Camp** | 200+ participants total | International collaboration network | 2020-present |

### Institutional Partnerships

- **Academic Collaborations**: UC Berkeley, MIT, Stanford, Oxford
- **Industry Engagement**: Research partnerships with <EntityLink id="E22">Anthropic</EntityLink>, Google DeepMind
- **Policy Connections**: Briefings for US Congress, UK Parliament, EU regulators

## Statement on AI Risk (2023)

The May 2023 <R id="470ac236ca26008c">Statement on AI Risk</R> marked a watershed moment in AI safety advocacy. The full statement is a single sentence:

> "Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war."

### Signatory Analysis

| Category | Notable Signatories | Significance |
|----------|-------------------|--------------|
| **Turing Award Winners** | <EntityLink id="E149">Geoffrey Hinton</EntityLink>, <EntityLink id="E380">Yoshua Bengio</EntityLink>, <EntityLink id="E290">Stuart Russell</EntityLink> | Academic legitimacy |
| **Industry Leaders** | <EntityLink id="E269">Sam Altman</EntityLink> (OpenAI), <EntityLink id="E91">Dario Amodei</EntityLink> (Anthropic), <EntityLink id="E101">Demis Hassabis</EntityLink> (DeepMind) | Industry acknowledgment |
| **Policy Experts** | <EntityLink id="E575">Helen Toner</EntityLink>, Allan Dafoe, Gillian Hadfield | Governance credibility |
| **Technical Researchers** | 300+ ML/AI researchers | Scientific consensus |

The statement's impact included immediate media coverage across major outlets and influenced subsequent policy discussions, including mentions in <EntityLink id="E364">UK</EntityLink> and <EntityLink id="E365">US</EntityLink> government AI strategies.

## Current Trajectory & Timeline

### Research Roadmap (2024-2026)

| Priority Area | 2024 Goals | 2025-2026 Projections |
|---------------|------------|----------------------|
| **Representation Engineering** | Scale to frontier models | Industry adoption for safety checks |
| **Evaluation Frameworks** | Comprehensive benchmark suite | Standard evaluation protocols |
| **Alignment Methods** | Proof-of-concept demonstrations | Practical implementation |
| **Policy Research** | Technical governance recommendations | Regulatory framework development |

### Funding & Growth

- **Current Budget**: ≈\$5M annually (estimated)
- **Researcher Count**: 15+ full-time staff, 50+ affiliates
- **Projected Growth**: ~2x expansion projected by 2025, contingent on continued growth of the field

## Key Uncertainties & Research Cruxes

### Technical Challenges

- **Representation Engineering Scalability**: Whether current methods work on frontier models remains unclear
- **Benchmark Validity**: Unknown if current evaluations capture real safety risks
- **Alignment Verification**: No consensus on how to verify successful alignment

### Strategic Questions

- **Research vs. Policy Balance**: Optimal allocation between technical work and governance efforts
- **Open vs. Closed Research**: Tension between transparency and information hazards
- **Timeline Assumptions**: Disagreement on <EntityLink id="E399">AGI timelines</EntityLink> affects research priorities

## Leadership & Key Personnel

<Section title="Key People">
  <KeyPeople people={[
    { name: "Dan Hendrycks", role: "Executive Director", background: "UC Berkeley PhD, former Google Brain" },
    { name: "Mantas Mazeika", role: "Research Director", background: "University of Chicago, adversarial ML expert" },
    { name: "Thomas Woodside", role: "Policy Director", background: "Former congressional staffer, tech policy" },
    { name: "Andy Zou", role: "Research Scientist", background: "CMU, jailbreaking and red-teaming research" },
  ]} />
</Section>

## Sources & Resources

### Official Resources

| Type | Resource | Description |
|------|----------|-------------|
| **Website** | <R id="a306e0b63bdedbd5">safe.ai</R> | Main organization hub |
| **Research** | <R id="51721cfcac0c036a">CAIS Publications</R> | Technical papers and reports |
| **Blog** | <R id="a27b8d271c27aa02">CAIS Blog</R> | Research updates and commentary |
| **Courses** | <R id="65c9fe2d57a4eb4c">ML Safety Course</R> | Educational materials |

### Key Research Papers

| Paper | Year | Citations | Impact |
|-------|------|-----------|--------|
| <R id="f94e705023d45765">Unsolved Problems in ML Safety</R> | 2022 | 200+ | Research agenda setting |
| <R id="6d4e8851e33e1641">MACHIAVELLI Benchmark</R> | 2023 | 50+ | Industry evaluation adoption |
| <R id="5d708a72c3af8ad9">Representation Engineering</R> | 2023 | 30+ | New research direction |

### Related Organizations

- **Research Alignment**: <EntityLink id="E202">MIRI</EntityLink>, <EntityLink id="E57">CHAI</EntityLink>, <EntityLink id="E557">Redwood Research</EntityLink>
- **Policy Focus**: <EntityLink id="E153">GovAI</EntityLink>, <R id="cf5fd74e8db11565">RAND Corporation</R>
- **Industry Labs**: <EntityLink id="E22">Anthropic</EntityLink>, <EntityLink id="E218">OpenAI</EntityLink>, <EntityLink id="E98">DeepMind</EntityLink>