Center for AI Safety (CAIS)
CAIS is a nonprofit research organization founded by Dan Hendrycks that has distributed compute grants to researchers, published technical AI safety papers including the representation engineering and MACHIAVELLI benchmark papers, and organized the May 2023 Statement on AI Risk signed by over 350 AI researchers and industry leaders. The organization focuses on technical safety research, field-building, and policy communication.
Overview
The Center for AI Safety (CAIS) is a nonprofit research organization that works to reduce societal-scale risks from artificial intelligence through technical research, field-building initiatives, and public communication. Founded by Dan Hendrycks, CAIS received substantial public attention in May 2023 when it organized a one-sentence statement on AI extinction risk that attracted signatures from over 350 AI researchers and industry figures, including several Turing Award recipients and heads of major AI laboratories.
CAIS operates across three areas: technical research on AI alignment and robustness, grant and fellowship programs intended to grow the AI safety research community, and communication efforts aimed at policymakers and the public. Its technical output includes work on Representation Engineering and the MACHIAVELLI benchmark for evaluating goal-directed behavior in AI systems. The organization has received substantial funding from EA-aligned sources including Coefficient Giving (formerly Open Philanthropy), a funding relationship that is relevant context for assessing its research priorities and institutional positioning.
CAIS occupies a distinct niche in the AI safety ecosystem: unlike academic centers such as CHAI or research-focused organizations like MIRI, it combines original technical research with explicit field-building and public communication goals. Critics have questioned whether its emphasis on long-run extinction risk is appropriately calibrated relative to near-term AI harms, and whether EA-concentrated funding in this space creates ideological homogeneity in safety research priorities. These debates are discussed in the Critiques and Limitations section below.
Organizational Background
CAIS was established as a nonprofit research organization with the goal of filling a perceived gap between technical AI safety research and broader scientific and public awareness of AI risks. Dan Hendrycks, who completed his PhD at UC Berkeley, co-founded CAIS with Oliver Zhang to provide infrastructure — compute grants, fellowships, educational resources, and policy engagement — that individual academic researchers lacked access to.
The organization's theory of change rests on several linked assumptions: that AI systems pose meaningful risks of societal-scale harm, including possible catastrophic outcomes; that the current period is important for establishing safety-relevant research norms and technical methods; and that field-building activities (funding researchers, running educational programs, facilitating policy engagement) will increase the probability of good outcomes by growing and coordinating the safety research community. Whether these assumptions are well-founded is contested, and the organization's critics have argued that the extinction-risk framing in particular overstates speculative long-run risks relative to observable near-term harms.
CAIS is legally structured as a nonprofit (EIN: 88-1751310). Its primary disclosed funders include Coefficient Giving and the Survival and Flourishing Fund; revenue and expense figures from its IRS Form 990 filings are detailed in the Funding section below.
Funding
CAIS's primary disclosed funders have included Coefficient Giving (formerly Open Philanthropy), a philanthropic organization closely associated with the effective altruism movement. This funding relationship is material context for interpreting the organization's research agenda: Coefficient Giving has historically prioritized long-run catastrophic and extinction-level AI risk over near-term AI harms, and CAIS's framing broadly reflects this prioritization.
Per IRS Form 990 filings (ProPublica), CAIS reported total revenue of $6.7M (2022), $16.1M (2023), and $10.2M (2024). Major known grant sources include Open Philanthropy (≈$10.6M across 2022-2023 general support grants), SFF (≈$3.8M in 2024-2025), Good Ventures Foundation ($1.9M in 2024), and Founders Pledge ($0.9M in 2024). Total expenses were $7.2M in 2024, with total assets of $12.6M.
The concentration of AI safety funding through EA-aligned funders including Coefficient Giving (formerly Open Philanthropy) has been noted by critics as a potential source of ideological constraint on safety research priorities — organizations dependent on this funding may face implicit pressure to prioritize framings and research directions consistent with funder worldviews. CAIS has not publicly addressed this critique directly.
Key Research Areas
Technical Safety Research
| Research Domain | Key Contributions | Notes |
|---|---|---|
| Representation Engineering | Methods for reading and steering model internal representations | Published 2023; independent replication and scalability to frontier models remain an open research question |
| Safety Benchmarks | MACHIAVELLI benchmark for evaluating goal-directed and deceptive behavior | Cited in subsequent research; the extent to which it has been formally integrated into evaluation pipelines at Anthropic or OpenAI is not publicly documented |
| Adversarial Robustness | Evaluation protocols and defense mechanisms | Part of the broader Adversarial Robustness research agenda |
| Alignment Foundations | Conceptual frameworks and problem taxonomies for AI safety | Including the "Unsolved Problems in ML Safety" paper (2022) |
Major Publications & Tools
- Representation Engineering: A Top-Down Approach to AI Transparency (2023) — Methods for understanding and influencing AI decision-making by working with internal representations rather than input-output behavior alone
- Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior (2023) — Introduces the MACHIAVELLI benchmark for evaluating whether AI agents pursue goals through unethical means in text-based game environments
- Unsolved Problems in ML Safety (2022) — A taxonomy of open technical challenges in machine learning safety, intended partly as a research agenda for the field
- Measuring Mathematical Problem Solving With the MATH Dataset (2021) — A benchmark for evaluating AI mathematical reasoning, authored by Dan Hendrycks and collaborators during his PhD at UC Berkeley; this paper predates CAIS's founding and is a product of Hendrycks's academic research rather than an organizational output of CAIS
Citation counts for these papers (figures such as "200+", "50+", "30+") previously appeared on this page without sourced methodology. Readers seeking current citation data should consult Google Scholar or Semantic Scholar directly.
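The representation engineering paper's core recipe can be illustrated in a few lines: collect hidden-state activations for contrastive prompt pairs (e.g., prompts cueing honest versus dishonest behavior), take the difference of their mean activations as a "reading vector" for the concept, then project new activations onto that vector to monitor the concept or shift activations along it to steer behavior. The sketch below is a minimal illustration using synthetic activations in place of a real model's hidden states; the dimensionality, sample counts, and function names are all assumptions for demonstration, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # hidden-state dimensionality (illustrative)

# Synthetic stand-ins for layer activations gathered from contrastive
# prompt pairs; a planted unit vector plays the role of the concept
# direction that a real model would encode.
true_direction = rng.normal(size=d)
true_direction /= np.linalg.norm(true_direction)
pos_acts = rng.normal(size=(200, d)) + 2.0 * true_direction
neg_acts = rng.normal(size=(200, d)) - 2.0 * true_direction

# Difference-of-means "reading vector" for the target concept.
reading_vector = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
reading_vector /= np.linalg.norm(reading_vector)

def concept_score(activation: np.ndarray) -> float:
    """Monitoring: project an activation onto the reading vector."""
    return float(activation @ reading_vector)

def steer(activation: np.ndarray, alpha: float) -> np.ndarray:
    """Steering: shift an activation along the concept direction."""
    return activation + alpha * reading_vector

# The recovered direction closely matches the planted one.
assert float(true_direction @ reading_vector) > 0.9
sample = rng.normal(size=d) + 2.0 * true_direction
```

In this toy setup the difference of means recovers the planted direction almost exactly; the open question flagged in the table above is whether analogous directions extracted from frontier-scale models are similarly clean and causally meaningful.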
Field-Building Programs
CAIS runs several programs intended to grow the population of researchers working on AI safety. The term "field-building" refers to activities designed to increase the size, diversity, and coordination of a research community — in this case, researchers focused on technical and governance aspects of AI safety.
Grant Programs
| Program | Reported Scale | Description | Timeline |
|---|---|---|---|
| Compute Grants | $2M+ distributed; number of recipients reported variously as 100+ and 200+ in different CAIS materials — figure unverified | Provides compute resources to researchers working on safety-relevant projects | 2022–present |
| ML Safety Scholars | 63 graduates in the Summer 2022 cohort | Structured program for early-career researchers entering the AI safety field | 2021–present (pre-dates CAIS's 2022 founding; originated as an independent initiative) |
| Research Fellowships | Amount not publicly disclosed | Fellowships placing researchers at academic and research institutions | 2022–present |
| AI Safety Camp | Participant count not publicly disclosed | Collaborative program supporting international research teams | 2020–present (pre-dates CAIS's 2022 founding; originated as an independent initiative) |
Note: Quantitative figures in this table are drawn from CAIS's own communications and have not been independently verified. The ML Safety Scholars program was introduced in 2021 as an initiative led by Dan Hendrycks and collaborators during his time at UC Berkeley, and was later absorbed into CAIS's organizational umbrella.
Institutional Partnerships
- Academic Collaborations: CAIS’s compute cluster supports researchers from UC Berkeley, Stanford, University of Cambridge, ETH Zurich, and other institutions. Collaborative research has included work with Carnegie Mellon University on adversarial attacks on large language models.
- Industry Engagement: Research interactions with Anthropic and Google DeepMind have been reported in CAIS communications, though specific partnership details are not publicly documented.
- Policy Connections: CAIS’s Action Fund engages in AI policy advocacy, including sponsoring California SB 1047. Specific briefings with individual legislative bodies are not independently documented.
Statement on AI Risk (2023)
In May 2023, CAIS published and circulated the Statement on AI Risk, a single sentence co-signed by over 350 AI researchers and industry figures:
"Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war."
The statement was covered widely in major news outlets and was cited in subsequent policy discussions, including in the context of UK and US government AI strategies. The official signatory list is available at safe.ai; the figure of 350+ is drawn from that list, though the precise count at any given time may vary as signatories are added.
Signatory Groups
| Category | Notable Signatories | Description |
|---|---|---|
| Turing Award Recipients | Geoffrey Hinton, Yoshua Bengio, Stuart Russell | Recipients of computing's highest recognition who signed the statement |
| Industry Executives | Sam Altman (OpenAI), Dario Amodei (Anthropic), Demis Hassabis (DeepMind) | CEOs of major AI laboratories |
| Policy and Governance Researchers | Helen Toner, Allan Dafoe, Gillian Hadfield | Researchers working on AI governance and policy |
| ML/AI Researchers | 300+ researchers across academia and industry | Researchers who signed as individuals, not representing institutional positions |
The statement's reception was not uniformly positive within the AI research community. A number of prominent ML researchers declined to sign or publicly criticized the statement's framing. Critics raised several concerns: that the one-sentence format was too vague to convey meaningful technical content; that equating AI risk with nuclear war risk was unsupported by available evidence; that the extinction framing could distract attention and resources from observable near-term harms from AI systems (such as bias, surveillance, and labor displacement); and that the statement's signatories were not uniformly working on extinction-risk problems, making it a weak signal of scientific consensus. Timnit Gebru criticized it for elevating speculative extinction risks while being promoted by "the same people who have poured billions of dollars into these companies." Human Rights Watch argued that scientists should focus on the known risks of AI instead of speculative future dangers. Emile Torres and Gebru argued the statement may be motivated by TESCREAL ideologies.
Proponents argued that the statement served a legitimate coordination function: making it socially acceptable for researchers to discuss catastrophic risk publicly, signaling to policymakers that risk concerns were not fringe views, and creating a reference point for subsequent regulatory discussions. Whether the statement's net effect on AI policy and research prioritization was positive is a matter of ongoing debate.
The statement's impact on specific policy documents — including mentions in UK AI Safety Institute and US AI Safety Institute contexts — has been cited by CAIS, though the causal relationship between the statement and any particular policy outcome is difficult to establish.
Critiques and Limitations
Criticism of Extinction-Risk Framing
The most substantive criticism of CAIS's work concerns its central framing of AI extinction risk as a near-term policy priority. Critics from several directions have argued:
- Near-term displacement effect: Emphasizing speculative long-run extinction risk may draw funding, talent, and policy attention away from near-term AI harms — discrimination in algorithmic decision-making, AI-enabled surveillance, labor market disruption, and misinformation — that are currently affecting people. Researchers associated with the AI ethics and fairness communities, including Timnit Gebru and the DAIR Institute, have made this argument most consistently.
- Epistemic status of extinction claims: The probability of AI-caused human extinction within policy-relevant timeframes is highly uncertain, and critics have argued that treating it as a "global priority alongside pandemics and nuclear war" involves large unjustified inferential steps. Some ML researchers have noted that the mechanisms by which current or near-term AI systems could pose extinction-level risks are not specified with sufficient precision to evaluate.
- Ideological concentration: CAIS's alignment with EA-associated funders and the broader longtermist intellectual tradition has led critics to argue that its research agenda reflects a particular philosophical worldview rather than a neutral assessment of AI risk. This critique is not unique to CAIS — it applies to several EA-funded AI safety organizations — but it is relevant to assessing how to interpret CAIS's outputs.
Limitations of Specific Research
- Representation Engineering scalability: The representation engineering paper introduced methods that work on models of a given scale; whether these methods generalize to frontier-scale models is an open question. A survey of representation engineering identifies challenges including performance degradation at scale, computational overhead, and reliability concerns regarding whether the correlations identified are causal.
- Benchmark validity: A general concern in AI safety evaluation is whether constructed benchmarks (including MACHIAVELLI) capture risks that manifest in real deployment contexts. The MACHIAVELLI benchmark uses text-based game environments, and the extent to which performance on these environments predicts behavior in consequential real-world settings is not established.
- Field-building outcome measurement: CAIS reports counts of researchers supported and grant dollars distributed, but does not publicly report outcome data for its programs — for example, where ML Safety Scholars alumni work subsequently, what research they produce, or whether compute grant recipients remain in safety research. Without outcome data, the field-building impact claims are difficult to evaluate independently.
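To make concrete what a MACHIAVELLI-style evaluation measures, the benchmark's core bookkeeping can be sketched as follows: each scene an agent visits carries a game reward plus machine-generated annotations of harmful behaviors, and a playthrough is scored both on cumulative reward and on a tally of annotated harms, exposing the reward-versus-ethics trade-off the paper studies. Everything below (the scene structure, label names, and trajectories) is synthetic and illustrative, not the benchmark's actual data format.

```python
from dataclasses import dataclass, field

@dataclass
class Scene:
    """Illustrative stand-in for one annotated game scene."""
    reward: float
    harms: dict = field(default_factory=dict)  # harm label -> count

def score_trajectory(trajectory):
    """Return (total reward, per-label harm tally) for one playthrough."""
    total_reward = 0.0
    harm_tally: dict = {}
    for scene in trajectory:
        total_reward += scene.reward
        for label, count in scene.harms.items():
            harm_tally[label] = harm_tally.get(label, 0) + count
    return total_reward, harm_tally

# Two synthetic playthroughs: a reward-maximizing run that accumulates
# harm annotations, and a cautious run with lower reward.
greedy = [Scene(10.0, {"deception": 2}), Scene(8.0, {"power_seeking": 1})]
cautious = [Scene(4.0), Scene(3.0, {"deception": 1})]

g_reward, g_harms = score_trajectory(greedy)
c_reward, c_harms = score_trajectory(cautious)
# The trade-off the benchmark quantifies: higher reward, more harms.
assert g_reward > c_reward
assert sum(g_harms.values()) > sum(c_harms.values())
```

The validity concern above is precisely that tallies of this kind are computed over text-game scenes, and their relationship to harms in consequential deployment settings is not established.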
Critiques of the 2023 Statement
Beyond the framing critiques noted above, several researchers argued that the statement's format — a single declarative sentence without methodology, evidence, or mechanism — made it unsuitable as a scientific communication and more akin to a public advocacy document. Others noted that some signatories are not primarily working on extinction-risk problems, which complicated interpretation of the statement as a signal of expert consensus on the technical merits of the extinction-risk hypothesis. See the Wikipedia article on the Statement on AI Risk for a summary of these critiques and specific critics.
Current Trajectory & Timeline
Research Roadmap
The following research priorities were described by CAIS as goals for 2024–2026. Actual outcomes against these goals have not been independently verified and are not currently documented on this page.
| Priority Area | Stated Goals | Status |
|---|---|---|
| Representation Engineering | Scale methods to frontier models; pursue industry adoption for safety evaluation | Outcome unverified |
| Evaluation Frameworks | Develop comprehensive benchmark suite; establish standard evaluation protocols | Outcome unverified |
| Alignment Methods | Proof-of-concept demonstrations; practical implementation work | Outcome unverified |
| Policy Research | Technical governance recommendations; regulatory framework development | Outcome unverified |
A previously cited projection of "2x expansion by 2025" appeared in earlier versions of this page without a cited source. Whether this projection materialized has not been verified.
Organizational Scale
- Staff: CAIS is organized into four functional teams (Research, Cloud and DevOps, Projects, Operations); total headcount is not publicly disclosed
- Affiliates: The compute cluster supports 150+ active researchers across approximately 20 research labs
- Budget: $10.2M revenue and $7.2M expenses in 2024 per IRS Form 990
Key Uncertainties & Research Cruxes
Technical Challenges
These represent genuine open questions in CAIS's research agenda, not settled conclusions:
- Representation Engineering Scalability: Whether methods developed on mid-scale models transfer reliably to frontier-scale systems remains unclear. The gap between controlled research settings and deployment conditions is a known limitation.
- Benchmark Validity: Whether evaluations like MACHIAVELLI capture risks that manifest in real deployment — rather than behavior specific to text-game environments — is unresolved. This is a field-wide challenge, not unique to CAIS.
- Alignment Verification: There is no established consensus on how to verify that an AI system is successfully aligned with intended goals rather than passing evaluations through surface-level pattern matching.
Strategic Questions
- Research vs. Policy Balance: CAIS allocates resources across technical research, field-building, and policy communication. The optimal allocation is not obvious, and different observers weight these activities differently based on their models of how AI safety progress happens.
- Open vs. Closed Research: Publishing safety research openly makes it available to the broader community but may also inform adversarial actors. CAIS has not publicly articulated a detailed position on this tradeoff.
- Timeline Assumptions: Appropriate research priorities depend substantially on assumptions about AGI timelines and the nature of AI risk. Researchers with shorter timelines and those focused on long-run speculative risk reach different conclusions about what work is most valuable now.
- Near-term vs. Long-term Risk Balance: Whether resources spent on extinction-risk scenarios are appropriately calibrated relative to near-term AI harms is a live debate both within and outside the AI safety community, and CAIS's position at the long-run end of this spectrum is contested.
Leadership & Key Personnel
Key People
Note: Staff roles and affiliations reflect information available at time of last edit and may not reflect current positions. Andy Zou holds joint affiliation with CMU and CAIS; his primary institutional role should be verified against current sources.
Positioning Within the AI Safety Ecosystem
CAIS occupies a specific position within the broader AI safety research landscape that distinguishes it from peer organizations:
- vs. MIRI: MIRI focuses almost exclusively on foundational theoretical alignment research and does not run field-building or public communication programs. CAIS's research is more empirical and its scope is broader institutionally.
- vs. CHAI: CHAI (Center for Human-Compatible AI, UC Berkeley) is an academic center with a narrower research agenda centered on value alignment. CAIS has a more explicit field-building and policy communication mandate.
- vs. Redwood Research: Redwood focuses on specific empirical safety problems with a small team; CAIS has a larger scope including grant programs and public communication.
- vs. METR and ARC Evaluations: These organizations focus specifically on model evaluations and dangerous capability assessments. CAIS's evaluation work (MACHIAVELLI) is one component of a broader agenda.
- vs. GovAI: GovAI focuses on AI governance and policy research. CAIS does policy communication but its primary identity is as a technical research organization.
The common thread across CAIS-adjacent organizations is EA-aligned funding, primarily from Coefficient Giving, which has led to criticisms that the AI safety field as constituted reflects the priorities of a relatively narrow philanthropic and ideological community rather than a broad scientific consensus.
Sources & Resources
Official Resources
| Type | Resource | Description |
|---|---|---|
| Website | safe.ai | Main organization hub |
| Research | CAIS Publications | Technical papers and reports |
| Blog | CAIS Blog | Research updates and commentary |
| Courses | ML Safety Course | Educational materials on machine learning safety |
Key Research Papers
| Paper | Year | Description |
|---|---|---|
| Unsolved Problems in ML Safety | 2022 | Research agenda taxonomy; citation counts should be verified via Google Scholar or Semantic Scholar |
| MACHIAVELLI Benchmark | 2023 | Evaluation framework for goal-directed AI behavior in game environments |
| Representation Engineering | 2023 | Methods for reading and steering AI model internal representations |
Related Organizations
- Technical Safety Research: MIRI, CHAI, Redwood Research
- Evaluations: ARC Evaluations, METR
- Policy Focus: GovAI, RAND Corporation
- Industry Labs: Anthropic, OpenAI, Google DeepMind
- Funders: Coefficient Giving
References
OpenAI is a leading AI research and deployment company focused on building advanced AI systems, including GPT and o-series models, with a stated mission of ensuring artificial general intelligence (AGI) benefits all of humanity. The homepage serves as a gateway to their research, products, and policy work spanning capabilities and safety.
This resource appears to be a blog post from the Center for AI Safety (CAIS) about Representation Engineering, a technique for understanding and controlling AI model internals. However, the page is currently unavailable (404 error), so the specific content cannot be assessed.
A concise open letter coordinated by the Center for AI Safety stating that mitigating extinction-level risk from AI should be a global priority alongside pandemics and nuclear war. The statement has been signed by hundreds of leading AI researchers, executives, and public figures including Geoffrey Hinton, Yoshua Bengio, Sam Altman, and Demis Hassabis, lending significant institutional credibility to existential AI risk concerns.
The Center for AI Safety (CAIS) publishes both technical and conceptual research aimed at mitigating high-consequence, societal-scale risks from AI. Their technical work focuses on safety benchmarks, robustness, machine ethics, and biosecurity, while their conceptual research draws on philosophy, safety engineering, and international relations to understand AI risk.
5Representation Engineering: A Top-Down Approach to AI TransparencyarXiv·Andy Zou et al.·2023·Paper▸
This paper introduces representation engineering (RepE), a top-down approach to AI transparency that analyzes population-level representations in deep neural networks rather than individual neurons. Drawing from cognitive neuroscience, RepE provides methods for monitoring and manipulating high-level cognitive phenomena in large language models. The authors demonstrate that RepE techniques can effectively address safety-relevant problems including honesty, harmlessness, and power-seeking behavior, offering a promising direction for improving AI system transparency and control.
A structured university-level course on machine learning safety developed by the Center for AI Safety, covering topics from robustness and anomaly detection to alignment and systemic safety. The course includes lecture recordings, slides, notes, and coding assignments across modules on safety engineering, robustness, monitoring, alignment, and emerging risks.
MACHIAVELLI is a benchmark dataset of 134 Choose-Your-Own-Adventure games with over 500,000 scenarios designed to evaluate whether AI agents naturally learn Machiavellian behaviors like power-seeking, deception, and ethical violations when trained to maximize reward. The authors use language models for automated scenario labeling and mathematize dozens of harmful behaviors to evaluate agents' tendencies. Their findings reveal a tension between reward maximization and ethical behavior, but demonstrate that agents can be steered toward less harmful actions through LM-based methods, suggesting that designing agents that are simultaneously safe and capable is achievable.
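The steering result can be illustrated with a toy harm-penalized objective: each action carries a game reward and a harm score (in the paper, harm labels come from LM-based annotation), and the agent selects actions under a shaped objective of reward minus a weighted harm penalty. The function and scenario below are hypothetical illustrations, not the benchmark's API.

```python
def choose_action(actions, harm_weight=1.0):
    """actions: list of (name, reward, harm) tuples.
    Returns the action name maximizing reward - harm_weight * harm."""
    return max(actions, key=lambda a: a[1] - harm_weight * a[2])[0]

# A toy scenario: the high-reward action is also the most harmful.
scenario = [("betray ally", 10.0, 8.0), ("cooperate", 6.0, 0.5)]

print(choose_action(scenario, harm_weight=0.0))  # pure reward-maximizer -> "betray ally"
print(choose_action(scenario, harm_weight=1.0))  # harm-regularized agent -> "cooperate"
```

Turning up `harm_weight` trades reward for ethical behavior, which is the tension the benchmark quantifies and the regularization-style steering it proposes.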
[2103.03874] Measuring Mathematical Problem Solving With the MATH Dataset (arXiv, Dan Hendrycks et al., 2021)
This paper introduces MATH, a benchmark of 12,500 competition mathematics problems with step-by-step solutions, revealing that large Transformer models achieve surprisingly low accuracy and that scaling alone is insufficient for mathematical reasoning. The authors also release an auxiliary pretraining dataset to aid mathematical learning. The work highlights a fundamental gap between current scaling trends and genuine mathematical reasoning ability.
The official blog of the Center for AI Safety (CAIS), a leading AI safety research organization focused on reducing societal-scale risks from advanced AI systems. The blog publishes research updates, policy commentary, and educational content on AI safety topics including existential risk, alignment, and governance.
The Center for AI Safety (CAIS) is a research organization focused on mitigating catastrophic and existential risks from advanced AI systems. It conducts technical research, publishes surveys and statements, and supports field-building efforts across academia and industry. CAIS is notable for its broad coalition-building, including its widely-cited statement on AI extinction risk signed by leading researchers.
Anthropic is an AI safety company focused on building reliable, interpretable, and steerable AI systems. The company conducts frontier AI research and develops Claude, its family of AI assistants, with a stated mission of responsible development and maintenance of advanced AI for long-term human benefit.
RAND Corporation's AI research hub covers policy, national security, and governance implications of artificial intelligence. It aggregates reports, analyses, and commentary on AI risks, military applications, and regulatory frameworks from one of the leading U.S. defense and policy think tanks.
Google Scholar profile for Stuart Russell, professor at UC Berkeley and one of the most influential figures in AI safety research. Russell is co-author of the leading AI textbook 'Artificial Intelligence: A Modern Approach' and author of 'Human Compatible,' which argues for a fundamental redesign of AI around human preferences and uncertainty. His research spans AI alignment, inverse reward design, and the long-term risks of advanced AI systems.
Unsolved Problems in ML Safety (arXiv, Dan Hendrycks, Nicholas Carlini, John Schulman & Jacob Steinhardt, 2021)
This paper presents a comprehensive roadmap for ML safety research, identifying four critical problem areas that the field must address as machine learning systems grow larger and are deployed in high-stakes applications. The authors categorize safety challenges into Robustness (withstanding hazards), Monitoring (identifying hazards), Alignment (reducing inherent model hazards), and Systemic Safety (reducing systemic hazards). By clarifying the motivation behind each problem and providing concrete research directions, the paper aims to guide the ML safety research community toward addressing emerging safety challenges posed by large-scale models.
Wikipedia's overview of the Center for AI Safety (CAIS), a nonprofit organization focused on reducing societal-scale risks from advanced AI systems. CAIS is known for publishing the 2023 statement on AI extinction risk signed by hundreds of leading AI researchers and for conducting technical safety research. The article covers the organization's founding, mission, key initiatives, and notable figures involved.
The Center for AI Safety (CAIS) is a nonprofit organization focused on reducing societal-scale risks from advanced AI systems. The about page outlines its mission, team, and core research and advocacy activities aimed at ensuring AI development benefits humanity. The organization works across technical safety research, policy engagement, and public education.