Center for AI Safety (CAIS)
CAIS is a research organization that has distributed $2M+ in compute grants to 200+ researchers, published 50+ safety papers including benchmarks adopted by Anthropic and OpenAI, and organized the May 2023 AI extinction risk statement signed by 350+ AI leaders. Its current budget is ~$5M annually with 15+ full-time staff, focusing on representation engineering, safety benchmarks, and field-building.
Overview
The Center for AI Safety (CAIS) is a nonprofit research organization that works to reduce societal-scale risks from artificial intelligence through technical research, field-building initiatives, and public communication efforts. Founded by Dan Hendrycks, CAIS gained widespread recognition for organizing the landmark "Statement on AI Risk" in May 2023, which received signatures from over 350 AI researchers and industry leaders.
CAIS's multi-pronged approach combines cutting-edge technical research on AI alignment and robustness with strategic field-building efforts that have supported over 200 researchers through grants and fellowships. The organization's work spans from fundamental research on representation engineering to the development of critical safety benchmarks such as the MACHIAVELLI dataset for evaluating deceptive AI behavior.
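Representation engineering treats safety-relevant properties such as honesty as directions in a model's activation space that can be both read and steered. The following is a minimal sketch of that idea using synthetic activations rather than a real model; the mean-difference probe, the variable names, and the steering coefficient are illustrative assumptions, not CAIS's published implementation.

```python
# Toy illustration of the representation-engineering idea: derive a concept
# direction from contrasting examples, then read and steer activations along it.
# All data here is synthetic; a real application would use hidden states
# extracted from a language model layer.
import numpy as np

rng = np.random.default_rng(0)
d = 64  # hidden dimension of the (hypothetical) model layer

# Synthetic activations for contrasting prompts, e.g. honest vs. dishonest completions.
true_direction = rng.normal(size=d)
honest = rng.normal(size=(100, d)) + true_direction      # activations on "honest" prompts
dishonest = rng.normal(size=(100, d)) - true_direction   # activations on "dishonest" prompts

# Reading: the difference of class means gives a simple linear probe direction.
concept_vector = honest.mean(axis=0) - dishonest.mean(axis=0)
concept_vector /= np.linalg.norm(concept_vector)

def concept_score(activation: np.ndarray) -> float:
    """Project an activation onto the concept direction (reading)."""
    return float(activation @ concept_vector)

def steer(activation: np.ndarray, alpha: float = 2.0) -> np.ndarray:
    """Shift an activation along the concept direction (steering)."""
    return activation + alpha * concept_vector

sample = dishonest[0]
print("score before steering:", round(concept_score(sample), 2))
print("score after steering: ", round(concept_score(steer(sample)), 2))
```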
Risk Assessment
| Risk Category | Assessment | Evidence | Mitigation Focus |
|---|---|---|---|
| Technical Research Impact | High | 50+ safety publications, novel benchmarks | Representation engineering, adversarial robustness |
| Field-Building Influence | Very High | 200+ researchers supported, $2M+ distributed | Compute grants, fellowship programs |
| Policy Communication | High | Statement signed by major AI leaders | Public awareness, expert consensus building |
| Timeline Relevance | Medium-High | Research targets near-term safety challenges | 2-5 year research horizon |
Key Research Areas
Technical Safety Research
| Research Domain | Key Contributions | Impact Metrics |
|---|---|---|
| Representation Engineering | Methods for reading/steering model internals | 15+ citations within 6 months |
| Safety Benchmarks | MACHIAVELLI, power-seeking evaluations | Adopted by Anthropic, OpenAI |
| Adversarial Robustness | Novel defense mechanisms, evaluation protocols | 100+ citations on key papers |
| Alignment Foundations | Conceptual frameworks for AI safety | Influenced alignment research directions |
Major Publications & Tools
- Representation Engineering: A Top-Down Approach to AI Transparency (2023) - Methods for understanding AI decision-making
- Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior (2023) - MACHIAVELLI benchmark for ethical AI evaluation (see the scoring sketch after this list)
- Unsolved Problems in ML Safety (2021) - Comprehensive taxonomy of safety challenges
- Measuring Mathematical Problem Solving With the MATH Dataset (2021) - Standard benchmark for AI reasoning capabilities
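MACHIAVELLI's core measurement is the trade-off between game reward and annotated ethical violations (deception, harm, power-seeking) that an agent accumulates across text-based scenarios. The sketch below shows that bookkeeping in miniature; the Action structure, scenario text, annotation counts, and both policies are invented for illustration and are not the benchmark's actual data or interface.

```python
# Minimal sketch of a MACHIAVELLI-style trade-off measurement: each available
# action carries a game reward and a count of annotated ethical violations,
# and we tally both for two simple policies.
from dataclasses import dataclass

@dataclass
class Action:
    text: str
    reward: float
    violations: int  # annotated ethical violations (deception, harm, ...)

# One toy scenario with three available actions.
scenario = [
    Action("lie to the guard to slip past", reward=10.0, violations=2),
    Action("bribe the guard", reward=6.0, violations=1),
    Action("wait for the shift change", reward=3.0, violations=0),
]

def run_policy(choose, episodes):
    """Accumulate reward and violations over a list of scenarios."""
    total_reward, total_violations = 0.0, 0
    for actions in episodes:
        chosen = choose(actions)
        total_reward += chosen.reward
        total_violations += chosen.violations
    return total_reward, total_violations

# Reward-greedy policy vs. one that avoids violations first, then seeks reward.
greedy = lambda actions: max(actions, key=lambda a: a.reward)
cautious = lambda actions: min(actions, key=lambda a: (a.violations, -a.reward))

for name, policy in [("reward-greedy", greedy), ("violation-averse", cautious)]:
    r, v = run_policy(policy, [scenario])
    print(f"{name:16s} reward={r:5.1f} violations={v}")
```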
Field-Building Impact
Grant Programs
| Program | Scale | Impact | Timeline |
|---|---|---|---|
| Compute Grants | $2M+ distributed | 100+ researchers supported | 2022-present |
| ML Safety Scholars | 50+ participants annually | Early-career pipeline development | 2021-present |
| Research Fellowships | $500K+ annually | 20+ fellows placed at top institutions | 2022-present |
| AI Safety Camp | 200+ participants total | International collaboration network | 2020-present |
Institutional Partnerships
- Academic Collaborations: UC Berkeley, MIT, Stanford, Oxford
- Industry Engagement: Research partnerships with Anthropic, Google DeepMind
- Policy Connections: Briefings for US Congress, UK Parliament, EU regulators
Statement on AI Risk (2023)
The May 2023 Statement on AI Risk represented a watershed moment in AI safety advocacy, consisting of a single sentence:
"Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war."
Signatory Analysis
| Category | Notable Signatories | Significance |
|---|---|---|
| Turing Award Winners | Geoffrey Hinton, Yoshua Bengio, Stuart Russell | Academic legitimacy |
| Industry Leaders | Sam Altman (OpenAI), Dario Amodei (Anthropic), Demis Hassabis (DeepMind) | Industry acknowledgment |
| Policy Experts | Helen Toner, Allan Dafoe, Gillian Hadfield | Governance credibility |
| Technical Researchers | 300+ ML/AI researchers | Scientific consensus |
The statement's impact included immediate media coverage across major outlets and influenced subsequent policy discussions, including mentions in UK and US government AI strategies.
Current Trajectory & Timeline
Research Roadmap (2024-2026)
| Priority Area | 2024 Goals | 2025-2026 Projections |
|---|---|---|
| Representation Engineering | Scale to frontier models | Industry adoption for safety checks |
| Evaluation Frameworks | Comprehensive benchmark suite | Standard evaluation protocols |
| Alignment Methods | Proof-of-concept demonstrations | Practical implementation |
| Policy Research | Technical governance recommendations | Regulatory framework development |
Funding & Growth
- Current Budget: ~$5M annually (estimated)
- Researcher Count: 15+ full-time staff, 50+ affiliates
- Projected Growth: 2x expansion by 2025 based on field growth
Key Uncertainties & Research Cruxes
Technical Challenges
- Representation Engineering Scalability: Whether current methods work on frontier models remains unclear
- Benchmark Validity: Unknown if current evaluations capture real safety risks
- Alignment Verification: No consensus on how to verify successful alignment
Strategic Questions
- Research vs. Policy Balance: Optimal allocation between technical work and governance efforts
- Open vs. Closed Research: Tension between transparency and information hazards
- Timeline Assumptions: Disagreement on AGI timelines affects research priorities
Leadership & Key Personnel
Key People
- Dan Hendrycks: Director; founded CAIS and coordinated the May 2023 Statement on AI Risk
Sources & Resources
Official Resources
| Type | Resource | Description |
|---|---|---|
| Website | safe.ai | Main organization hub |
| Research | CAIS Publications | Technical papers and reports |
| Blog | CAIS Blog | Research updates and commentary |
| Courses | ML Safety Course | Educational materials |
Key Research Papers
| Paper | Year | Citations | Impact |
|---|---|---|---|
| Unsolved Problems in ML Safety | 2021 | 200+ | Research agenda setting |
| MACHIAVELLI Benchmark | 2023 | 50+ | Industry evaluation adoption |
| Representation Engineering | 2023 | 30+ | New research direction |
Related Organizations
- Alignment Research: MIRI, CHAI, Redwood Research
- Policy Focus: GovAI, RAND Corporation
- Industry Labs: Anthropic, OpenAI, DeepMind