Longterm Wiki
Updated 2025-12-24HistoryData
Page StatusContent
Edited 7 weeks ago824 words6 backlinks
42
QualityAdequate
42
ImportanceReference
10
Structure10/15
704200%20%
Updated every 3 weeksOverdue by 30 days
Summary

CAIS is a research organization that has distributed $2M+ in compute grants to 200+ researchers, published 50+ safety papers including benchmarks adopted by Anthropic/OpenAI, and organized the May 2023 AI extinction risk statement signed by 350+ AI leaders. Current budget is ~$5M annually with 15+ full-time staff, focusing on representation engineering, safety benchmarks, and field-building.

Issues1
QualityRated 42 but structure suggests 67 (underrated by 25 points)

CAIS (Center for AI Safety)

Safety Org

Center for AI Safety

CAIS is a research organization that has distributed $2M+ in compute grants to 200+ researchers, published 50+ safety papers including benchmarks adopted by Anthropic/OpenAI, and organized the May 2023 AI extinction risk statement signed by 350+ AI leaders. Current budget is ~$5M annually with 15+ full-time staff, focusing on representation engineering, safety benchmarks, and field-building.

TypeSafety Org
Founded2022
LocationSan Francisco, CA
Websitesafe.ai
Related
Risks
Existential Risk from AIPower-Seeking AI
Organizations
Anthropic
824 words ยท 6 backlinks
Safety Org

Center for AI Safety

CAIS is a research organization that has distributed $2M+ in compute grants to 200+ researchers, published 50+ safety papers including benchmarks adopted by Anthropic/OpenAI, and organized the May 2023 AI extinction risk statement signed by 350+ AI leaders. Current budget is ~$5M annually with 15+ full-time staff, focusing on representation engineering, safety benchmarks, and field-building.

TypeSafety Org
Founded2022
LocationSan Francisco, CA
Websitesafe.ai
Related
Risks
Existential Risk from AIPower-Seeking AI
Organizations
Anthropic
824 words ยท 6 backlinks

Overview

The Center for AI Safety (CAIS)โ†— is a nonprofit research organization that works to reduce societal-scale risks from artificial intelligence through technical research, field-building initiatives, and public communication efforts. Founded by Dan Hendrycks, CAIS gained widespread recognition for organizing the landmark "Statement on AI Risk" in May 2023, which received signatures from over 350 AI researchers and industry leaders.

CAIS's multi-pronged approach combines cutting-edge technical research on AI alignment and robustness with strategic field-building efforts that have supported over 200 researchers through grants and fellowships. The organization's work spans from fundamental research on representation engineeringโ†— to developing critical safety benchmarks like the MACHIAVELLI datasetโ†— for evaluating deceptive AI behavior.

Risk Assessment

Risk CategoryAssessmentEvidenceMitigation Focus
Technical Research ImpactHigh50+ safety publications, novel benchmarksRepresentation engineeringโ†—, adversarial robustness
Field-Building InfluenceVery High200+ researchers supported, $1M+ distributedCompute grants, fellowship programs
Policy CommunicationHighStatement signed by major AI leadersPublic awareness, expert consensus building
Timeline RelevanceMedium-HighResearch targets near-term safety challenges2-5 year research horizon

Key Research Areas

Technical Safety Research

Research DomainKey ContributionsImpact Metrics
Representation EngineeringMethods for reading/steering model internals15+ citationsโ†— within 6 months
Safety BenchmarksMACHIAVELLI, power-seeking evaluationsAdopted by Anthropicโ†—, OpenAIโ†—
Adversarial RobustnessNovel defense mechanisms, evaluation protocols100+ citations on key papers
Alignment FoundationsConceptual frameworks for AI safetyInfluenced alignment research directions

Major Publications & Tools

Field-Building Impact

Grant Programs

ProgramScaleImpactTimeline
Compute Grants$2M+ distributed100+ researchers supported2022-present
ML Safety Scholars50+ participants annuallyEarly-career pipeline development2021-present
Research Fellowships$500K+ annually20+ fellows placed at top institutions2022-present
AI Safety Camp200+ participants totalInternational collaboration network2020-present

Institutional Partnerships

  • Academic Collaborations: UC Berkeley, MIT, Stanford, Oxford
  • Industry Engagement: Research partnerships with Anthropic, Google DeepMind
  • Policy Connections: Briefings for US Congress, UK Parliament, EU regulators

Statement on AI Risk (2023)

The May 2023 Statement on AI Riskโ†— represented a watershed moment in AI safety advocacy, consisting of a single sentence:

"Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war."

Signatory Analysis

CategoryNotable SignatoriesSignificance
Turing Award WinnersGeoffrey Hinton, Yoshua Bengio, Stuart RussellAcademic legitimacy
Industry LeadersSam Altman (OpenAI), Dario Amodei (Anthropic), Demis Hassabis (DeepMind)Industry acknowledgment
Policy ExpertsHelen Toner, Allan Dafoe, Gillian HadfieldGovernance credibility
Technical Researchers300+ ML/AI researchersScientific consensus

The statement's impact included immediate media coverage across major outlets and influenced subsequent policy discussions, including mentions in UK and US government AI strategies.

Current Trajectory & Timeline

Research Roadmap (2024-2026)

Priority Area2024 Goals2025-2026 Projections
Representation EngineeringScale to frontier modelsIndustry adoption for safety checks
Evaluation FrameworksComprehensive benchmark suiteStandard evaluation protocols
Alignment MethodsProof-of-concept demonstrationsPractical implementation
Policy ResearchTechnical governance recommendationsRegulatory framework development

Funding & Growth

  • Current Budget: โ‰ˆ$5M annually (estimated)
  • Researcher Count: 15+ full-time staff, 50+ affiliates
  • Projected Growth: 2x expansion by 2025 based on field growth

Key Uncertainties & Research Cruxes

Technical Challenges

  • Representation Engineering Scalability: Whether current methods work on frontier models remains unclear
  • Benchmark Validity: Unknown if current evaluations capture real safety risks
  • Alignment Verification: No consensus on how to verify successful alignment

Strategic Questions

  • Research vs. Policy Balance: Optimal allocation between technical work and governance efforts
  • Open vs. Closed Research: Tension between transparency and information hazards
  • Timeline Assumptions: Disagreement on AGI timelines affects research priorities

Leadership & Key Personnel

Key People

Dan Hendrycks
Executive Director
UC Berkeley PhD, former Google Brain
Mantas Mazeika
Research Director
University of Chicago, adversarial ML expert
Thomas Woodside
Policy Director
Former congressional staffer, tech policy
Andy Zou
Research Scientist
CMU, jailbreaking and red-teaming research

Sources & Resources

Official Resources

TypeResourceDescription
Websitesafe.aiโ†—Main organization hub
ResearchCAIS Publicationsโ†—Technical papers and reports
BlogCAIS Blogโ†—Research updates and commentary
CoursesML Safety Courseโ†—Educational materials

Key Research Papers

PaperYearCitationsImpact
Unsolved Problems in ML Safetyโ†—2022200+Research agenda setting
MACHIAVELLI Benchmarkโ†—202350+Industry evaluation adoption
Representation Engineeringโ†—202330+New research direction

Related Organizations

  • Research Alignment: MIRI, CHAI, Redwood Research
  • Policy Focus: GovAI, RAND Corporationโ†—
  • Industry Labs: Anthropic, OpenAI, DeepMind

Related Pages

Top Related Pages

People

Dan HendrycksYoshua Bengio

Labs

AnthropicFAR AI

Safety Research

Anthropic Core Views

Concepts

Existential Risk from AIAnthropicOpenAIMachine Intelligence Research InstituteUK AI Safety InstituteSelf-Improvement and Recursive Enhancement

Risks

AI-Induced Irreversibility

Models

Capabilities-to-Safety Pipeline ModelAI Safety Researcher Gap Model

Organizations

US AI Safety InstituteARC Evaluations

Key Debates

AI Safety Field Building and Community

Transition Model

Existential CatastropheAI Takeover