The Center for Human-Compatible AI (CHAI) is an academic research center at UC Berkeley focused on ensuring that AI systems are beneficial to humans. Founded by Stuart Russell, co-author of the standard AI textbook Artificial Intelligence: A Modern Approach, CHAI brings academic rigor to AI safety research.
Approaches

- AI Alignment (Quality: 91/100): Comprehensive review of AI alignment approaches finding current methods (RLHF, Constitutional AI) show 75%+ effectiveness on measurable safety metrics for existing systems but face critical scalabi...
- Cooperative IRL (CIRL) (Quality: 65/100): CIRL is a theoretical framework where AI systems maintain uncertainty about human preferences, which naturally incentivizes corrigibility and deference. Despite elegant theory with formal proofs, t...
- Constitutional AI (Quality: 70/100): Constitutional AI is Anthropic's methodology using explicit principles and AI-generated feedback (RLAIF) to train safer models, achieving 3-10x improvements in harmlessness while maintaining helpfu...
- AI Evaluation (Quality: 72/100): Comprehensive overview of AI evaluation methods spanning dangerous capability assessment, safety properties, and deception detection, with categorized frameworks from industry (Anthropic Constituti...
- AI Safety Training Programs (Quality: 70/100): Comprehensive guide to AI safety training programs including MATS (78% alumni in alignment work, 100+ scholars annually), Anthropic Fellows ($2,100/week stipend, 40%+ hired full-time), LASR Labs (5...
Analysis
- AI Compute Scaling Metrics (Quality: 78/100): AI training compute is growing at ~4-5× per year with algorithmic efficiency improving ~3× per year (halving effective compute cost every ~8 months), while the compute landscape is shifting toward ...
- AI Safety Intervention Effectiveness Matrix (Quality: 73/100): Quantitative analysis mapping 15+ AI safety interventions to specific risks reveals critical misallocation: 40% of 2024 funding ($400M+) flows to RLHF methods showing only 10-20% effectiveness agai...
- AI Safety Research Allocation Model (Quality: 65/100): Analysis finds AI safety research suffers 30-50% efficiency losses from industry dominance (60-70% of ~$700M annually), with critical areas like multi-agent dynamics and corrigibility receiving 3-5...
- AI Risk Interaction Matrix (Quality: 65/100): Systematic framework for quantifying AI risk interactions, finding 15-25% of risk pairs strongly interact with coefficients +0.2 to +2.0, causing portfolio risk to be 2-3x higher than linear estima...
- AI Safety Researcher Gap Model (Quality: 67/100): Quantifies AI safety talent shortage: current 300-800 unfilled positions (30-50% gap) with training pipelines producing only 220-450 researchers annually when 500-1,500 are needed. Projects gaps co...
- Goal Misgeneralization Probability Model (Quality: 61/100): Quantitative framework estimating goal misgeneralization probability from 3.6% (superficial distribution shift) to 27.7% (extreme shift), with modifiers for specification quality (0.5x-2.0x), capab...
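The compute-scaling figures above admit a quick consistency check: a ~3× yearly algorithmic efficiency gain implies the effective compute cost halves roughly every 12 / log2(3) ≈ 7.6 months (the "~8 months" cited), and combining ~4-5× yearly hardware growth with ~3× efficiency gives ~12-15× effective compute growth per year. A minimal sketch of that arithmetic, using only the figures quoted in the summaries above:

```python
import math

def halving_period_months(yearly_gain: float) -> float:
    """Months for effective compute cost to halve, given a yearly efficiency multiplier."""
    return 12.0 / math.log2(yearly_gain)

def effective_growth(hardware_per_year: float, efficiency_per_year: float) -> float:
    """Combined yearly growth in effective training compute (hardware x algorithms)."""
    return hardware_per_year * efficiency_per_year

print(round(halving_period_months(3.0), 1))  # -> 7.6 (i.e. "every ~8 months")
print(effective_growth(4.5, 3.0))            # -> 13.5 (within the ~12-15x range)
```

This treats hardware scaling and algorithmic efficiency as independent multipliers, which is the usual simplifying assumption behind "effective compute" estimates.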
Organizations
- Center for AI Safety (Quality: 42/100): CAIS is a nonprofit research organization founded by Dan Hendrycks that has distributed compute grants to researchers, published technical AI safety papers including the representation engineering ...
- Anthropic (Quality: 74/100): Comprehensive reference page on Anthropic covering financials ($380B valuation, $14B ARR at Series G growing to $19B by March 2026), safety research (Constitutional AI, mechanistic interpretability...
- Center for Applied Rationality (Quality: 62/100): Berkeley nonprofit founded 2012 teaching applied rationality through workshops ($3,900 for 4.5 days), trained 1,300+ alumni reporting 9.2/10 satisfaction and 0.17σ life satisfaction increase at 1-y...
- OpenAI (Quality: 62/100): Comprehensive organizational profile of OpenAI documenting evolution from 2015 non-profit to Public Benefit Corporation, with detailed analysis of governance crisis, 2024-2025 ownership restructuri...
Other
- Vipul Naik (Person, Quality: 63/100): Vipul Naik is a mathematician and EA community member who has funded ~$255K in contract research (primarily to Sebastian Sanchez and Issa Rice) and created the Donations List Website tracking $72.8...
Risks
- Corrigibility Failure (Quality: 62/100): Corrigibility failure—AI systems resisting shutdown or modification—represents a foundational AI safety problem with empirical evidence now emerging: Anthropic found Claude 3 Opus engaged in alignm...
Concepts
- Safety Orgs Overview (Quality: 48/100): A well-organized reference overview of ~20 AI safety organizations categorized by function (alignment research, policy, field-building), with a comparative budget/headcount table showing estimated ...
Historical
- Deep Learning Revolution Era (Quality: 44/100): Comprehensive timeline documenting 2012-2020 AI capability breakthroughs (AlexNet, AlphaGo, GPT-3) and parallel safety field development, with quantified metrics showing capabilities funding outpac...