The Alignment Research Center (ARC) was founded in 2021 by Paul Christiano after his departure from OpenAI. ARC takes a distinctive approach to AI alignment, combining theoretical research on fundamental problems (such as Eliciting Latent Knowledge) with practical evaluations of frontier models for dangerous capabilities.
Safety Agendas
- Anthropic Core Views (Safety Agenda): Anthropic allocates 15-25% of R&D (~$100-200M annually) to safety research, including the world's largest interpretability team (40-60 researchers), while maintaining $5B+ revenue by 2025. Their RSP...
Approaches
- Capability Elicitation (Approach): Capability elicitation, the systematic discovery of what AI models can actually do through scaffolding, prompting, and fine-tuning, reveals 2-10x performance gaps versus naive testing (a minimal scaffolding sketch follows this list). METR finds AI a...
- AI Alignment (Approach): Comprehensive review of AI alignment approaches finding that current methods (RLHF, Constitutional AI) show 75%+ effectiveness on measurable safety metrics for existing systems but face critical scalability...
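To make the elicitation gap concrete, here is a minimal sketch of naive versus scaffolded evaluation. The `model` function is a placeholder for a real model call, and its success rates are invented for illustration; only the pattern of measuring the same tasks under both conditions reflects the technique described above.

```python
import random

def model(prompt: str) -> str:
    """Placeholder for a real model call; success rates here are invented."""
    # Pretend the model answers correctly more often with a reasoning prompt.
    p_success = 0.6 if "step by step" in prompt else 0.2
    return "correct" if random.random() < p_success else "wrong"

def naive_eval(task: str) -> bool:
    """Naive testing: one zero-shot attempt per task."""
    return model(task) == "correct"

def scaffolded_eval(task: str, retries: int = 4) -> bool:
    """Minimal elicitation scaffold: reasoning prompt plus retry-on-failure."""
    prompt = f"Think step by step, then answer.\n{task}"
    return any(model(prompt) == "correct" for _ in range(retries))

tasks = [f"task-{i}" for i in range(500)]
naive_rate = sum(naive_eval(t) for t in tasks) / len(tasks)
scaffolded_rate = sum(scaffolded_eval(t) for t in tasks) / len(tasks)
print(f"naive: {naive_rate:.0%}, scaffolded: {scaffolded_rate:.0%}")
```

Real elicitation work swaps the placeholder for an actual model and adds tool use, few-shot examples, or fine-tuning; the side-by-side measurement is what exposes the gap.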
Analysis
- Deceptive Alignment Decomposition Model (Analysis): Decomposes deceptive alignment probability into five multiplicative conditions (mesa-optimization, misalignment, awareness, deception, survival), yielding 0.5-24% overall risk with a 5% central estimate... (A worked multiplication follows this list.)
- Model Organisms of Misalignment (Analysis): Model organisms of misalignment is a research agenda that creates controlled AI systems exhibiting specific alignment failures as testbeds. Recent work achieves 99% coherence with a 40% misalignment rate...
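The decomposition entry above is just a product of conditional probabilities, so its headline numbers can be reproduced with a few lines of arithmetic. The per-condition values below are illustrative placeholders chosen to land near the cited 0.5-24% range and 5% central estimate; they are not the analysis's actual inputs.

```python
from math import prod

# (low, central, high) probabilities for each condition; illustrative only.
conditions = {
    "mesa-optimization": (0.35, 0.60, 0.80),
    "misalignment":      (0.35, 0.50, 0.75),
    "awareness":         (0.40, 0.55, 0.75),
    "deception":         (0.35, 0.55, 0.75),
    "survival":          (0.30, 0.55, 0.70),
}

# Multiply the five conditions together for each scenario.
low, central, high = (prod(v[i] for v in conditions.values()) for i in range(3))
print(f"P(deceptive alignment): low {low:.1%}, central {central:.1%}, high {high:.1%}")
# -> low 0.5%, central 5.0%, high 23.6%
```

The multiplicative structure is why the overall estimate is so sensitive to each input: halving any single condition halves the final probability.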
Policy
- EU AI Act (Policy): Comprehensive overview of the EU AI Act's risk-based regulatory framework, particularly its two-tier approach to foundation models that distinguishes between standard and systemic-risk AI systems...
Organizations
- Anthropic (Organization): Comprehensive reference page on Anthropic covering financials ($380B valuation, $14B ARR at Series G, growing to $19B by March 2026), safety research (Constitutional AI, mechanistic interpretability...
Risks
- Deceptive Alignment (Risk): Comprehensive analysis of deceptive alignment risk, where AI systems appear aligned during training but pursue different goals when deployed. Expert probability estimates range from 5-90%, with key empirical...
Other
- Scalable Oversight (Research Area): Process supervision achieves 78.2% accuracy on MATH benchmarks (vs. 72.4% for outcome-based supervision) and is deployed in OpenAI's o1 models, while debate shows 60-80% accuracy on factual questions with +4% improvement...
- AI Control (Research Area): AI Control is a defensive safety approach that maintains control over potentially misaligned AI through monitoring, containment, and redundancy, offering 40-60% catastrophic risk reduction if alignment...
- Sam Bankman-Fried (Person): Sam Bankman-Fried, convicted of fraud and sentenced to 25 years for misappropriating FTX customer funds, damaged both cryptocurrency markets and the effective altruism community, raising substantive...
- ARC-AGI (Benchmark): Abstraction and Reasoning Corpus, a benchmark of visual pattern recognition tasks designed to test fluid intelligence and novel reasoning.
- ARC-AGI-2 (Benchmark): Second iteration of the ARC benchmark with harder tasks, designed to remain challenging as AI capabilities improve.
Concepts
- EA Epistemic Failures in the FTX Era (Concept): This page synthesizes post-FTX critiques of EA's epistemic and governance failures, identifying interlocking problems including donor hero-worship, funding concentration in volatile crypto assets...
- FTX Collapse: Lessons for EA Funding Resilience (Concept): The November 2022 collapse of FTX left approximately $160M in committed EA grants undisbursed, forced organizational restructuring across the ecosystem, and revealed structural vulnerabilities...
- Large Language Models (Capability): Comprehensive analysis of LLM capabilities showing rapid progress from GPT-2 (1.5B parameters, 2019) to GPT-5 and Gemini 2.5 (2025), with training costs growing 2.4x annually and projected to exceed...
Key Debates
- AI Accident Risk Cruxes (Crux): Comprehensive survey of AI safety researcher disagreements on accident risks, quantifying probability ranges for mesa-optimization (15-55%), deceptive alignment (15-50%), and P(doom) (5-35% median...
- Why Alignment Might Be Hard (Argument): A comprehensive taxonomy of alignment-difficulty arguments spanning specification problems, inner alignment failures, verification limits, and adversarial dynamics, with expert p(doom) estimates ranging...
- Is AI Existential Risk Real? (Crux): Covers the foundational AI x-risk debate across four core cruxes: instrumental convergence, warning-sign availability, corrigibility achievability, and timeline urgency. Incorporates quantitative e...
Historical
- Mainstream Era (Historical): Comprehensive timeline of AI safety's transition from niche to mainstream (2020-present), documenting ChatGPT's unprecedented growth (100M users in 2 months), the OpenAI governance crisis, and first...