A nonprofit organization investigating the offensive capabilities and controllability of frontier AI models through empirical research on autonomous hacking, shutdown resistance, and agentic misalignment.
Approach

Capability Elicitation (Approach; Quality: 91/100): Capability elicitation—systematically discovering what AI models can actually do through scaffolding, prompting, and fine-tuning—reveals 2-10x performance gaps versus naive testing. METR finds AI a...
Policy
Voluntary AI Safety Commitments (Policy; Quality: 91/100): Comprehensive empirical analysis of voluntary AI safety commitments showing 53% mean compliance rate across 30 indicators (ranging from 13% for Apple to 83% for OpenAI), with strongest adoption in ...
Other
Jaan Tallinn (Person; Quality: 53/100): Profile of Jaan Tallinn documenting $150M+ lifetime AI safety giving (86% of $51M in 2024), primarily through SFF ($34.33M distributed in a 2025 grant round). Co-founded CSER (2012) and FLI (2014),...
Elon Musk (Person; Quality: 38/100): Comprehensive profile of Elon Musk's role in AI, documenting his early safety warnings (2014-2017), OpenAI founding and contentious departure, xAI launch and funding history, Neuralink BCI developm...
Benjamin Weinstein-Raun (Person): Senior Researcher and acting director at Palisade Research, focused on AI safety.
Red Teaming (Research Area; Quality: 65/100): Red teaming is a systematic adversarial evaluation methodology for identifying AI vulnerabilities and dangerous capabilities before deployment, with effectiveness rates varying from 10-80% dependin...
Organizations
Redwood Research (Organization; Quality: 78/100): A nonprofit AI safety and security research organization founded in 2021, known for pioneering AI Control research, developing causal scrubbing interpretability methods, and conducting landmark ali...
Center for Applied Rationality (Organization; Quality: 62/100): Berkeley nonprofit founded 2012 teaching applied rationality through workshops ($3,900 for 4.5 days), trained 1,300+ alumni reporting 9.2/10 satisfaction and 0.17σ life satisfaction increase at 1-y...
OpenAI (Organization; Quality: 62/100): Comprehensive organizational profile of OpenAI documenting evolution from 2015 non-profit to Public Benefit Corporation, with detailed analysis of governance crisis, 2024-2025 ownership restructuri...
AI Impacts (Organization; Quality: 53/100): AI Impacts is a research organization that conducts empirical analysis of AI timelines and risks through surveys and historical trend analysis, contributing valuable data to AI safety discourse. Wh...
Lionheart Ventures (Organization; Quality: 50/100): Lionheart Ventures is a small venture capital firm ($25M inaugural fund) focused on AI safety and mental health investments, notable for its investment in Anthropic and integration with the EA comm...
Machine Intelligence Research Institute (Organization; Quality: 50/100): Comprehensive organizational history documenting MIRI's trajectory from pioneering AI safety research (2000-2020) to policy advocacy after acknowledging research failure, with detailed financial da...
Key Debates
Why Alignment Might Be Hard (Argument; Quality: 69/100): A comprehensive taxonomy of alignment difficulty arguments spanning specification problems, inner alignment failures, verification limits, and adversarial dynamics, with expert p(doom) estimates ra...
Technical AI Safety Research (Crux; Quality: 66/100): Technical AI safety research encompasses six major agendas (mechanistic interpretability, scalable oversight, AI control, evaluations, agent foundations, and robustness) with 500+ researchers and $...
Concepts
Agentic AI (Capability; Quality: 68/100): Analysis of agentic AI capabilities and deployment challenges, documenting industry forecasts (40% of enterprise apps by 2026, $199B market by 2034) alongside implementation difficulties (40%+ proj...
Safety Orgs Overview (Quality: 48/100): A well-organized reference overview of ~20 AI safety organizations categorized by function (alignment research, policy, field-building), with a comparative budget/headcount table showing estimated ...
Risks
Power-Seeking AI (Risk; Quality: 67/100): Formal proofs demonstrate optimal policies seek power in MDPs (Turner et al. 2021), now empirically validated: OpenAI o3 sabotaged shutdown in 79% of tests (Palisade 2025), and Claude 3 Opus showed...
AI Proliferation (Risk; Quality: 60/100): AI proliferation accelerated dramatically as the capability gap narrowed from 18 to 6 months (2022-2024), with open-source models like DeepSeek R1 now matching frontier performance. US export contr...
AI Disinformation (Risk; Quality: 54/100): Post-2024 analysis shows AI disinformation had limited immediate electoral impact (cheap fakes used 7x more than AI content), but creates concerning long-term epistemic erosion with 82% higher beli...