MATS is a well-documented 12-week fellowship program that has trained 213 AI safety researchers, with strong career outcomes (80% of alumni working in alignment) and measurable research impact (160+ publications and 8,000+ citations). The program provides comprehensive support (about $27k per scholar), and its alumni have made notable contributions to alignment research.