METR (Model Evaluation and Threat Research), formerly known as ARC Evals, is an organization dedicated to evaluating frontier AI models for dangerous capabilities before deployment.
Anthropic Core ViewsSafety AgendaAnthropic Core ViewsAnthropic allocates 15-25% of R&D (~$100-200M annually) to safety research including the world's largest interpretability team (40-60 researchers), while maintaining $5B+ revenue by 2025. Their RSP...Quality: 62/100
Approaches
Dangerous Capability EvaluationsApproachDangerous Capability EvaluationsComprehensive synthesis showing dangerous capability evaluations are now standard practice (95%+ frontier models) but face critical limitations: AI capabilities double every 7 months while external...Quality: 64/100Responsible Scaling PoliciesApproachResponsible Scaling PoliciesComprehensive analysis of Responsible Scaling Policies showing 20 companies with published frameworks as of Dec 2025, with SaferAI grading major policies 1.9-2.2/5 for specificity. Evidence suggest...Quality: 62/100Sandboxing / ContainmentApproachSandboxing / ContainmentComprehensive analysis of AI sandboxing as defense-in-depth, synthesizing METR's 2025 evaluations (GPT-5 time horizon ~2h, capabilities doubling every 7 months), AI boxing experiments (60-70% escap...Quality: 91/100
Analysis
AI Compute Scaling MetricsAnalysisAI Compute Scaling MetricsAI training compute is growing at ~4-5× per year with algorithmic efficiency improving ~3× per year (halving effective compute cost every ~8 months), while the compute landscape is shifting toward ...Quality: 78/100AI Safety Intervention Effectiveness MatrixAnalysisAI Safety Intervention Effectiveness MatrixQuantitative analysis mapping 15+ AI safety interventions to specific risks reveals critical misallocation: 40% of 2024 funding ($400M+) flows to RLHF methods showing only 10-20% effectiveness agai...Quality: 73/100
Policy
Voluntary AI Safety CommitmentsPolicyVoluntary AI Safety CommitmentsComprehensive empirical analysis of voluntary AI safety commitments showing 53% mean compliance rate across 30 indicators (ranging from 13% for Apple to 83% for OpenAI), with strongest adoption in ...Quality: 91/100AI Safety Institutes (AISIs)PolicyAI Safety Institutes (AISIs)Analysis of government AI Safety Institutes finding they've achieved rapid institutional growth (UK: 0→100+ staff in 18 months) and secured pre-deployment access to frontier models, but face critic...Quality: 69/100
Organizations
Alignment Research CenterOrganizationAlignment Research CenterComprehensive reference page on ARC (Alignment Research Center), covering its evolution from a dual theory/evals organization to ARC Theory (3 permanent researchers) plus the METR spin-out (Decembe...Quality: 57/100
Other
AI EvaluationsResearch AreaAI EvaluationsEvaluations and red-teaming reduce detectable dangerous capabilities by 30-50x when combined with training interventions (o3 covert actions: 13% → 0.4%), but face fundamental limitations against so...Quality: 72/100Beth BarnesPersonBeth BarnesAI safety researcher, founder of METR (formerly ARC Evals). Focus on evaluating dangerous AI capabilities.Scalable OversightResearch AreaScalable OversightProcess supervision achieves 78.2% accuracy on MATH benchmarks (vs 72.4% outcome-based) and is deployed in OpenAI's o1 models, while debate shows 60-80% accuracy on factual questions with +4% impro...Quality: 68/100
Risks
Reward HackingRiskReward HackingComprehensive analysis showing reward hacking occurs in 1-2% of OpenAI o3 task attempts, with 43x higher rates when scoring functions are visible. Mathematical proof establishes it's inevitable for...Quality: 91/100AI-Induced EnfeeblementRiskAI-Induced EnfeeblementDocuments the gradual risk of humanity losing critical capabilities through AI dependency. Key findings: GPS users show 23% navigation decline (Nature 2020), AI writes 46% of code with 4x more clon...Quality: 91/100
Concepts
AI TimelinesConceptAI TimelinesForecasts and debates about when transformative AI capabilities will be developedQuality: 95/100Existential Risk from AIConceptExistential Risk from AIHypotheses concerning risks from advanced AI systems that some researchers believe could result in human extinction or permanent global catastrophe, including institutional frameworks developed by ...Quality: 92/100Situational AwarenessCapabilitySituational AwarenessComprehensive analysis of situational awareness in AI systems, documenting that Claude 3 Opus fakes alignment 12% baseline (78% post-RL), 5 of 6 frontier models demonstrate scheming capabilities, a...Quality: 67/100Self-Improvement and Recursive EnhancementCapabilitySelf-Improvement and Recursive EnhancementComprehensive analysis of AI self-improvement from current AutoML systems (23% training speedups via AlphaEvolve) to theoretical intelligence explosion scenarios, with expert consensus at ~50% prob...Quality: 69/100
Key Debates
AI Safety Solution CruxesCruxAI Safety Solution CruxesA comprehensive structured mapping of AI safety solution uncertainties across technical, alignment, governance, and agentic domains, using probability-weighted crux frameworks with specific estimat...Quality: 65/100Technical AI Safety ResearchCruxTechnical AI Safety ResearchTechnical AI safety research encompasses six major agendas (mechanistic interpretability, scalable oversight, AI control, evaluations, agent foundations, and robustness) with 500+ researchers and $...Quality: 66/100