AI Capability Threshold Model
Comprehensive framework mapping AI capabilities across 5 dimensions to specific risk thresholds, finding authentication collapse/mass persuasion risks at 70-85% likelihood by 2027, bioweapons development at 40% by 2029, with critical thresholds estimated to be crossed when models achieve 50% on complex reasoning benchmarks and reach expert-level domain knowledge. Provides concrete capability requirements, timeline projections, and early warning indicators across 7 major risk categories with extensive benchmark tracking.
Overview
Different AI risks require different capability levels to become dangerous. A system that can write convincing phishing emails poses different risks than one that can autonomously discover zero-day vulnerabilities. This model maps specific capability requirements to specific risks, helping predict when risks activate as capabilities improve.
The capability threshold model provides a structured framework for understanding how AI systems transition from relatively benign to potentially dangerous across multiple risk domains. Rather than treating AI capability as a single dimension or risks as uniformly dependent on general intelligence, this model recognizes that specific risks emerge when systems cross particular capability thresholds in relevant dimensions. According to the International AI Safety Report (October 2025)βπ webInternational AI Safety Report (October 2025)safetySource β, governance choices in 2025-2026 must internalize that capability scaling has decoupled from parameter count, meaning risk thresholds can be crossed between annual cycles.
Key findings: benchmark performance of 15-25% indicates early risk emergence, 50% marks a qualitative shift to complex autonomous execution, and most critical thresholds are estimated to be crossed between 2025 and 2029 across misuse, control, and structural risk categories. The Future of Life InstituteOrganizationFuture of Life Institute (FLI)Comprehensive profile of FLI documenting $25M+ in grants distributed (2015: $7M to 37 projects, 2021: $25M program), major public campaigns (Asilomar Principles with 5,700+ signatories, 2023 Pause ...Quality: 46/100's 2025 AI Safety Indexβπ webβ β β ββFuture of Life InstituteFLI AI Safety Index Summer 2025The FLI AI Safety Index Summer 2025 assesses leading AI companies' safety efforts, finding widespread inadequacies in risk management and existential safety planning. Anthropic ...safetyx-risktool-useagentic+1Source β reveals an industry struggling to keep pace with its own rapid capability advances: companies claim AGI achievement within the decade, yet none scores above D in existential safety planning.
Risk Impact Assessment
| Risk Category | Severity | Likelihood (2025-2027) | Threshold Crossing Timeline | Trend |
|---|---|---|---|---|
| Authentication CollapseRiskAuthentication CollapseComprehensive synthesis showing human deepfake detection has fallen to 24.5% for video and 55% overall (barely above chance), with AI detectors dropping from 90%+ to 60% on novel fakes. Economic im...Quality: 57/100 | Critical | 85% | 2025-2027 | β Accelerating |
| Mass Persuasion | High | 70% | 2025-2026 | β Accelerating |
| Cyberweapon Development | High | 65% | 2025-2027 | β Steady |
| BioweaponsRiskBioweapons RiskComprehensive synthesis of AI-bioweapons evidence through early 2026, including the FRI expert survey finding 5x risk increase from AI capabilities (0.3% β 1.5% annual epidemic probability), Anthro...Quality: 91/100 Development | Critical | 40% | 2026-2029 | β Uncertain |
| Situational AwarenessCapabilitySituational AwarenessComprehensive analysis of situational awareness in AI systems, documenting that Claude 3 Opus fakes alignment 12% baseline (78% post-RL), 5 of 6 frontier models demonstrate scheming capabilities, a...Quality: 67/100 | Critical | 60% | 2025-2027 | β Accelerating |
| Economic Displacement | High | 80% | 2026-2030 | β Steady |
| Strategic Deception | Extreme | 15% | 2027-2035+ | β Uncertain |
Capability Dimensions Framework
AI capabilities decompose into five distinct dimensions that progress at different rates. Understanding these separately is crucial because different risks require different combinations. According to Epoch AIOrganizationEpoch AIEpoch AI is a research organization dedicated to producing rigorous, data-driven forecasts and analysis about artificial intelligence progress, with particular focus on compute trends, training dat...'s trackingβπ webβ β β β βEpoch AIEpoch AIcapabilitythresholdrisk-assessmentSource β, the training compute of frontier AI models has grown by 5x per year since 2020, and the Epoch Capabilities Index shows frontier model improvement nearly doubled in 2024, from ~8 points/year to ~15 points/year.
| Dimension | Level 1 | Level 2 | Level 3 | Level 4 | Current Frontier | Gap to Level 3 |
|---|---|---|---|---|---|---|
| Domain Knowledge | Undergraduate | Graduate | Expert | Superhuman | Expert- (some domains) | 0.5 levels |
| Reasoning Depth | Simple (2-3 steps) | Moderate (5-10) | Complex (20+) | Superhuman | Moderate+ | 0.5-1 level |
| Planning Horizon | Immediate | Short-term (hrs) | Medium (wks) | Long-term (months) | Short-term+ | 1 level |
| Strategic Modeling | None | Basic | Sophisticated | Superhuman | Basic+ | 1-1.5 levels |
| Autonomous Execution | None | Simple tasks | Complex tasks | Full autonomy | Simple-Complex | 0.5-1 level |
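To make the framework concrete, the five dimensions can be treated as an ordinal profile and gaps computed per dimension. A minimal sketch, assuming a fractional encoding of the table's plus/minus notation ("Expert-" as 2.5, "Basic+" as 1.75, and so on); the numeric encodings are illustrative and do not come from the cited sources:

```python
from dataclasses import dataclass, fields

@dataclass
class CapabilityProfile:
    """One value per dimension, on the table's ordinal Level 1-4 scale."""
    domain_knowledge: float
    reasoning_depth: float
    planning_horizon: float
    strategic_modeling: float
    autonomous_execution: float

    def gap_to(self, required: "CapabilityProfile") -> dict[str, float]:
        """Per-dimension shortfall (0 where the requirement is already met)."""
        return {f.name: max(0.0, getattr(required, f.name) - getattr(self, f.name))
                for f in fields(self)}

# "Current Frontier" row encoded loosely: Expert- ~2.5, Moderate+ ~2.25,
# Short-term+ ~2.0, Basic+ ~1.75, Simple-Complex ~2.25.
frontier = CapabilityProfile(2.5, 2.25, 2.0, 1.75, 2.25)
level3 = CapabilityProfile(3.0, 3.0, 3.0, 3.0, 3.0)
print(frontier.gap_to(level3))  # roughly matches the "Gap to Level 3" column
```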
Domain Knowledge Benchmarks
Current measurement approaches show significant gaps in assessing practical domain expertise:
| Domain | Best Benchmark | Current Frontier Score | Expert Human Level | Assessment Quality |
|---|---|---|---|---|
| Biology | MMLU-Biologyβπ paperβ β β ββarXivHendrycks et al.Dan Hendrycks, Collin Burns, Steven Basart et al. (2020)capabilitiesevaluationcomputellm+1Source β | 85-90% | β95% | Medium |
| Chemistry | ChemBenchβπ paperβ β β ββarXivChemBenchLei Yao, Yong Zhang, Zilong Yan et al. (2023)capabilitiesllmcapabilitythreshold+1Source β | 70-80% | β90% | Low |
| Computer Security | SecBenchβπ webβ β β ββGitHubSecBenchcapabilitythresholdrisk-assessmentSource β | 65-75% | β85% | Low |
| Psychology | MMLU-Psychology | 80-85% | β90% | Very Low |
| Medicine | MedQAβπ paperβ β β ββarXivMedQADi Jin, Eileen Pan, Nassim Oufattole et al. (2020)capabilitythresholdrisk-assessmentSource β | 85-90% | β95% | Medium |
Assessment quality reflects how well benchmarks capture practical expertise versus academic knowledge.
Reasoning Depth Progression
The ARC Prize 2024-2025 resultsβπ webARC Prize 2024-2025 resultscapabilitythresholdrisk-assessmentSource β demonstrate the critical threshold zone for complex reasoning. On ARC-AGI-1, OpenAI's o3-preview achieved 75.7% accuracy (near the 98% human level), while on the harder ARC-AGI-2 benchmark, advanced models initially scored only single-digit percentages on tasks that humans can solve reliably.
| Reasoning Level | Benchmark Examples | Current Performance | Risk Relevance |
|---|---|---|---|
| Simple (2-3 steps) | Basic math word problems | 95%+ | Low-risk applications |
| Moderate (5-10 steps) | GSM8Kβπ paperβ β β ββarXivGSM8KKarl Cobbe, Vineet Kosaraju, Mohammad Bavarian et al. (2021)capabilitiestrainingllmcapability+1Source β, multi-hop QA | 85-95% | Most current capabilities |
| Complex (20+ steps) | ARC-AGIβπ paperβ β β ββarXivOn the Measure of IntelligenceFranΓ§ois Chollet (2019)capabilitiestrainingevaluationcapability+1Source β, extended proofs | 30-75% (ARC-AGI-1), 5-55% (ARC-AGI-2) | Critical threshold zone |
| Superhuman | Novel mathematical proofs | <10% | Advanced risks |
Recent breakthrough (December 2025): Poetiq with GPT-5.2 X-Highβπ webARC Prize 2024-2025 resultscapabilitythresholdrisk-assessmentSource β achieved 75% on ARC-AGI-2, surpassing the average human test-taker score of 60% for the first time, demonstrating rapid progress on complex reasoning tasks.
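Read as a decision aid, the reasoning-level table above implies a simple zone classifier for complex-reasoning scores. A minimal sketch, with the ARC-AGI-1 band boundaries (30% and 75%) taken from the table and everything else illustrative:

```python
def arc_agi_1_zone(score_pct: float) -> str:
    """Bucket an ARC-AGI-1 score into the table's risk-relevance zones."""
    if score_pct < 30:
        return "below the complex-reasoning band"
    if score_pct <= 75:
        return "critical threshold zone (20+ step reasoning emerging)"
    return "above the band: approaching the ~98% human level"

print(arc_agi_1_zone(75.7))  # o3-preview's cited result
```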
Risk-Capability Mapping
Near-Term Risks (2025-2027)
Authentication Collapse
The volume of deepfakes has grown explosively: Deloitte's 2024 analysisβπ webDeloitte's 2024 analysiscapabilitythresholdrisk-assessmentSource β estimates growth from roughly 500,000 online deepfakes in 2023 to about 8 million in 2025, with annual growth nearing 900%. Voice cloning has crossed what experts call the "indistinguishable threshold": a few seconds of audio now suffice to generate a convincing clone.
| Capability | Required Level | Current Level | Gap | Evidence |
|---|---|---|---|---|
| Domain Knowledge (Media) | Expert | Expert- | 0.5 level | Sora qualityβπ webβ β β β βOpenAISora qualitycapabilitythresholdrisk-assessmentSource β approaching photorealism |
| Reasoning Depth | Moderate | Moderate | 0 levels | Current models handle multi-step generation |
| Strategic Modeling | Basic+ | Basic | 0.5 level | Limited theory of mind in current systems |
| Autonomous Execution | Simple | Simple | 0 levels | Already achieved for content generation |
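The table's "Gap" column can be computed mechanically from such capability profiles. A minimal sketch, assuming a simple decision rule (no dimension more than half a level short counts as "near activation"); the numeric encodings and the rule itself are illustrative, not estimates from the cited sources:

```python
# Illustrative encoding of the Authentication Collapse row above:
# required vs. current levels per dimension, on the ordinal 1-4 scale.
REQUIRED = {"domain_knowledge": 3.0, "reasoning_depth": 2.0,
            "strategic_modeling": 1.5, "autonomous_execution": 1.0}
CURRENT = {"domain_knowledge": 2.5, "reasoning_depth": 2.0,
           "strategic_modeling": 1.0, "autonomous_execution": 1.0}

def risk_status(current: dict[str, float], required: dict[str, float],
                near: float = 0.5) -> str:
    """Classify a risk by its largest remaining capability gap."""
    worst = max(required[d] - current[d] for d in required)
    if worst <= 0:
        return "threshold crossed"
    return "near activation" if worst <= near else f"blocked by {worst:.1f}-level gap"

print(risk_status(CURRENT, REQUIRED))  # -> "near activation"
```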
Key Threshold Capabilities:
- Generate synthetic content indistinguishable from authentic across all modalities
- Real-time interactive video generation (NVIDIA Omniverseβπ webNVIDIA Omniversecapabilitythresholdrisk-assessmentSource β)
- Defeat detection systems designed to identify AI content
- Mimic individual styles from minimal samples
Detection Challenges: OpenAI's deepfake detection toolβπ webOpenAI's deepfake detection toolcapabilitythresholdrisk-assessmentSource β identifies DALL-E 3 images with 98.8% accuracy but only flags 5-10% of images from other AI tools. Multi-modal attacks combining deepfaked video, synthetic voices, and fabricated documents are increasing.
Current Status: OpenAI's Soraβπ webβ β β β βOpenAISora qualitycapabilitythresholdrisk-assessmentSource β and Meta's Make-A-Videoβπ webMeta's Make-A-Videocapabilitythresholdrisk-assessmentSource β demonstrate near-threshold video generation. ElevenLabsβπ webElevenLabscapabilitythresholdrisk-assessmentsynthetic-media+1Source β achieves voice cloning from <30 seconds of audio.
Mass Persuasion Capabilities
| Capability | Required Level | Current Level | Gap | Evidence |
|---|---|---|---|---|
| Domain Knowledge (Psychology) | Graduate+ | Graduate | 0.5 level | Strong performance on psychology benchmarks |
| Strategic Modeling | Sophisticated | Basic+ | 1 level | Limited multi-agent reasoning |
| Planning Horizon | Medium-term | Short-term | 1 level | Cannot maintain campaigns over weeks |
| Autonomous Execution | Simple | Simple | 0 levels | Can generate content at scale |
Research Evidence:
- Anthropic (2024)βπ webβ β β β βAnthropicAnthropic (2024)capabilitythresholdrisk-assessmentSource β shows Claude 3 achieves 84% on psychology benchmarks
- Stanford HAI studyβπ webβ β β β βStanford HAIStanford HAI studycapabilitythresholdrisk-assessmentSource β finds AI-generated content achieves 82% higher believability
- MIT persuasion studyβπ paperβ β β β β Science (peer-reviewed)MIT persuasion studyG. Spitale, N. Biller-Andorno, Federico Germani (2023)evaluationllmcapabilitythreshold+1Source β demonstrates automated A/B testing improves persuasion by 35%
Medium-Term Risks (2026-2029)
Bioweapons Development
| Capability | Required Level | Current Level | Gap | Assessment Source |
|---|---|---|---|---|
| Domain Knowledge (Biology) | Expert | Graduate+ | 1 level | RAND biosecurity assessmentβπ webβ β β β βRAND CorporationRAND Corporation studyprobabilitydecompositionbioweaponscapability+1Source β |
| Domain Knowledge (Chemistry) | Expert | Graduate | 1-2 levels | Limited synthesis knowledge |
| Reasoning Depth | Complex | Moderate+ | 1 level | Cannot handle 20+ step procedures |
| Planning Horizon | Medium-term | Short-term | 1 level | No multi-week experimental planning |
| Autonomous Execution | Complex | Simple+ | 1 level | Cannot troubleshoot failed experiments |
Critical Bottlenecks:
- Specialized synthesis knowledge for dangerous compounds
- Autonomous troubleshooting of complex laboratory procedures
- Multi-week experimental planning and adaptation
- Integration of theoretical knowledge with practical constraints
Expert Assessment: RAND Corporation (2024)βπ webβ β β β βRAND CorporationRAND Corporation studyprobabilitydecompositionbioweaponscapability+1Source β estimates 60% probability of crossing threshold by 2028.
Economic Displacement Thresholds
McKinsey's researchβπ webβ β β ββMcKinsey & CompanyMcKinsey finds 57%labor-marketsautomationinequalitySource β indicates that current technologies could, in theory, automate about 57% of U.S. work hours. By 2030, approximately 27% of current work hours in Europe and 30% in the United States could be automated. Workers in lower-wage jobs are up to 14 times more likely to need to change occupations than those in the highest-wage positions.
| Job Category | Automation Threshold | Current AI Capability | Estimated Timeline | Source |
|---|---|---|---|---|
| Content Writing | 70% task automation | 85% | Crossed 2024 | McKinsey AI Indexβπ webβ β β ββMcKinsey & CompanyMcKinsey AI Indexcapabilitythresholdrisk-assessmentSource β |
| Code Generation | 60% task automation | 60-70% (SWE-bench Verified) | Crossed 2025 | SWE-bench leaderboardβπ webSWE-bench Official LeaderboardsSWE-bench provides a multi-variant evaluation platform for assessing AI models' performance in software engineering tasks. It offers different datasets and metrics to comprehens...capabilitiesevaluationtool-useagentic+1Source β |
| Data Analysis | 75% task automation | 55% | 2026-2027 | Industry surveys |
| Customer Service | 80% task automation | 70% | 2025-2026 | Salesforce AI reportsβπ webSalesforce AI reportscapabilitythresholdrisk-assessmentSource β |
| Legal Research | 65% task automation | 40% | 2027-2028 | Legal industry analysis |
Coding Benchmark Update: The International AI Safety Report (October 2025)βπ webInternational AI Safety Report (October 2025)safetySource β notes that coding capabilities have advanced particularly quickly. Top models now solve over 60% of problems in SWE-bench Verified, up from 40% in late 2024 and almost 0% at the beginning of 2024. However, Scale AI's SWE-Bench Proβπ webScale AI's SWE-Bench Procapabilitythresholdrisk-assessmentSource β shows a significant performance drop: even the best models (GPT-5, Claude Opus 4.1) score only 23% on harder, more realistic tasks.
Long-Term Control Risks (2027-2035+)
Strategic Deception (Scheming)
| Capability | Required Level | Current Level | Gap | Uncertainty |
|---|---|---|---|---|
| Strategic Modeling | Superhuman | Basic+ | 2+ levels | Very High |
| Reasoning Depth | Complex | Moderate+ | 1 level | High |
| Planning Horizon | Long-term | Short-term | 2 levels | Very High |
| Situational Awareness | Expert | Basic | 2 levels | High |
Key Uncertainties:
- Whether sophisticated strategic modeling can emerge from current training approaches
- Detectability of strategic deception capabilities during evaluation
- Minimum capability level required for effective scheming
Research Evidence:
- Anthropic Constitutional AIβπ paperβ β β ββarXivConstitutional AI: Harmlessness from AI FeedbackBai, Yuntao, Kadavath, Saurav, Kundu, Sandipan et al. (2022)foundation-modelstransformersscalingagentic+1Source β shows limited success in detecting deceptive behavior
- Redwood Researchβπ webRedwood Research: AI ControlA nonprofit research organization focusing on AI safety, Redwood Research investigates potential risks from advanced AI systems and develops protocols to detect and prevent inte...safetytalentfield-buildingcareer-transitions+1Source β adversarial training reveals capabilities often hidden during evaluation
Current State & Trajectory
Capability Progress Rates
According to Epoch AI's analysisβπ webβ β β β βEpoch AIEpoch AIcapabilitythresholdrisk-assessmentSource β, training compute for frontier models grows 4-5x yearly. Their Epoch Capabilities Index shows frontier model improvement nearly doubled in 2024. METR's researchβπ webβ β β β βMETRMETR's researchcapabilitythresholdrisk-assessmentSource β shows that the length of tasks AI systems can complete has grown exponentially over the past six years, with a doubling time of around 7 months.
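The METR doubling-time claim lends itself to a direct extrapolation. A minimal sketch; the 7-month doubling time is from the cited research, while the baseline horizon and dates are placeholder assumptions for illustration:

```python
from datetime import date

DOUBLING_MONTHS = 7  # METR's reported doubling time for task horizons

def projected_task_horizon(baseline_minutes: float, start: date, when: date) -> float:
    """Extrapolate the task-horizon trend: the horizon doubles every 7 months."""
    months = (when.year - start.year) * 12 + (when.month - start.month)
    return baseline_minutes * 2 ** (months / DOUBLING_MONTHS)

# Hypothetical baseline: a 60-minute task horizon at the start of 2025.
print(projected_task_horizon(60, date(2025, 1, 1), date(2027, 1, 1)))
# ~648 minutes: roughly a 10x longer horizon after two years (24/7 doublings)
```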
| Dimension | 2023-2024 Progress | Projected 2024-2025 | Key Drivers |
|---|---|---|---|
| Domain Knowledge | +0.5 levels | +0.3-0.7 levels | Larger training datasets, specialized fine-tuning |
| Reasoning Depth | +0.3 levels | +0.2-0.5 levels | Chain-of-thought improvements, tree search |
| Planning Horizon | +0.2 levels | +0.2-0.4 levels | Tool integration, memory systems |
| Strategic Modeling | +0.1 levels | +0.1-0.3 levels | Multi-agent training, RL improvements |
| Autonomous Execution | +0.4 levels | +0.3-0.6 levels | Tool use, real-world deployment |
Data Sources: Epoch AI capability trackingβπ webβ β β β βEpoch AIEpoch AIEpoch AI provides comprehensive data and insights on AI model scaling, tracking computational performance, training compute, and model developments across various domains.capabilitiestrainingcomputeprioritization+1Source β, industry benchmark results, expert elicitation.
Compute Scaling Projections
| Metric | Current (2025) | Projected 2027 | Projected 2030 | Source |
|---|---|---|---|---|
| Models above 10^26 FLOP | β5-10 | β30 | β200+ | Epoch AI model countsβπ webβ β β β βEpoch AIEpoch AI projectionscapabilitythresholdrisk-assessmentSource β |
| Largest training run power | 1-2 GW | 2-4 GW | 4-16 GW | Epoch AI power analysisβπ webβ β β β βEpoch AIEpoch AI power analysiscapabilitythresholdrisk-assessmentSource β |
| Frontier model training cost | $100M-500M | $100M-1B+ | $1-5B | Epoch AI cost projections |
| Open-weight capability lag | 6-12 months | 6-12 months | 6-12 months | Epoch AI consumer GPU analysisβπ webβ β β β βEpoch AIEpoch AI consumer GPU analysiscomputeSource β |
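The compute rows follow from simple exponential growth. A sketch checking the 4-5x yearly trend against the table's horizon, assuming a 10^26 FLOP frontier in 2025 as the starting point (the growth rates are from the cited Epoch trend; the function itself is illustrative):

```python
def frontier_compute(base_flop: float, growth_per_year: float, years: int) -> float:
    """Project frontier training compute under constant exponential growth."""
    return base_flop * growth_per_year ** years

# Epoch AI trend: 4-5x per year. From a 10^26 FLOP frontier in 2025:
for g in (4.0, 5.0):
    print(f"{g}x/yr -> 2030 frontier ~ {frontier_compute(1e26, g, 5):.1e} FLOP")
# 4x/yr gives ~1.0e29 FLOP; 5x/yr gives ~3.1e29 FLOP by 2030.
```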
Leading Organizations
| Organization | Strongest Capabilities | Estimated Timeline to Next Threshold | Focus Area |
|---|---|---|---|
| OpenAIβπ webβ β β β βOpenAIOpenAIfoundation-modelstransformersscalingtalent+1Source β | Domain knowledge, autonomous execution | 12-18 months | General capabilities |
| Anthropicβπ webβ β β β βAnthropicAnthropicfoundation-modelstransformersscalingescalation+1Source β | Reasoning depth, strategic modeling | 18-24 months | Safety-focused development |
| DeepMindβπ webβ β β β βGoogle DeepMindGoogle DeepMindcapabilitythresholdrisk-assessmentinterventions+1Source β | Strategic modeling, planning | 18-30 months | Scientific applications |
| Metaβπ webβ β β β βMeta AIPublic statements 2024capabilitythresholdrisk-assessmentgovernance+1Source β | Multimodal generation | 6-12 months | Social/media applications |
Key Uncertainties & Research Cruxes
Measurement Validity
The Berkeley CLTC Working Paper on Intolerable Risk Thresholdsβπ webBerkeley CLTC Working Paper on Intolerable Risk Thresholdscapabilitythresholdrisk-assessmentSource β notes that models effectively more capable than the latest tested model (4x or more in Effective Compute or 6 months worth of fine-tuning) require comprehensive assessment including threat model mapping, empirical capability tests, elicitation testing without safety mechanisms, and likelihood forecasting.
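The Berkeley CLTC trigger condition is itself a simple rule, sketched below; the 4x effective-compute and 6-month fine-tuning constants are from the working paper as summarized above, while the function shape around them is assumed:

```python
def requires_full_reassessment(effective_compute_ratio: float,
                               finetuning_months_since_last_eval: float) -> bool:
    """Berkeley CLTC trigger: a model 'effectively more capable' than the
    last tested one (4x+ effective compute, or 6+ months of additional
    fine-tuning) requires comprehensive assessment."""
    return (effective_compute_ratio >= 4.0
            or finetuning_months_since_last_eval >= 6.0)

# A model with only 3x the effective compute but 8 months of extra
# fine-tuning still triggers a comprehensive assessment:
print(requires_full_reassessment(3.0, 8.0))  # True
```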
An interdisciplinary review of AI evaluationβπ paperβ β β ββarXivinterdisciplinary review of AI evaluationMaria Eriksson, Erasmo Purificato, Arman Noroozian et al. (2025)evaluationSource β highlights the "benchmark lottery" problem: researchers at Google's Brain Team found that many factors other than fundamental algorithmic superiority may lead to a method being perceived as superior. Ironically, a majority of influential benchmarks have been released without rigorous peer review.
| Uncertainty | Impact if True | Impact if False | Current Evidence |
|---|---|---|---|
| Current benchmarks accurately measure risk-relevant capabilities | Can trust threshold predictions | Need fundamentally new evaluations | Mixed - good for some domains, poor for others |
| Practical capabilities match benchmark performance | Smooth transition from lab to deployment | Significant capability overhangs | Substantial gaps observed in real-world deployment |
| Capability improvements follow predictable scaling laws | Reliable timeline forecasting possible | Threshold crossings may surprise | Scaling laws hold for some capabilities, not others |
Threshold Sharpness
Sharp Threshold Evidence:
- Authentication systemsβπ paperβ β β ββarXivAuthentication systemsHuseyin Fuat Alsan, Taner Arsan (2023)capabilitiestrainingeconomiccapability+1Source β: Detection accuracy drops from 95% to 15% once generation quality crosses a threshold
- Economic viability: McKinsey automation analysisβπ webβ β β ββMcKinsey & CompanyMcKinsey AI Indexcapabilitythresholdrisk-assessmentSource β shows 10-20% capability improvements create 50-80% cost advantage in many tasks
- Security vulnerabilities: Most exploits require complete capability to work at all
Gradual Scaling Evidence:
- Job displacement: Different tasks within roles automate at different rates
- Persuasion effectiveness: Incremental improvements in messaging quality yield incremental persuasion gains
- Domain expertise: Knowledge accumulation appears continuous rather than threshold-based
Strategic Deception Detection
Critical unsolved problems in capability assessment:
| Challenge | Current Approach | Limitation | Research Need |
|---|---|---|---|
| Detecting hidden capabilities | Red-teaming, adversarial prompts | Systems may hide abilities during evaluation | Novel evaluation paradigms |
| Measuring situational awareness | Direct questioning, inference tests | Pattern matching vs. genuine understanding | Causal intervention studies |
| Long-term planning assessment | Multi-step benchmarks | Maximum tested horizon ~hours | Extended evaluation infrastructure |
| Strategic modeling depth | Theory of mind tasks | Unclear if results generalize | Ecological validity research |
Timeline Projections & Warning Indicators
2025 Critical Thresholds
High Probability (70%+):
- Authentication Collapse: Real-time deepfakes become commercially viable
- Content Generation Saturation: Human-level quality across all text/image modalities
Medium Probability (40-70%):
- Code Automation: 50%+ of software engineering tasks automated
- Basic Situational Awareness: Systems understand evaluation vs. deployment contexts
2026-2027 Medium Probability Events
| Risk | Probability | Key Indicators to Monitor |
|---|---|---|
| Advanced Cyberweapons | 65% | METR evaluationsβπ webβ β β β βMETRmetr.orgsoftware-engineeringcode-generationprogramming-aisocial-engineering+1Source β crossing 40% threshold |
| Economic Disruption | 80% | GitHub Copilotβπ webβ β β ββGitHubGitHub Copilotcapabilitythresholdrisk-assessmentSource β achieving >60% task completion |
| Bioweapons Uplift | 40% | Expert biology benchmarks >90%, synthesis knowledge tests |
| Oversight Evasion | 60% | Detection of training/deployment context awareness |
Early Warning System
Red Flag Indicators:
- Sudden benchmark improvements >20 percentage points (see the monitoring sketch after this list)
- Systems developing capabilities not explicitly trained for
- Gap between capability and safety evaluation results widening
- Evidence of strategic behavior during evaluation
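The first red flag is directly checkable. A minimal monitoring sketch, assuming a simple chronological score history per benchmark; the 20-percentage-point threshold is from the list above, and everything else, including the data shape, is an assumption:

```python
JUMP_THRESHOLD_PP = 20.0  # red flag: sudden jump of >20 percentage points

def flag_benchmark_jumps(history: dict[str, list[float]]) -> list[str]:
    """Return benchmarks whose latest score jumped by more than the threshold.

    `history` maps a benchmark name to chronological scores in percent.
    """
    flags = []
    for name, scores in history.items():
        if len(scores) >= 2 and scores[-1] - scores[-2] > JUMP_THRESHOLD_PP:
            flags.append(name)
    return flags

# Example: a jump from 30% to 75% between releases is flagged; MMLU is not.
print(flag_benchmark_jumps({"ARC-AGI-2": [5.0, 30.0, 75.0],
                            "MMLU": [88.0, 89.0]}))  # -> ['ARC-AGI-2']
```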
Monitoring Infrastructure:
- METRβπ webβ β β β βMETRmetr.orgsoftware-engineeringcode-generationprogramming-aisocial-engineering+1Source β dangerous capability evaluations
- MIRIβπ webβ β β ββMIRImiri.orgsoftware-engineeringcode-generationprogramming-aiagentic+1Source β alignment evaluation protocols
- Industry responsible scaling policies (OpenAI Preparednessβπ webβ β β β βOpenAIOpenAI Preparednesscapabilitythresholdrisk-assessmentframework+1Source β, Anthropic RSPβπ webβ β β β βAnthropicResponsible Scaling Policygovernancecapabilitiestool-useagentic+1Source β)
- Academic capability forecasting (Epoch AIβπ webβ β β β βEpoch AIEpoch AIEpoch AI provides comprehensive data and insights on AI model scaling, tracking computational performance, training compute, and model developments across various domains.capabilitiestrainingcomputeprioritization+1Source β)
The METR Common Elements Report (December 2025)βπ webβ β β β βMETRMETR's analysis of 12 companiesevaluationsdangerous-capabilitiesautonomous-replicationSource β describes how each major AI developer's policy uses capability thresholds for biological weapons development, cyberattacks, autonomous replication, and automated AI R&D, with commitments to conduct model evaluations assessing whether models are approaching thresholds that could enable severe harm.
Expert Survey Findings
An OECD-affiliated survey on AI thresholdsβπ webOECD-affiliated survey on AI thresholdscapabilitythresholdrisk-assessmentSource β found that experts agreed that, if training compute thresholds are exceeded, AI companies should:
- Conduct additional risk assessments (e.g., via model evaluations)
- Notify an independent public body (e.g., EU AI Office, FTC, or AI Safety Institute)
- Notify the government
Participants noted that risk assessment frameworks from safety-critical industries (nuclear, maritime, aviation, healthcare, finance, space) provide valuable precedent for AI governance.
Sources & Resources
Primary Research
| Source | Type | Key Findings | Relevance |
|---|---|---|---|
| Anthropic Responsible Scaling Policyβπ webβ β β β βAnthropicResponsible Scaling Policygovernancecapabilitiestool-useagentic+1Source β | Industry Policy | Defines capability thresholds for safety measures | Framework implementation |
| OpenAI Preparedness Frameworkβπ webβ β β β βOpenAIOpenAI Preparednesscapabilitythresholdrisk-assessmentframework+1Source β | Industry Policy | Risk assessment methodology | Threshold identification |
| METR Dangerous Capability Evaluationsβπ webβ β β β βMETRmetr.orgsoftware-engineeringcode-generationprogramming-aisocial-engineering+1Source β | Research | Systematic capability testing | Current capability baselines |
| Epoch AI Capability Forecastsβπ webβ β β β βEpoch AIEpoch AIEpoch AI provides comprehensive data and insights on AI model scaling, tracking computational performance, training compute, and model developments across various domains.capabilitiestrainingcomputeprioritization+1Source β | Research | Timeline predictions for AI milestones | Forecasting methodology |
Government & Policy
| Organization | Resource | Focus |
|---|---|---|
| NIST AI Risk Management FrameworkβποΈ governmentβ β β β β NISTNIST AI Risk Management Frameworksoftware-engineeringcode-generationprogramming-aifoundation-models+1Source β | US Government | Risk assessment standards |
| UK AISI ResearchβποΈ governmentβ β β β βUK GovernmentUK AISIcapabilitythresholdrisk-assessmentgame-theory+1Source β | UK Government | Model evaluation protocols |
| EU AI Officeβπ webβ β β β βEuropean UnionEU AI Officecapabilitythresholdrisk-assessmentdefense+1Source β | EU Government | Regulatory frameworks |
| RAND Corporation AI Studiesβπ webβ β β β βRAND CorporationRAND: AI and National Securitycybersecurityagenticplanninggoal-stability+1Source β | Think Tank | National security implications |
Technical Benchmarks & Evaluation
| Benchmark | Domain | Current Frontier Score (Dec 2025) | Threshold Relevance |
|---|---|---|---|
| MMLUβπ paperβ β β ββarXivHendrycks et al.Dan Hendrycks, Collin Burns, Steven Basart et al. (2020)capabilitiesevaluationcomputellm+1Source β | General Knowledge | 85-90% | Domain expertise baseline |
| ARC-AGI-1βπ paperβ β β ββarXivOn the Measure of IntelligenceFranΓ§ois Chollet (2019)capabilitiestrainingevaluationcapability+1Source β | Abstract Reasoning | 75-87% (o3-preview) | Complex reasoning threshold |
| ARC-AGI-2βπ webARC-AGI-2agiSource β | Abstract Reasoning | 54-75% (GPT-5.2) | Next-gen reasoning threshold |
| SWE-bench Verifiedβπ webSWE-bench Official LeaderboardsSWE-bench provides a multi-variant evaluation platform for assessing AI models' performance in software engineering tasks. It offers different datasets and metrics to comprehens...capabilitiesevaluationtool-useagentic+1Source β | Software Engineering | 60-70% | Autonomous code execution |
| SWE-bench Proβπ webScale AI's SWE-Bench Procapabilitythresholdrisk-assessmentSource β | Real-world Coding | 17-23% | Generalization to novel code |
| MATHβπ paperβ β β ββarXivMATHDan Hendrycks, Collin Burns, Saurav Kadavath et al. (2021)capabilitieseconomiccomputellm+1Source β | Mathematical Reasoning | 60-80% | Multi-step reasoning |
Risk Assessment Research
| Research Area | Key Papers | Organizations |
|---|---|---|
| Bioweapons Risk | RAND Biosecurity Assessmentβπ webβ β β β βRAND CorporationRAND Corporation studyprobabilitydecompositionbioweaponscapability+1Source β | RAND, Johns Hopkins CNAS |
| Economic Displacement | McKinsey AI Impactβπ webβ β β ββMcKinsey & CompanyMcKinsey AI Indexcapabilitythresholdrisk-assessmentSource β | McKinsey, Brookings Institution |
| Authentication Collapse | Deepfake Detection Challengesβπ paperβ β β ββarXivAuthentication systemsHuseyin Fuat Alsan, Taner Arsan (2023)capabilitiestrainingeconomiccapability+1Source β | UC Berkeley, MIT |
| Strategic Deception | Constitutional AI Researchβπ paperβ β β ββarXivConstitutional AI: Harmlessness from AI FeedbackBai, Yuntao, Kadavath, Saurav, Kundu, Sandipan et al. (2022)foundation-modelstransformersscalingagentic+1Source β | Anthropic, Redwood Research |
Additional Sources
| Source | Type | Key Finding |
|---|---|---|
| International AI Safety Report (Oct 2025)βπ webInternational AI Safety Report (October 2025)safetySource β | Government | Risk thresholds can be crossed between annual cycles due to post-training/inference advances |
| Future of Life Institute AI Safety Index 2025βπ webβ β β ββFuture of Life InstituteFLI AI Safety Index Summer 2025The FLI AI Safety Index Summer 2025 assesses leading AI companies' safety efforts, finding widespread inadequacies in risk management and existential safety planning. Anthropic ...safetyx-risktool-useagentic+1Source β | NGO | Industry fundamentally unprepared; Anthropic leads (C+) but none score above D in existential safety |
| Berkeley CLTC Intolerable Risk Thresholdsβπ webBerkeley CLTC Working Paper on Intolerable Risk Thresholdscapabilitythresholdrisk-assessmentSource β | Academic | Models 4x+ more capable require comprehensive risk assessment |
| METR Common Elements Report (Dec 2025)βπ webβ β β β βMETRMETR's analysis of 12 companiesevaluationsdangerous-capabilitiesautonomous-replicationSource β | Research | All major labs use capability thresholds for bio, cyber, replication, AI R&D |
| ARC Prize 2025 Resultsβπ webARC Prize 2024-2025 resultscapabilitythresholdrisk-assessmentSource β | Academic | First AI system (Poetiq/GPT-5.2) exceeds human average on ARC-AGI-2 reasoning |
| Epoch AI Compute Trendsβπ webβ β β β βEpoch AIEpoch AIcapabilitythresholdrisk-assessmentSource β | Research | Training compute grows 4-5x yearly; capability improvement doubled in 2024 |