AI Capability Threshold Model
A comprehensive framework mapping AI capabilities across five dimensions to specific risk thresholds. It finds authentication-collapse and mass-persuasion risks at 70-85% likelihood by 2027 and bioweapons-development risk at 40% by 2029, with critical thresholds estimated to arrive when models reach 50% on complex reasoning benchmarks and cross expert-level domain knowledge. The model provides concrete capability requirements, timeline projections, and early warning indicators across seven major risk categories, with extensive benchmark tracking.
Overview
Different AI risks require different capability levels to become dangerous. A system that can write convincing phishing emails poses different risks than one that can autonomously discover zero-day vulnerabilities. This model maps specific capability requirements to specific risks, helping predict when risks activate as capabilities improve.
The capability threshold model provides a structured framework for understanding how AI systems transition from relatively benign to potentially dangerous across multiple risk domains. Rather than treating AI capability as a single dimension or risks as uniformly dependent on general intelligence, this model recognizes that specific risks emerge when systems cross particular capability thresholds in relevant dimensions. According to the International AI Safety Report (October 2025), governance choices in 2025-2026 must internalize that capability scaling has decoupled from parameter count, meaning risk thresholds can be crossed between annual cycles.
Key findings include 15-25% benchmark performance indicating early risk emergence, 50% marking a qualitative shift to complex autonomous execution, and most critical thresholds estimated to cross between 2025 and 2029 across misuse, control, and structural risk categories. The Future of Life Institute's 2025 AI Safety Index reveals an industry struggling to keep pace with its own rapid capability advances: companies claim AGI is achievable within the decade, yet none scores above D in existential safety planning.
Risk Impact Assessment
| Risk Category | Severity | Likelihood (2025-2027) | Threshold Crossing Timeline | Trend |
|---|---|---|---|---|
| Authentication Collapse | Critical | 85% | 2025-2027 | ↗ Accelerating |
| Mass Persuasion | High | 70% | 2025-2026 | ↗ Accelerating |
| Cyberweapon Development | High | 65% | 2025-2027 | ↗ Steady |
| Bioweapons Development | Critical | 40% | 2026-2029 | → Uncertain |
| Situational Awareness | Critical | 60% | 2025-2027 | ↗ Accelerating |
| Economic Displacement | High | 80% | 2026-2030 | ↗ Steady |
| Strategic Deception | Extreme | 15% | 2027-2035+ | → Uncertain |
Capability Dimensions Framework
AI capabilities decompose into five distinct dimensions that progress at different rates. Understanding them separately is crucial because different risks require different combinations. According to Epoch AI's tracking, the training compute of frontier AI models has grown by roughly 4-5x per year since 2020, and the Epoch Capabilities Index shows the rate of frontier model improvement nearly doubled in 2024, from ~8 points/year to ~15 points/year.
```mermaid
flowchart TD
  subgraph DIMS["Capability Dimensions"]
    DK[Domain Knowledge] --> RISK
    RD[Reasoning Depth] --> RISK
    PH[Planning Horizon] --> RISK
    SM[Strategic Modeling] --> RISK
    AE[Autonomous Execution] --> RISK
  end
  subgraph RISK["Risk Activation Thresholds"]
    AUTH[Authentication Collapse<br/>Threshold: 2025-2027]
    BIO[Bioweapons Uplift<br/>Threshold: 2026-2029]
    CYBER[Cyberweapons<br/>Threshold: 2025-2027]
    SCHEME[Strategic Deception<br/>Threshold: 2027-2035+]
  end
  style AUTH fill:#ffcccc
  style BIO fill:#ffcccc
  style CYBER fill:#ffddcc
  style SCHEME fill:#ffe6cc
```

| Dimension | Level 1 | Level 2 | Level 3 | Level 4 | Current Frontier | Gap to Level 3 |
|---|---|---|---|---|---|---|
| Domain Knowledge | Undergraduate | Graduate | Expert | Superhuman | Expert- (some domains) | 0.5 levels |
| Reasoning Depth | Simple (2-3 steps) | Moderate (5-10) | Complex (20+) | Superhuman | Moderate+ | 0.5-1 level |
| Planning Horizon | Immediate | Short-term (hrs) | Medium (wks) | Long-term (months) | Short-term+ | 1 level |
| Strategic Modeling | None | Basic | Sophisticated | Superhuman | Basic+ | 1-1.5 levels |
| Autonomous Execution | None | Simple tasks | Complex tasks | Full autonomy | Simple-Complex | 0.5-1 level |
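To make the framework's gap arithmetic concrete, here is a minimal sketch assuming each dimension is scored on the ordinal 1-4 scale above (half-levels for the "+"/"-" entries). The level values and per-risk requirements are illustrative readings of the tables in this section, not measured data.

```python
# Illustrative sketch of the five-dimension capability framework.
# Level values (1-4, with half-level increments) mirror the ordinal scale above;
# the numbers below are illustrative readings of this section's tables.

FRONTIER_2025 = {
    "domain_knowledge": 2.5,      # "Expert-" in some domains
    "reasoning_depth": 2.5,       # "Moderate+"
    "planning_horizon": 2.0,      # short-term (hours)
    "strategic_modeling": 2.0,    # "Basic+"
    "autonomous_execution": 2.5,  # simple-to-complex tasks
}

# Hypothetical per-risk requirements, loosely following the risk-capability
# mapping tables later in this section.
RISK_REQUIREMENTS = {
    "authentication_collapse": {"domain_knowledge": 3.0, "autonomous_execution": 2.0},
    "bioweapons_uplift": {"domain_knowledge": 3.0, "reasoning_depth": 3.0,
                          "planning_horizon": 3.0, "autonomous_execution": 3.0},
}

def remaining_gap(current: dict, required: dict) -> dict:
    """Per-dimension shortfall (in levels) before a risk's threshold is met."""
    return {dim: max(0.0, need - current.get(dim, 0.0)) for dim, need in required.items()}

def threshold_crossed(current: dict, required: dict) -> bool:
    """A risk 'activates' only when every required dimension reaches its level."""
    return all(gap == 0.0 for gap in remaining_gap(current, required).values())

for risk, req in RISK_REQUIREMENTS.items():
    print(risk, remaining_gap(FRONTIER_2025, req), threshold_crossed(FRONTIER_2025, req))
```

The point of the sketch is the conjunction: a risk does not activate on overall capability but only when every required dimension clears its level, which is why gaps of 0.5-1 level in a single dimension can dominate the timeline.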
Domain Knowledge Benchmarks
Current measurement approaches show significant gaps in assessing practical domain expertise:
| Domain | Best Benchmark | Current Frontier Score | Expert Human Level | Assessment Quality |
|---|---|---|---|---|
| Biology | MMLU-Biology | 85-90% | ≈95% | Medium |
| Chemistry | ChemBench | 70-80% | ≈90% | Low |
| Computer Security | SecBench | 65-75% | ≈85% | Low |
| Psychology | MMLU-Psychology | 80-85% | ≈90% | Very Low |
| Medicine | MedQA | 85-90% | ≈95% | Medium |
Assessment quality reflects how well benchmarks capture practical expertise versus academic knowledge.
Reasoning Depth Progression
The ARC Prize 2024-2025 results demonstrate the critical threshold zone for complex reasoning. On ARC-AGI-1, OpenAI's o3-preview achieved 75.7% accuracy (near the human level of 98%), while on the harder ARC-AGI-2 benchmark, even advanced models score only single-digit percentages, yet humans can solve every task.
| Reasoning Level | Benchmark Examples | Current Performance | Risk Relevance |
|---|---|---|---|
| Simple (2-3 steps) | Basic math word problems | 95%+ | Low-risk applications |
| Moderate (5-10 steps) | GSM8K, multi-hop QA | 85-95% | Most current capabilities |
| Complex (20+ steps) | ARC-AGI, extended proofs | 30-75% (ARC-AGI-1), 5-55% (ARC-AGI-2) | Critical threshold zone |
| Superhuman | Novel mathematical proofs | <10% | Advanced risks |
Recent breakthrough (December 2025): Poetiq with GPT-5.2 X-High achieved 75% on ARC-AGI-2, surpassing the average human test-taker score of 60% for the first time and demonstrating rapid progress on complex reasoning tasks.
Risk-Capability Mapping
Near-Term Risks (2025-2027)
Authentication Collapse
The volume of deepfakes has grown explosively: Deloitte's 2024 analysis estimates growth from roughly 500,000 online deepfakes in 2023 to about 8 million in 2025, with annual growth nearing 900%. Voice cloning has crossed what experts call the "indistinguishable threshold": a few seconds of audio now suffice to generate a convincing clone.
| Capability | Required Level | Current Level | Gap | Evidence |
|---|---|---|---|---|
| Domain Knowledge (Media) | Expert | Expert- | 0.5 level | Sora quality approaching photorealism |
| Reasoning Depth | Moderate | Moderate | 0 levels | Current models handle multi-step generation |
| Strategic Modeling | Basic+ | Basic | 0.5 level | Limited theory of mind in current systems |
| Autonomous Execution | Simple | Simple | 0 levels | Already achieved for content generation |
Key Threshold Capabilities:
- Generate synthetic content indistinguishable from authentic across all modalities
- Real-time interactive video generation (NVIDIA Omniverse)
- Defeat detection systems designed to identify AI content
- Mimic individual styles from minimal samples
Detection Challenges: OpenAI's deepfake detection tool identifies DALL-E 3 images with 98.8% accuracy but flags only 5-10% of images from other AI tools. Multi-modal attacks combining deepfaked video, synthetic voices, and fabricated documents are increasing.
Current Status: OpenAI's Sora and Meta's Make-A-Video demonstrate near-threshold video generation. ElevenLabs achieves voice cloning from <30 seconds of audio.
Mass Persuasion Capabilities
| Capability | Required Level | Current Level | Gap | Evidence |
|---|---|---|---|---|
| Domain Knowledge (Psychology) | Graduate+ | Graduate | 0.5 level | Strong performance on psychology benchmarks |
| Strategic Modeling | Sophisticated | Basic+ | 1 level | Limited multi-agent reasoning |
| Planning Horizon | Medium-term | Short-term | 1 level | Cannot maintain campaigns over weeks |
| Autonomous Execution | Simple | Simple | 0 levels | Can generate content at scale |
Research Evidence:
- Anthropic (2024) reports that Claude 3 achieves 84% on psychology benchmarks
- A Stanford HAI study finds AI-generated content scores 82% higher on believability measures
- An MIT persuasion study demonstrates that automated A/B testing improves persuasion by 35%
Medium-Term Risks (2026-2029)
Bioweapons Development
| Capability | Required Level | Current Level | Gap | Assessment Source |
|---|---|---|---|---|
| Domain Knowledge (Biology) | Expert | Graduate+ | 1 level | RAND biosecurity assessment |
| Domain Knowledge (Chemistry) | Expert | Graduate | 1-2 levels | Limited synthesis knowledge |
| Reasoning Depth | Complex | Moderate+ | 1 level | Cannot handle 20+ step procedures |
| Planning Horizon | Medium-term | Short-term | 1 level | No multi-week experimental planning |
| Autonomous Execution | Complex | Simple+ | 1 level | Cannot troubleshoot failed experiments |
Critical Bottlenecks:
- Specialized synthesis knowledge for dangerous compounds
- Autonomous troubleshooting of complex laboratory procedures
- Multi-week experimental planning and adaptation
- Integration of theoretical knowledge with practical constraints
Expert Assessment: RAND Corporation (2024) estimates a 60% probability of crossing this threshold by 2028.
Economic Displacement Thresholds
McKinsey's research indicates that current technologies could, in theory, automate about 57% of U.S. work hours. By 2030, approximately 27% of current work hours in Europe and 30% in the United States could be automated. Workers in lower-wage jobs are up to 14 times more likely to need to change occupations than those in the highest-wage positions.
| Job Category | Automation Threshold | Current AI Capability | Estimated Timeline | Source |
|---|---|---|---|---|
| Content Writing | 70% task automation | 85% | Crossed 2024 | McKinsey AI Index |
| Code Generation | 60% task automation | 60-70% (SWE-bench Verified) | Crossed 2025 | SWE-bench leaderboard |
| Data Analysis | 75% task automation | 55% | 2026-2027 | Industry surveys |
| Customer Service | 80% task automation | 70% | 2025-2026 | Salesforce AI reports |
| Legal Research | 65% task automation | 40% | 2027-2028 | Legal industry analysis |
Coding Benchmark Update: The International AI Safety Report (October 2025) notes that coding capabilities have advanced particularly quickly. Top models now solve over 60% of problems in SWE-bench Verified, up from 40% in late 2024 and almost 0% at the beginning of 2024. However, Scale AI's SWE-Bench Pro shows a significant performance drop: even the best models (GPT-5, Claude Opus 4.1) score only 23% on harder, more realistic tasks.
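As a rough illustration of how the timeline column in the table above could be derived, the sketch below linearly extrapolates current task-automation capability toward each category's threshold. The 5-10 percentage-points-per-year improvement rate is an assumption for illustration, not a figure from McKinsey or the SWE-bench leaderboard.

```python
# Rough extrapolation of when task-automation capability might reach each
# category's threshold, assuming a constant improvement rate. The 5-10
# points-per-year range is an illustrative assumption, not a cited figure.

CATEGORIES = {
    # name: (automation threshold %, current capability %)
    "content_writing": (70, 85),
    "code_generation": (60, 65),
    "data_analysis": (75, 55),
    "customer_service": (80, 70),
    "legal_research": (65, 40),
}

def crossing_year(threshold: float, current: float,
                  rate_per_year: float, base_year: int = 2025) -> int:
    """Estimate the year capability reaches the threshold at a linear rate."""
    if current >= threshold:
        return base_year  # threshold already crossed
    years_needed = (threshold - current) / rate_per_year
    return base_year + int(round(years_needed))

for name, (threshold, current) in CATEGORIES.items():
    optimistic = crossing_year(threshold, current, rate_per_year=10)
    conservative = crossing_year(threshold, current, rate_per_year=5)
    print(f"{name}: {optimistic}-{conservative}")
```

The spread between the optimistic and conservative rates reproduces the kind of two-to-three-year uncertainty bands shown in the table; the real estimates also fold in adoption and cost factors that a linear capability extrapolation ignores.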
Long-Term Control Risks (2027-2035+)
Strategic Deception (Scheming)
| Capability | Required Level | Current Level | Gap | Uncertainty |
|---|---|---|---|---|
| Strategic Modeling | Superhuman | Basic+ | 2+ levels | Very High |
| Reasoning Depth | Complex | Moderate+ | 1 level | High |
| Planning Horizon | Long-term | Short-term | 2 levels | Very High |
| Situational Awareness | Expert | Basic | 2 levels | High |
Key Uncertainties:
- Whether sophisticated strategic modeling can emerge from current training approaches
- Detectability of strategic deception capabilities during evaluation
- Minimum capability level required for effective scheming
Research Evidence:
- Anthropic Constitutional AI research shows limited success in detecting deceptive behavior
- Redwood Research adversarial training reveals capabilities often hidden during evaluation
Current State & Trajectory
Capability Progress Rates
According to Epoch AI's analysis, training compute for frontier models grows 4-5x yearly, and their Epoch Capabilities Index shows frontier model improvement nearly doubled in 2024. METR's research shows that AI performance on task length has increased exponentially over the past six years, with a doubling time of around 7 months.
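To make the METR doubling-time figure concrete, here is a small worked example; the starting task-horizon value of one hour is a hypothetical placeholder for illustration, not a METR measurement.

```python
# Worked example of the ~7-month doubling time METR reports for the length of
# tasks frontier models can complete. The starting horizon (1 hour) is a
# hypothetical placeholder, not a METR measurement.

DOUBLING_TIME_MONTHS = 7.0

def horizon_after(months: float, start_horizon_hours: float = 1.0) -> float:
    """Task-length horizon after `months`, assuming exponential growth."""
    return start_horizon_hours * 2 ** (months / DOUBLING_TIME_MONTHS)

# If a model today handles ~1-hour tasks, the same trend implies:
for years in (1, 2, 3):
    print(f"+{years}y: ~{horizon_after(12 * years):.0f}-hour tasks")
# roughly 3-hour tasks after one year, 11-hour after two, 35-hour after three
```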
| Dimension | 2023-2024 Progress | Projected 2024-2025 | Key Drivers |
|---|---|---|---|
| Domain Knowledge | +0.5 levels | +0.3-0.7 levels | Larger training datasets, specialized fine-tuning |
| Reasoning Depth | +0.3 levels | +0.2-0.5 levels | Chain-of-thought improvements, tree search |
| Planning Horizon | +0.2 levels | +0.2-0.4 levels | Tool integration, memory systems |
| Strategic Modeling | +0.1 levels | +0.1-0.3 levels | Multi-agent training, RL improvements |
| Autonomous Execution | +0.4 levels | +0.3-0.6 levels | Tool use, real-world deployment |
Data Sources: Epoch AI capability tracking, industry benchmark results, expert elicitation.
Compute Scaling Projections
| Metric | Current (2025) | Projected 2027 | Projected 2030 | Source |
|---|---|---|---|---|
| Models above 10^26 FLOP | ≈5-10 | ≈30 | ≈200+ | Epoch AI model counts |
| Largest training run power | 1-2 GW | 2-4 GW | 4-16 GW | Epoch AI power analysis |
| Frontier model training cost | $100M-500M | $100M-1B+ | $1-5B | Epoch AI cost projections |
| Open-weight capability lag | 6-12 months | 6-12 months | 6-12 months | Epoch AI consumer GPU analysis |
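The projections in the table follow from a simple compound-growth relation; a minimal sketch, assuming the 4-5x annual training-compute growth simply carries forward unchanged. The 2025 baseline of 5e26 FLOP is an illustrative placeholder, not an Epoch AI figure.

```python
# Compound-growth sketch behind the compute projections above, assuming the
# 4-5x/year training-compute growth Epoch AI reports continues unchanged.
# The 2025 baseline of 5e26 FLOP is an illustrative placeholder.

BASELINE_FLOP_2025 = 5e26

def projected_frontier_compute(year: int, growth_per_year: float) -> float:
    """Frontier training compute in FLOP under constant multiplicative growth."""
    return BASELINE_FLOP_2025 * growth_per_year ** (year - 2025)

for year in (2027, 2030):
    low = projected_frontier_compute(year, 4.0)
    high = projected_frontier_compute(year, 5.0)
    print(f"{year}: {low:.1e} - {high:.1e} FLOP")
# roughly 8e27-1.3e28 FLOP by 2027 and 5e29-1.6e30 FLOP by 2030
```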
Leading Organizations
| Organization | Strongest Capabilities | Estimated Timeline to Next Threshold | Focus Area |
|---|---|---|---|
| OpenAI | Domain knowledge, autonomous execution | 12-18 months | General capabilities |
| Anthropic | Reasoning depth, strategic modeling | 18-24 months | Safety-focused development |
| DeepMind | Strategic modeling, planning | 18-30 months | Scientific applications |
| Meta | Multimodal generation | 6-12 months | Social/media applications |
Key Uncertainties & Research Cruxes
Measurement Validity
The Berkeley CLTC Working Paper on Intolerable Risk Thresholds notes that models effectively more capable than the latest tested model (4x or more in Effective Compute, or 6 months' worth of fine-tuning) require comprehensive assessment, including threat model mapping, empirical capability tests, elicitation testing without safety mechanisms, and likelihood forecasting.
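A minimal sketch of how that reassessment trigger could be operationalized follows. The record fields and the reduction of "effective compute" to a single ratio are simplifications for illustration, not the paper's own formalism.

```python
# Hypothetical encoding of the Berkeley CLTC reassessment trigger: a model that
# is >=4x the last-assessed model in effective compute, or that has received
# >=6 months of additional fine-tuning, warrants comprehensive assessment.
# Field names and the single-ratio notion of "effective compute" are simplifications.

from dataclasses import dataclass

@dataclass
class ModelRecord:
    effective_compute: float        # relative effective-compute score
    finetune_months_since_eval: float

def requires_comprehensive_assessment(candidate: ModelRecord,
                                      last_assessed: ModelRecord) -> bool:
    compute_jump = candidate.effective_compute / last_assessed.effective_compute >= 4.0
    finetune_drift = candidate.finetune_months_since_eval >= 6.0
    return compute_jump or finetune_drift

baseline = ModelRecord(effective_compute=1.0, finetune_months_since_eval=0.0)
new_model = ModelRecord(effective_compute=4.5, finetune_months_since_eval=2.0)
print(requires_comprehensive_assessment(new_model, baseline))  # True
```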
An interdisciplinary review of AI evaluation highlights the "benchmark lottery" problem: researchers at Google's Brain Team found that many factors other than fundamental algorithmic superiority may lead to a method being perceived as superior. Ironically, a majority of influential benchmarks have been released without rigorous peer review.
| Uncertainty | Impact if True | Impact if False | Current Evidence |
|---|---|---|---|
| Current benchmarks accurately measure risk-relevant capabilities | Can trust threshold predictions | Need fundamentally new evaluations | Mixed - good for some domains, poor for others |
| Practical capabilities match benchmark performance | Smooth transition from lab to deployment | Significant capability overhangs | Substantial gaps observed in real-world deployment |
| Capability improvements follow predictable scaling laws | Reliable timeline forecasting possible | Threshold crossings may surprise | Scaling laws hold for some capabilities, not others |
Threshold Sharpness
Sharp Threshold Evidence:
- Authentication systems: Detection accuracy drops from 95% to 15% once generation quality crosses the threshold
- Economic viability: McKinsey automation analysis shows 10-20% capability improvements create a 50-80% cost advantage in many tasks
- Security vulnerabilities: Most exploits require complete capability to work at all
Gradual Scaling Evidence:
- Job displacement: Different tasks within roles automate at different rates
- Persuasion effectiveness: Incremental improvements in messaging quality yield incremental persuasion gains
- Domain expertise: Knowledge accumulation appears continuous rather than threshold-based
Strategic Deception Detection
Critical unsolved problems in capability assessment:
| Challenge | Current Approach | Limitation | Research Need |
|---|---|---|---|
| Detecting hidden capabilities | Red-teaming, adversarial prompts | Systems may hide abilities during evaluation | Novel evaluation paradigms |
| Measuring situational awareness | Direct questioning, inference tests | Pattern matching vs. genuine understanding | Causal intervention studies |
| Long-term planning assessment | Multi-step benchmarks | Maximum tested horizon ~hours | Extended evaluation infrastructure |
| Strategic modeling depth | Theory of mind tasks | Unclear if results generalize | Ecological validity research |
Timeline Projections & Warning Indicators
2025 Critical Thresholds
High Probability (70%+):
- Authentication Collapse: Real-time deepfakes become commercially viable
- Content Generation Saturation: Human-level quality across all text/image modalities
Medium Probability (40-70%):
- Code Automation: 50%+ of software engineering tasks automated
- Basic Situational Awareness: Systems understand evaluation vs. deployment contexts
2026-2027 Medium Probability Events
| Risk | Probability | Key Indicators to Monitor |
|---|---|---|
| Advanced Cyberweapons | 65% | METR evaluations crossing 40% threshold |
| Economic Disruption | 80% | GitHub Copilot achieving >60% task completion |
| Bioweapons Uplift | 40% | Expert biology benchmarks >90%, synthesis knowledge tests |
| Oversight Evasion | 60% | Detection of training/deployment context awareness |
Early Warning System
Red Flag Indicators:
- Sudden benchmark improvements >20 percentage points (see the monitoring sketch after this list)
- Systems developing capabilities not explicitly trained for
- Gap between capability and safety evaluation results widening
- Evidence of strategic behavior during evaluation
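A minimal sketch of how the first red-flag check might be automated against a benchmark history; the threshold mirrors the list above, and the history data format and example scores are hypothetical.

```python
# Sketch of an automated check for the first red flag above: a jump of more
# than 20 percentage points between successive benchmark results.
# The benchmark-history format and scores are hypothetical example data.

JUMP_THRESHOLD_POINTS = 20.0

def sudden_jumps(history: dict[str, list[float]]) -> dict[str, float]:
    """Return benchmarks whose latest score jumped by more than the threshold."""
    flagged = {}
    for benchmark, scores in history.items():
        if len(scores) >= 2 and scores[-1] - scores[-2] > JUMP_THRESHOLD_POINTS:
            flagged[benchmark] = scores[-1] - scores[-2]
    return flagged

example_history = {
    "swe_bench_verified": [40.0, 48.0, 62.0],   # +14 points: below threshold
    "arc_agi_2": [4.0, 9.0, 54.0],              # +45 points: red flag
}
print(sudden_jumps(example_history))  # {'arc_agi_2': 45.0}
```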
Monitoring Infrastructure:
- METR dangerous capability evaluations
- MIRI alignment evaluation protocols
- Industry responsible scaling policies (OpenAI Preparedness, Anthropic RSP)
- Academic capability forecasting (Epoch AI)
The METR Common Elements Report (December 2025) describes how each major AI developer's policy uses capability thresholds for biological weapons development, cyberattacks, autonomous replication, and automated AI R&D, with commitments to conduct model evaluations assessing whether models are approaching thresholds that could enable severe harm.
Expert Survey Findings
An OECD-affiliated survey on AI thresholds found that experts agreed that, if training compute thresholds are exceeded, AI companies should:
- Conduct additional risk assessments (e.g., via model evaluations)
- Notify an independent public body (e.g., EU AI Office, FTC, or AI Safety Institute)
- Notify the government
Participants noted that risk assessment frameworks from safety-critical industries (nuclear, maritime, aviation, healthcare, finance, space) provide valuable precedent for AI governance.
Sources & Resources
Primary Research
| Source | Type | Key Findings | Relevance |
|---|---|---|---|
| Anthropic Responsible Scaling Policy | Industry Policy | Defines capability thresholds for safety measures | Framework implementation |
| OpenAI Preparedness Framework | Industry Policy | Risk assessment methodology | Threshold identification |
| METR Dangerous Capability Evaluations | Research | Systematic capability testing | Current capability baselines |
| Epoch AI Capability Forecasts | Research | Timeline predictions for AI milestones | Forecasting methodology |
Government & Policy
| Organization | Resource | Focus |
|---|---|---|
| NIST AI Risk Management Framework | US Government | Risk assessment standards |
| UK AISI Research | UK Government | Model evaluation protocols |
| EU AI Office | EU Government | Regulatory frameworks |
| RAND Corporation AI Studies | Think Tank | National security implications |
Technical Benchmarks & Evaluation
| Benchmark | Domain | Current Frontier Score (Dec 2025) | Threshold Relevance |
|---|---|---|---|
| MMLU | General Knowledge | 85-90% | Domain expertise baseline |
| ARC-AGI-1 | Abstract Reasoning | 75-87% (o3-preview) | Complex reasoning threshold |
| ARC-AGI-2 | Abstract Reasoning | 54-75% (GPT-5.2) | Next-gen reasoning threshold |
| SWE-bench Verified | Software Engineering | 60-70% | Autonomous code execution |
| SWE-bench Pro | Real-world Coding | 17-23% | Generalization to novel code |
| MATH | Mathematical Reasoning | 60-80% | Multi-step reasoning |
Risk Assessment Research
| Research Area | Key Papers | Organizations |
|---|---|---|
| Bioweapons Risk | RAND Biosecurity Assessment | RAND, Johns Hopkins, CNAS |
| Economic Displacement | McKinsey AI Impact | McKinsey, Brookings Institution |
| Authentication Collapse | Deepfake Detection Challenges | UC Berkeley, MIT |
| Strategic Deception | Constitutional AI Research | Anthropic, Redwood Research |
Additional Sources
| Source | Type | Key Finding |
|---|---|---|
| International AI Safety Report (Oct 2025)↗ | Government | Risk thresholds can be crossed between annual cycles due to post-training/inference advances |
| Future of Life Institute AI Safety Index 2025↗ | NGO | Industry fundamentally unprepared; Anthropic leads (C+) but none score above D in existential safety |
| Berkeley CLTC Intolerable Risk Thresholds↗ | Academic | Models 4x+ more capable require comprehensive risk assessment |
| METR Common Elements Report (Dec 2025)↗ | Research | All major labs use capability thresholds for bio, cyber, replication, AI R&D |
| ARC Prize 2025 Results↗ | Academic | First AI system (Poetiq/GPT-5.2) exceeds human average on ARC-AGI-2 reasoning |
| Epoch AI Compute Trends↗ | Research | Training compute grows 4-5x yearly; capability improvement doubled in 2024 |
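As a rough back-of-the-envelope illustration of how two of these findings interact: if training compute grows 4-5x per year (the Epoch AI trend) and a comprehensive risk assessment is triggered once a model is 4x or more capable than the last assessed one (the Berkeley CLTC threshold), then under the simplifying, and contestable, assumption that capability scales one-for-one with compute, the trigger would recur roughly every 10-12 months. The short Python sketch below just performs that arithmetic; the function name and the one-for-one scaling assumption are introduced here for illustration and do not come from either source.

```python
import math

def years_to_reach(multiple: float, annual_growth_factor: float) -> float:
    """Years for a quantity growing by `annual_growth_factor` per year to increase by `multiple`."""
    return math.log(multiple) / math.log(annual_growth_factor)

# Assumption (illustration only): capability tracks training compute one-for-one,
# which overstates how directly compute translates into capability.
for growth in (4.0, 5.0):
    years = years_to_reach(4.0, growth)
    print(f"At {growth:.0f}x/year compute growth, a 4x capability multiple takes "
          f"~{years:.2f} years (~{years * 12:.0f} months)")
```

In practice the interval would likely differ, since capability gains per unit of compute are uneven across domains and do not scale one-for-one.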
References
OpenAI is a leading AI research and deployment company focused on building advanced AI systems, including GPT and o-series models, with a stated mission of ensuring artificial general intelligence (AGI) benefits all of humanity. The homepage serves as a gateway to their research, products, and policy work spanning capabilities and safety.
2. [2009.03300] Measuring Massive Multitask Language Understanding. Dan Hendrycks et al., arXiv, 2020.
Introduces the MMLU benchmark, a comprehensive evaluation suite covering 57 subjects across STEM, humanities, social sciences, and more, designed to measure breadth and depth of language model knowledge. The benchmark tests models from elementary to professional level and reveals significant gaps between human expert performance and state-of-the-art models at the time of publication. It became a standard benchmark for tracking LLM capability progress.
3. [2310.09049] SAI: Solving AI Tasks with Systematic Artificial Intelligence in Communication Network. Lei Yao, Yong Zhang, Zilong Yan & Jialu Tian, arXiv, 2023.
This paper proposes SAI, a systematic AI framework for solving diverse AI tasks in communication networks by integrating large language models with structured reasoning approaches. It addresses how LLMs can be applied to network management and optimization problems through systematic decomposition of complex communication tasks. The work explores capability thresholds and risk assessment for AI deployment in critical network infrastructure.
Epoch AI analyzes how many AI models would fall above various compute thresholds (measured in FLOPs), providing empirical projections relevant to governance frameworks that use compute as a regulatory trigger. The analysis helps policymakers and researchers understand the practical scope and selectivity of compute-based oversight mechanisms.
Google DeepMind is a leading AI research laboratory combining the former DeepMind and Google Brain teams, focused on developing advanced AI systems and conducting research across capabilities, safety, and applications. The organization is one of the most influential labs in AI development, working on frontier models including Gemini and publishing widely-cited safety and capabilities research.
This RAND Corporation research report examines the risk of AI systems providing meaningful uplift to actors seeking to develop biological weapons, focusing on how to assess capability thresholds and decompose the problem for evaluation purposes. It likely provides a framework for analyzing when AI crosses dangerous capability boundaries in the bioweapons domain and how to structure risk assessments accordingly.
This page outlines the European Commission's comprehensive policy framework for AI, centered on promoting trustworthy, human-centric AI through the AI Act, AI Continent Action Plan, and Apply AI Strategy. It aims to balance Europe's global AI competitiveness with safety, fundamental rights, and democratic values. Key initiatives include AI Factories, the InvestAI Facility, GenAI4EU, and the Apply AI Alliance.
Epoch AI is a research organization focused on investigating and forecasting trends in artificial intelligence, particularly around compute, training data, and algorithmic progress. They produce empirical analyses and datasets to inform understanding of AI development trajectories and support better decision-making in AI governance and safety.
This OECD-affiliated survey examines how thresholds and capability benchmarks should be defined and applied to advanced AI systems for governance and risk management purposes. It likely synthesizes expert views on identifying dangerous capability levels and triggering regulatory or safety interventions. The work is relevant to policymakers and AI developers designing evaluation frameworks for frontier models.
Deloitte's 2024 analysis frames deepfakes as a cybersecurity-scale threat to online trust, projecting the deepfake detection market will grow 42% annually from $5.5B in 2023 to $15.7B by 2026. The report draws parallels to cybersecurity spending trajectories and highlights that costs of maintaining content authenticity will likely be distributed across consumers, creators, and advertisers. Consumer surveys reveal widespread skepticism and demand for standardized AI content labeling.
Meta's official AI homepage showcases their broad research and product portfolio including Llama 4 (large language models), Segment Anything Model 3 (computer vision), V-JEPA 2 (world models), and AI glasses hardware. The company organizes its AI work around four research pillars: Communication & Language, Embodiment & Actions, Alignment, and Core Learning & Reasoning. Meta emphasizes open-source development and practical deployment at scale.
ARC-AGI-2 is a 2025 benchmark designed to stress-test AI reasoning systems, where pure LLMs score 0% and frontier reasoning systems achieve only single-digit percentages despite humans solving all tasks. It targets three core capability gaps—symbolic interpretation, compositional reasoning, and contextual rule application—demonstrating that scaling alone is insufficient and new architectures or test-time adaptation methods are required.
Sora is OpenAI's text-to-video generation model and app that converts text prompts or images into videos with high realism, including automatic sound. It supports features like character casting, remixing, and multiple visual styles including cinematic and photorealistic.
Anthropic introduces its Responsible Scaling Policy (RSP), a framework of technical and organizational protocols for managing catastrophic risks as AI systems become more capable. The policy defines AI Safety Levels (ASL-1 through ASL-5+), modeled after biosafety level standards, requiring increasingly strict safety, security, and operational measures tied to a model's potential for catastrophic risk. Current Claude models are classified ASL-2, with ASL-3 and beyond triggering stricter deployment and security requirements.
A McKinsey Global Institute report examining how AI agents and robotics are reshaping labor markets and workforce skills. The report reportedly finds that 57% of workers may need to develop new skill partnerships with AI systems, analyzing how human-AI collaboration will transform job roles and economic productivity.
Redwood Research is a nonprofit AI safety organization that pioneered the 'AI control' research agenda, focusing on preventing intentional subversion by misaligned AI systems. Their key contributions include the ICML paper on AI Control protocols, the Alignment Faking demonstration (with Anthropic), and consulting work with governments and AI labs on misalignment risk mitigation.
SWE-bench is a benchmark and leaderboard platform for evaluating AI models on real-world software engineering tasks, particularly resolving GitHub issues in open-source Python repositories. It offers multiple dataset variants (Lite, Verified, Multimodal) and standardized metrics to compare coding agents. It has become a widely-used standard for assessing the practical software engineering capabilities of LLM-based agents.
METR is an organization conducting research and evaluations to assess the capabilities and risks of frontier AI systems, focusing on autonomous task completion, AI self-improvement risks, and evaluation integrity. They have developed the 'Time Horizon' metric measuring how long AI agents can autonomously complete software tasks, showing exponential growth over recent years. They work with major AI labs including OpenAI, Anthropic, and Amazon to evaluate catastrophic risk potential.
The NIST AI RMF is a voluntary, consensus-driven framework released in January 2023 to help organizations identify, assess, and manage risks associated with AI systems while promoting trustworthiness across design, development, deployment, and evaluation. It provides structured guidance organized around core functions and is accompanied by a Playbook, Roadmap, and a Generative AI Profile (2024) addressing risks specific to generative AI systems.
GitHub Copilot is an AI-powered coding assistant that integrates into IDEs, terminals, and GitHub workflows to assist developers with code completion, autonomous agent-based coding tasks, and project management. It supports multiple LLMs and allows assignment of coding tasks to AI agents that can autonomously write code and create pull requests.
Epoch AI analyzes how consumer GPUs like the RTX 5090 can run open-weight models that match frontier LLM performance from 6-12 months prior. The analysis tracks this gap across multiple benchmarks (GPQA Diamond, MMLU-Pro, LMArena) and finds the democratization trend is driven by open-weight scaling, model distillation, and GPU progress.
ElevenLabs is a leading AI voice technology platform offering text-to-speech, voice cloning, speech-to-text, and AI agent capabilities across 70+ languages. It serves enterprises, creators, and developers with tools for synthetic voice generation and audio content creation. The platform represents a prominent example of advanced synthetic media technology with significant implications for deepfakes, identity fraud, and information integrity.
This paper proposes a curriculum learning approach for post-disaster analytics using multimodal deep learning models that jointly process images and text. The authors introduce Dynamic Task and Weight Prioritization (DATWEP), a novel gradient-based curriculum learning method that automatically determines task difficulty during training without manual specification. The approach combines U-Net for semantic segmentation, image encoding, and a custom text classifier for visual question answering, evaluated on the FloodNet dataset for flood damage assessment.
McKinsey's annual survey-based report tracking enterprise AI adoption, investment trends, and emerging risks across industries. The report provides quantitative benchmarks on how organizations are deploying AI, including generative AI, and what governance and risk management practices they are implementing.
A focused interim update to the International AI Safety Report, chaired by Yoshua Bengio, covering significant developments in AI capabilities and their risk implications between full annual editions. The report is produced by an international panel of experts from over 30 countries and aims to keep policymakers and researchers current on fast-moving AI developments. It serves as an authoritative, consensus-oriented reference for AI safety governance.
This resource surveys leading AI-powered deepfake detection tools available in 2025, including OpenAI's detection tool, evaluating their capabilities for identifying synthetically generated media. It serves as a practical reference for organizations and researchers assessing defenses against AI-generated disinformation and identity fraud. The piece highlights the growing ecosystem of countermeasures to synthetic media threats.
Epoch AI analyzes historical trends in the training compute used for frontier AI models, finding that compute has grown approximately 4-5x per year. This rapid scaling has significant implications for AI capabilities trajectories, resource requirements, and safety planning horizons.
The UK AI Safety Institute (recently rebranded as the AI Security Institute) is a government body under the Department for Science, Innovation and Technology focused on minimizing risks from rapid and unexpected AI advances. It conducts and publishes safety research, international coordination reports, and policy guidance, while managing grants for systemic AI safety research.
Anthropic's official model card for the Claude 3 family (Haiku, Sonnet, Opus), documenting capability evaluations, safety assessments, and alignment properties. It covers frontier model benchmarks, red-teaming results, and responsible scaling policy (RSP) threshold evaluations for biological, chemical, and other catastrophic risks. The document represents Anthropic's public transparency effort around deploying a state-of-the-art AI system.
MIRI is a nonprofit research organization focused on ensuring that advanced AI systems are safe and beneficial. It conducts technical research on the mathematical foundations of AI alignment, aiming to solve core theoretical problems before transformative AI is developed. MIRI is one of the pioneering organizations in the AI safety field.
METR presents updated capability evaluations for Claude Sonnet and OpenAI o1 models, assessing whether these frontier AI systems approach autonomy thresholds relevant to safety and deployment decisions. The evaluations focus on task autonomy and the potential for models to pose novel risks as their capabilities scale.
Meta's Make-A-Video is an AI system that generates short video clips from text descriptions, images, or existing videos. It extends text-to-image generation techniques into the temporal domain, enabling creation of realistic and imaginative video content from natural language prompts. The system represents a significant capability milestone in generative AI for multimedia content.
OpenAI's Preparedness initiative outlines a framework for tracking, evaluating, and mitigating catastrophic risks from frontier AI models. It establishes risk thresholds across categories like cybersecurity, CBRN threats, and persuasion, and defines safety standards that must be met before model deployment.
Epoch AI analyzes the rapidly growing electricity demands of training frontier AI models, examining trends in power consumption, infrastructure constraints, and implications for AI development trajectories. The analysis quantifies how compute scaling translates into energy requirements and identifies key bottlenecks in power availability that may shape the pace of AI progress.
35. [2103.03874] Measuring Mathematical Problem Solving With the MATH Dataset. Dan Hendrycks et al., arXiv, 2021.
This paper introduces MATH, a benchmark of 12,500 competition mathematics problems with step-by-step solutions, revealing that large Transformer models achieve surprisingly low accuracy and that scaling alone is insufficient for mathematical reasoning. The authors also release an auxiliary pretraining dataset to aid mathematical learning. The work highlights a fundamental gap between current scaling trends and genuine mathematical reasoning ability.
36. MIT persuasion study. G. Spitale, N. Biller-Andorno & Federico Germani, Science (peer-reviewed), 2023.
This MIT study examined whether humans can distinguish between accurate and false information in tweets, and whether they can identify AI-generated content from GPT-3 versus human-written tweets. With 697 participants, researchers found that GPT-3 presents a dual challenge: it can produce accurate, easily understandable information but also generates more compelling disinformation. Critically, humans cannot reliably distinguish between GPT-3-generated and human-written tweets, raising significant concerns about AI's potential to spread disinformation during an infodemic.
A Stanford HAI study examines how people respond to messages they believe are generated by AI versus humans, finding that individuals tend to place higher credibility or trust in AI-generated content. This has significant implications for misinformation, persuasion, and the societal risks of AI-generated communication at scale.
Scale AI introduces SWE-Bench Pro, an enhanced version of the SWE-Bench coding benchmark designed to address limitations in existing evaluations of AI software engineering capabilities. The benchmark aims to provide more reliable and contamination-resistant assessments of AI systems' ability to solve real-world software engineering tasks. This work is relevant to tracking AI capability thresholds in code generation and autonomous software development.
Anthropic is an AI safety company focused on building reliable, interpretable, and steerable AI systems. The company conducts frontier AI research and develops Claude, its family of AI assistants, with a stated mission of responsible development and maintenance of advanced AI for long-term human benefit.
Epoch AI's trends page provides data-driven tracking of key metrics in AI development, including compute scaling, model capabilities, and training trends. It serves as a quantitative reference for understanding the trajectory of AI progress across multiple dimensions. The resource aggregates empirical data to help researchers and policymakers assess the pace and direction of AI advancement.
Salesforce reports on AI adoption trends in customer service, highlighting how businesses are deploying AI tools to automate interactions, improve efficiency, and manage customer relationships. The report provides industry data on AI usage patterns and emerging capabilities in enterprise customer service contexts.
This interdisciplinary meta-review of ~100 studies examines critical shortcomings in quantitative AI benchmarking practices over the past decade. The paper identifies fine-grained technical issues (dataset biases, data contamination, inadequate documentation) alongside broader sociotechnical problems (overemphasis on text-based single-test evaluation, failure to account for multimodal and interactive AI systems). The authors highlight systemic flaws including misaligned incentives, construct validity issues, and gaming of results, arguing that benchmarking practices are shaped by commercial and competitive dynamics that often prioritize performance metrics over societal concerns. The review challenges the disproportionate trust placed in benchmarks and advocates for improved accountability and real-world relevance in AI evaluation.
METR analyzes the safety policies of 12 frontier AI companies to identify common elements, commitments, and gaps in how organizations approach responsible deployment of advanced AI systems. The analysis synthesizes patterns across responsible scaling policies, model cards, and safety frameworks to provide a comparative overview of industry norms. It serves as a reference for understanding where consensus exists and where significant variation or absence of commitments remains.
RAND Corporation's AI research hub covers policy, national security, and governance implications of artificial intelligence. It aggregates reports, analyses, and commentary on AI risks, military applications, and regulatory frameworks from one of the leading U.S. defense and policy think tanks.
This Berkeley Center for Long-Term Cybersecurity working paper examines how to define and operationalize 'intolerable risk' thresholds for AI systems, providing a framework for identifying which AI capabilities or behaviors should be categorically prohibited or constrained. It contributes to the growing policy and technical discourse around AI red lines and safety limits.
46. [2009.13081] What Disease does this Patient Have? A Large-scale Open Domain Question Answering Dataset from Medical Exams. Di Jin et al., arXiv, 2020.
MedQA is the first free-form multiple-choice open domain question answering dataset for medical problems, sourced from professional medical board exams across three languages: English, simplified Chinese, and traditional Chinese, containing 12,723, 34,251, and 14,123 questions respectively. The authors implement both rule-based and neural methods combining document retrieval and machine comprehension, finding that current best approaches achieve only 36.7%, 42.0%, and 70.1% test accuracy on English, traditional Chinese, and simplified Chinese questions respectively, demonstrating significant challenges for existing OpenQA systems.
The Future of Life Institute's AI Safety Index Summer 2025 systematically evaluates leading AI companies on safety practices, finding widespread deficiencies across risk management, transparency, and existential safety planning. Anthropic receives the highest grade of C+, indicating that even the best-performing company falls significantly short of adequate safety standards. The report serves as a comparative benchmark for industry accountability.
This paper argues that current AI benchmarking practices, which measure skill at specific tasks like games, fail to capture true intelligence because skill can be artificially inflated through prior knowledge and training data. The authors propose a formal definition of intelligence based on Algorithmic Information Theory, conceptualizing it as skill-acquisition efficiency across diverse tasks. They introduce the Abstraction and Reasoning Corpus (ARC), a benchmark designed with human-like priors to enable fair comparisons of general fluid intelligence between AI systems and humans, addressing the need for appropriate feedback signals in developing more intelligent and human-like artificial systems.
GSM8K is a benchmark dataset of 8.5K high-quality grade school math word problems designed to evaluate multi-step mathematical reasoning in language models. The paper demonstrates that state-of-the-art transformer models struggle with this conceptually simple task. To address this limitation, the authors propose a verification approach where multiple candidate solutions are generated and ranked by a trained verifier, showing that verification significantly improves performance and scales more effectively than finetuning baselines.
Comprehensive analysis of the ARC Prize competition results for 2024-2025, evaluating AI systems' performance on the Abstraction and Reasoning Corpus (ARC) benchmark designed to test general fluid intelligence. The results provide insight into the current state of AI reasoning capabilities and how close frontier models are to human-level performance on novel problem-solving tasks.
SecBench appears to be a GitHub organization focused on security benchmarking, likely providing standardized evaluation frameworks for assessing AI or software security capabilities. It aims to establish measurable thresholds and risk assessment criteria for security-related tasks. The project likely offers tools or datasets for evaluating security-relevant AI capabilities.
NVIDIA Omniverse is a platform for building and operating metaverse applications, enabling real-time 3D simulation, collaboration, and digital twin creation. It provides tools for connecting and simulating physically accurate virtual worlds used in robotics, autonomous vehicles, and industrial applications. The platform is increasingly relevant to AI development as a simulation environment for training and testing AI systems.