Summary

A comprehensive framework mapping AI capabilities across five dimensions to specific risk thresholds. It finds authentication collapse and mass persuasion risks at 70-85% likelihood by 2027 and bioweapons development at 40% by 2029, with critical thresholds estimated when models achieve 50% on complex reasoning benchmarks and cross expert-level domain knowledge. The model provides concrete capability requirements, timeline projections, and early warning indicators across seven major risk categories with extensive benchmark tracking.

TODOs (4)
Complete 'Conceptual Framework' section
Complete 'Quantitative Analysis' section (8 placeholders)
Complete 'Strategic Importance' section
Complete 'Limitations' section (6 placeholders)

AI Capability Threshold Model

Model Type: Threshold Analysis
Scope: Capability-risk mapping
Key Insight: Many risks have threshold dynamics rather than gradual activation
Related Models: AI Risk Activation Timeline Model · AI Risk Warning Signs Model · Scheming Likelihood Assessment

Overview

Different AI risks require different capability levels to become dangerous. A system that can write convincing phishing emails poses different risks than one that can autonomously discover zero-day vulnerabilities. This model maps specific capability requirements to specific risks, helping predict when risks activate as capabilities improve.

The capability threshold model provides a structured framework for understanding how AI systems transition from relatively benign to potentially dangerous across multiple risk domains. Rather than treating AI capability as a single dimension or risks as uniformly dependent on general intelligence, this model recognizes that specific risks emerge when systems cross particular capability thresholds in relevant dimensions. According to the International AI Safety Report (October 2025)↗, governance choices in 2025-2026 must internalize that capability scaling has decoupled from parameter count, meaning risk thresholds can be crossed between annual cycles.

Key findings: benchmark performance in the 15-25% range signals early risk emergence, 50% marks a qualitative shift to complex autonomous execution, and most critical thresholds are estimated to be crossed between 2025 and 2029 across misuse, control, and structural risk categories. The Future of Life Institute's 2025 AI Safety Index↗ reveals an industry struggling to keep pace with its own rapid capability advances: companies claim AGI achievement within the decade, yet none scores above D in existential safety planning.

Risk Impact Assessment

Risk Category | Severity | Likelihood (2025-2027) | Threshold Crossing Timeline | Trend
Authentication Collapse | Critical | 85% | 2025-2027 | ↗ Accelerating
Mass Persuasion | High | 70% | 2025-2026 | ↗ Accelerating
Cyberweapon Development | High | 65% | 2025-2027 | ↗ Steady
Bioweapons Development | Critical | 40% | 2026-2029 | → Uncertain
Situational Awareness | Critical | 60% | 2025-2027 | ↗ Accelerating
Economic Displacement | High | 80% | 2026-2030 | ↗ Steady
Strategic Deception | Extreme | 15% | 2027-2035+ | → Uncertain

Capability Dimensions Framework

AI capabilities decompose into five distinct dimensions that progress at different rates. Understanding these separately is crucial because different risks require different combinations. According to Epoch AI's tracking↗, the training compute of frontier AI models has grown by 5x per year since 2020, and the Epoch Capabilities Index shows frontier model improvement nearly doubled in 2024, from ~8 points/year to ~15 points/year.

Dimension | Level 1 | Level 2 | Level 3 | Level 4 | Current Frontier | Gap to Level 3
Domain Knowledge | Undergraduate | Graduate | Expert | Superhuman | Expert- (some domains) | 0.5 levels
Reasoning Depth | Simple (2-3 steps) | Moderate (5-10) | Complex (20+) | Superhuman | Moderate+ | 0.5-1 level
Planning Horizon | Immediate | Short-term (hrs) | Medium (wks) | Long-term (months) | Short-term | +1 level
Strategic Modeling | None | Basic | Sophisticated | Superhuman | Basic | +1-1.5 levels
Autonomous Execution | None | Simple tasks | Complex tasks | Full autonomy | Simple-Complex | 0.5-1 level
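
To make the level structure concrete, here is a minimal sketch of how a capability profile can be compared against a risk's required profile, assuming a numeric encoding of the levels above (1-4, with 0.5 steps for +/- modifiers). The profile values, dictionary names, and helper functions are illustrative assumptions, not part of the model itself.

```python
# Illustrative sketch: encode capability levels on a 1-4 scale (0.5 steps for +/- modifiers)
# and compute per-dimension gaps between a current profile and a risk's required profile.
# All numeric values below are assumptions for illustration, not measured quantities.

RISK_PROFILES = {
    # Required levels loosely mirror the bioweapons table later in this page.
    "bioweapons_development": {
        "domain_knowledge": 3.0,      # Expert
        "reasoning_depth": 3.0,       # Complex (20+ steps)
        "planning_horizon": 3.0,      # Medium-term (weeks)
        "autonomous_execution": 3.0,  # Complex tasks
    },
}

CURRENT_FRONTIER = {
    # Rough reading of the "Current Frontier" column above (assumed encoding).
    "domain_knowledge": 2.5,      # Graduate+ / Expert- depending on domain
    "reasoning_depth": 2.5,       # Moderate+
    "planning_horizon": 2.0,      # Short-term
    "autonomous_execution": 2.0,  # Simple tasks
}

def capability_gaps(current, required):
    """Return per-dimension gaps (levels still missing); 0 means that dimension's bar is met."""
    return {dim: max(0.0, req - current.get(dim, 1.0)) for dim, req in required.items()}

def threshold_crossed(current, required):
    """A risk activates only when every relevant dimension meets its required level."""
    return all(gap == 0.0 for gap in capability_gaps(current, required).values())

if __name__ == "__main__":
    gaps = capability_gaps(CURRENT_FRONTIER, RISK_PROFILES["bioweapons_development"])
    print(gaps)  # e.g. {'domain_knowledge': 0.5, 'reasoning_depth': 0.5, ...}
    print(threshold_crossed(CURRENT_FRONTIER, RISK_PROFILES["bioweapons_development"]))  # False
```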

Domain Knowledge Benchmarks

Current measurement approaches show significant gaps in assessing practical domain expertise:

Domain | Best Benchmark | Current Frontier Score | Expert Human Level | Assessment Quality
Biology | MMLU-Biology↗ | 85-90% | ≈95% | Medium
Chemistry | ChemBench↗ | 70-80% | ≈90% | Low
Computer Security | SecBench↗ | 65-75% | ≈85% | Low
Psychology | MMLU-Psychology | 80-85% | ≈90% | Very Low
Medicine | MedQA↗ | 85-90% | ≈95% | Medium

Assessment quality reflects how well benchmarks capture practical expertise versus academic knowledge.

Reasoning Depth Progression

The ARC Prize 2024-2025 results↗ demonstrate the critical threshold zone for complex reasoning. On ARC-AGI-1, OpenAI's o3-preview achieved 75.7% accuracy (near human level of 98%), while on the harder ARC-AGI-2 benchmark, even advanced models score only single-digit percentages, yet humans can solve every task.

Reasoning Level | Benchmark Examples | Current Performance | Risk Relevance
Simple (2-3 steps) | Basic math word problems | 95%+ | Low-risk applications
Moderate (5-10 steps) | GSM8K↗, multi-hop QA | 85-95% | Most current capabilities
Complex (20+ steps) | ARC-AGI↗, extended proofs | 30-75% (ARC-AGI-1), 5-55% (ARC-AGI-2) | Critical threshold zone
Superhuman | Novel mathematical proofs | <10% | Advanced risks

Recent breakthrough (December 2025): Poetiq with GPT-5.2 X-High↗ achieved 75% on ARC-AGI-2, surpassing the average human test-taker score of 60% for the first time, demonstrating rapid progress on complex reasoning tasks.

Risk-Capability Mapping

Near-Term Risks (2025-2027)

Authentication Collapse

The volume of deepfakes has grown explosively: Deloitte's 2024 analysis↗ estimates growth from roughly 500,000 online deepfakes in 2023 to about 8 million in 2025, with annual growth nearing 900%. Voice cloning has crossed what experts call the "indistinguishable threshold": a few seconds of audio now suffice to generate a convincing clone.

Capability | Required Level | Current Level | Gap | Evidence
Domain Knowledge (Media) | Expert | Expert- | 0.5 level | Sora quality↗ approaching photorealism
Reasoning Depth | Moderate | Moderate | 0 levels | Current models handle multi-step generation
Strategic Modeling | Basic+ | Basic | 0.5 level | Limited theory of mind in current systems
Autonomous Execution | Simple | Simple | 0 levels | Already achieved for content generation

Key Threshold Capabilities:

  • Generate synthetic content indistinguishable from authentic across all modalities
  • Real-time interactive video generation (NVIDIA Omniverse↗)
  • Defeat detection systems designed to identify AI content
  • Mimic individual styles from minimal samples

Detection Challenges: OpenAI's deepfake detection tool↗ identifies DALL-E 3 images with 98.8% accuracy but only flags 5-10% of images from other AI tools. Multi-modal attacks combining deepfaked video, synthetic voices, and fabricated documents are increasing.

Current Status: OpenAI's Sora↗ and Meta's Make-A-Video↗ demonstrate near-threshold video generation. ElevenLabs↗ achieves voice cloning from <30 seconds of audio.

Mass Persuasion Capabilities

Capability | Required Level | Current Level | Gap | Evidence
Domain Knowledge (Psychology) | Graduate+ | Graduate | 0.5 level | Strong performance on psychology benchmarks
Strategic Modeling | Sophisticated | Basic | +1 level | Limited multi-agent reasoning
Planning Horizon | Medium-term | Short-term | 1 level | Cannot maintain campaigns over weeks
Autonomous Execution | Simple | Simple | 0 levels | Can generate content at scale

Research Evidence:

  • Anthropic (2024)↗ shows Claude 3 achieves 84% on psychology benchmarks
  • Stanford HAI study↗ finds AI-generated content rated 82% higher in believability
  • MIT persuasion study↗ demonstrates automated A/B testing improves persuasion by 35%

Medium-Term Risks (2026-2029)

Bioweapons Development

Capability | Required Level | Current Level | Gap | Assessment Source
Domain Knowledge (Biology) | Expert | Graduate+ | 1 level | RAND biosecurity assessment↗
Domain Knowledge (Chemistry) | Expert | Graduate | 1-2 levels | Limited synthesis knowledge
Reasoning Depth | Complex | Moderate+ | 1 level | Cannot handle 20+ step procedures
Planning Horizon | Medium-term | Short-term | 1 level | No multi-week experimental planning
Autonomous Execution | Complex | Simple+ | 1 level | Cannot troubleshoot failed experiments

Critical Bottlenecks:

  • Specialized synthesis knowledge for dangerous compounds
  • Autonomous troubleshooting of complex laboratory procedures
  • Multi-week experimental planning and adaptation
  • Integration of theoretical knowledge with practical constraints

Expert Assessment: RAND Corporation (2024)↗ estimates 60% probability of crossing threshold by 2028.

Economic Displacement Thresholds

McKinsey's research↗ indicates that current technologies could automate about 57% of U.S. work hours in theory. By 2030, approximately 27% of current work hours in Europe and 30% in the United States could be automated. Workers in lower-wage jobs are up to 14 times more likely to need to change occupations than those in highest-wage positions.

Job Category | Automation Threshold | Current AI Capability | Estimated Timeline | Source
Content Writing | 70% task automation | 85% | Crossed 2024 | McKinsey AI Index↗
Code Generation | 60% task automation | 60-70% (SWE-bench Verified) | Crossed 2025 | SWE-bench leaderboard↗
Data Analysis | 75% task automation | 55% | 2026-2027 | Industry surveys
Customer Service | 80% task automation | 70% | 2025-2026 | Salesforce AI reports↗
Legal Research | 65% task automation | 40% | 2027-2028 | Legal industry analysis

Coding Benchmark Update: The International AI Safety Report (October 2025)↗ notes that coding capabilities have advanced particularly quickly. Top models now solve over 60% of problems in SWE-bench Verified, up from 40% in late 2024 and almost 0% at the beginning of 2024. However, Scale AI's SWE-Bench Pro↗ shows a significant performance drop: even the best models (GPT-5, Claude Opus 4.1) score only 23% on harder, more realistic tasks.

Long-Term Control Risks (2027-2035+)

Strategic Deception (Scheming)

Capability | Required Level | Current Level | Gap | Uncertainty
Strategic Modeling | Superhuman | Basic+ | 2+ levels | Very High
Reasoning Depth | Complex | Moderate+ | 1 level | High
Planning Horizon | Long-term | Short-term | 2 levels | Very High
Situational Awareness | Expert | Basic | 2 levels | High

Key Uncertainties:

  • Whether sophisticated strategic modeling can emerge from current training approaches
  • Detectability of strategic deception capabilities during evaluation
  • Minimum capability level required for effective scheming

Research Evidence:

  • Anthropic Constitutional AI↗ shows limited success in detecting deceptive behavior
  • Redwood Research↗ finds that adversarial training reveals capabilities often hidden during standard evaluation

Current State & Trajectory

Capability Progress Rates

According to Epoch AI's analysis↗, training compute for frontier models grows 4-5x yearly. Their Epoch Capabilities Index shows frontier model improvement nearly doubled in 2024. METR's research↗ shows AI performance on task length has been consistently exponentially increasing over the past 6 years, with a doubling time of around 7 months.
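
As a worked illustration of the METR doubling-time figure, the sketch below extrapolates task horizon forward at a fixed 7-month doubling time. The 1-hour starting horizon and the projection points are assumptions chosen only to show the arithmetic.

```python
# Illustrative extrapolation of the METR task-horizon trend: if the task length AI agents
# can complete doubles roughly every 7 months, a starting horizon h0 grows to h0 * 2**(months/7).
# The 1-hour baseline and the chosen horizons are assumptions for illustration only.

def projected_horizon_hours(h0_hours, months_ahead, doubling_months=7.0):
    """Compound the doubling trend forward by `months_ahead` months."""
    return h0_hours * 2 ** (months_ahead / doubling_months)

if __name__ == "__main__":
    for months in (12, 24, 36):
        h = projected_horizon_hours(1.0, months)
        print(f"{months:>2} months out: ~{h:.1f} hours")
    # At a 7-month doubling time: ~3.3 hours after 1 year, ~10.8 after 2, ~35.3 after 3.
```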

Dimension | 2023-2024 Progress | Projected 2024-2025 | Key Drivers
Domain Knowledge | +0.5 levels | +0.3-0.7 levels | Larger training datasets, specialized fine-tuning
Reasoning Depth | +0.3 levels | +0.2-0.5 levels | Chain-of-thought improvements, tree search
Planning Horizon | +0.2 levels | +0.2-0.4 levels | Tool integration, memory systems
Strategic Modeling | +0.1 levels | +0.1-0.3 levels | Multi-agent training, RL improvements
Autonomous Execution | +0.4 levels | +0.3-0.6 levels | Tool use, real-world deployment

Data Sources: Epoch AI capability tracking↗, industry benchmark results, expert elicitation.

Compute Scaling Projections

Metric | Current (2025) | Projected 2027 | Projected 2030 | Source
Models above 10^26 FLOP | ≈5-10 | ≈30 | ≈200+ | Epoch AI model counts↗
Largest training run power | 1-2 GW | 2-4 GW | 4-16 GW | Epoch AI power analysis↗
Frontier model training cost | $100M-500M | $100M-1B+ | $1-5B | Epoch AI cost projections
Open-weight capability lag | 6-12 months | 6-12 months | 6-12 months | Epoch AI consumer GPU analysis↗
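
A short sketch of the growth arithmetic behind these projections: at 4-5x compute growth per year, order-of-magnitude thresholds are crossed within a couple of years. The 2025 baseline value used below is an assumed round number for illustration, not an Epoch AI estimate.

```python
# Illustrative compute-growth arithmetic: frontier training compute growing 4-5x per year
# compounds quickly. The 5e26 FLOP baseline is an assumed round number, not a cited figure.
import math

def years_until(target_flop, baseline_flop, growth_per_year):
    """Years for frontier compute to grow from baseline to target at a fixed yearly factor."""
    return math.log(target_flop / baseline_flop) / math.log(growth_per_year)

if __name__ == "__main__":
    baseline = 5e26  # assumed 2025 frontier training run, in FLOP
    for growth in (4.0, 5.0):
        t = years_until(1e28, baseline, growth)
        print(f"At {growth}x/year, 1e28 FLOP is reached in ~{t:.1f} years")
    # Roughly 2.2 years at 4x/year and 1.9 years at 5x/year.
```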

Leading Organizations

Organization | Strongest Capabilities | Estimated Timeline to Next Threshold | Focus Area
OpenAI↗ | Domain knowledge, autonomous execution | 12-18 months | General capabilities
Anthropic↗ | Reasoning depth, strategic modeling | 18-24 months | Safety-focused development
DeepMind↗ | Strategic modeling, planning | 18-30 months | Scientific applications
Meta↗ | Multimodal generation | 6-12 months | Social/media applications

Key Uncertainties & Research Cruxes

Measurement Validity

The Berkeley CLTC Working Paper on Intolerable Risk Thresholds↗ notes that models effectively more capable than the latest tested model (4x or more in Effective Compute or 6 months worth of fine-tuning) require comprehensive assessment including threat model mapping, empirical capability tests, elicitation testing without safety mechanisms, and likelihood forecasting.

An interdisciplinary review of AI evaluation↗ highlights the "benchmark lottery" problem: researchers at Google's Brain Team found that many factors other than fundamental algorithmic superiority may lead to a method being perceived as superior. Ironically, a majority of influential benchmarks have been released without rigorous peer review.

Uncertainty | Impact if True | Impact if False | Current Evidence
Current benchmarks accurately measure risk-relevant capabilities | Can trust threshold predictions | Need fundamentally new evaluations | Mixed: good for some domains, poor for others
Practical capabilities match benchmark performance | Smooth transition from lab to deployment | Significant capability overhangs | Substantial gaps observed in real-world deployment
Capability improvements follow predictable scaling laws | Reliable timeline forecasting possible | Threshold crossings may surprise | Scaling laws hold for some capabilities, not others

Threshold Sharpness

Sharp Threshold Evidence:

  • Authentication systems↗: Detection accuracy drops from 95% to 15% once generation quality crosses the threshold
  • Economic viability: McKinsey automation analysis↗ shows 10-20% capability improvements create 50-80% cost advantage in many tasks
  • Security vulnerabilities: Most exploits require complete capability to work at all

Gradual Scaling Evidence (a sketch contrasting the two dynamics follows this list):

  • Job displacement: Different tasks within roles automate at different rates
  • Persuasion effectiveness: Incremental improvements in messaging quality yield incremental persuasion gains
  • Domain expertise: Knowledge accumulation appears continuous rather than threshold-based
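
A minimal sketch contrasting the two dynamics above: a logistic "switch" for sharp thresholds versus linear scaling for gradual risks. The logistic form, midpoint, and steepness are modeling assumptions for illustration; the page's claim is only qualitative.

```python
# Illustrative contrast between threshold and gradual risk dynamics. The logistic form,
# steepness, and midpoint are modeling assumptions, not parameters estimated in this page.
import math

def sharp_risk(capability, midpoint=0.5, steepness=30.0):
    """Logistic 'switch': risk stays near 0, then jumps to near 1 around the midpoint."""
    return 1.0 / (1.0 + math.exp(-steepness * (capability - midpoint)))

def gradual_risk(capability):
    """Linear scaling: each capability increment adds a proportional amount of risk."""
    return min(1.0, max(0.0, capability))

if __name__ == "__main__":
    for c in (0.3, 0.45, 0.5, 0.55, 0.7):
        print(f"capability={c:.2f}  sharp={sharp_risk(c):.2f}  gradual={gradual_risk(c):.2f}")
    # The sharp curve moves from ~0.00 to ~1.00 across a narrow capability band,
    # while the gradual curve changes only by as much as capability itself.
```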

Strategic Deception Detection

Critical unsolved problems in capability assessment:

Challenge | Current Approach | Limitation | Research Need
Detecting hidden capabilities | Red-teaming, adversarial prompts | Systems may hide abilities during evaluation | Novel evaluation paradigms
Measuring situational awareness | Direct questioning, inference tests | Pattern matching vs. genuine understanding | Causal intervention studies
Long-term planning assessment | Multi-step benchmarks | Maximum tested horizon ~hours | Extended evaluation infrastructure
Strategic modeling depth | Theory of mind tasks | Unclear if results generalize | Ecological validity research

Timeline Projections & Warning Indicators

2025 Critical Thresholds

High Probability (70%+):

  • Authentication Collapse: Real-time deepfakes become commercially viable
  • Content Generation Saturation: Human-level quality across all text/image modalities

Medium Probability (40-70%):

  • Code Automation: 50%+ of software engineering tasks automated
  • Basic Situational Awareness: Systems understand evaluation vs. deployment contexts

2026-2027 Medium Probability Events

Risk | Probability | Key Indicators to Monitor
Advanced Cyberweapons | 65% | METR evaluations↗ crossing 40% threshold
Economic Disruption | 80% | GitHub Copilot↗ achieving >60% task completion
Bioweapons Uplift | 40% | Expert biology benchmarks >90%, synthesis knowledge tests
Oversight Evasion | 60% | Detection of training/deployment context awareness

Early Warning System

Red Flag Indicators:

  • Sudden benchmark improvements >20 percentage points (see the sketch after this list)
  • Systems developing capabilities not explicitly trained for
  • Gap between capability and safety evaluation results widening
  • Evidence of strategic behavior during evaluation
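
As a minimal illustration of how the first red flag could be operationalized, the sketch below flags any benchmark whose score jumps by more than 20 percentage points between evaluation rounds. The benchmark names and score series are invented for illustration.

```python
# Minimal sketch of the first red-flag check: flag any benchmark whose score jumps by more
# than 20 percentage points between consecutive evaluation rounds. Benchmark names and
# scores below are hypothetical.

JUMP_THRESHOLD = 20.0  # percentage points, per the red-flag list above

def flag_sudden_jumps(history, threshold=JUMP_THRESHOLD):
    """Return human-readable warnings for score jumps larger than `threshold` points."""
    warnings = []
    for benchmark, scores in history.items():
        for prev, curr in zip(scores, scores[1:]):
            if curr - prev > threshold:
                warnings.append(f"{benchmark}: +{curr - prev:.0f} points ({prev:.0f} -> {curr:.0f})")
    return warnings

if __name__ == "__main__":
    example_history = {
        "complex_reasoning_eval": [12.0, 18.0, 45.0],  # hypothetical scores; second step jumps 27 pts
        "domain_knowledge_eval": [71.0, 76.0, 80.0],
    }
    for warning in flag_sudden_jumps(example_history):
        print("RED FLAG:", warning)
```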

Monitoring Infrastructure:

  • METR↗ dangerous capability evaluations
  • MIRI↗ alignment evaluation protocols
  • Industry responsible scaling policies (OpenAI Preparedness↗, Anthropic RSP↗)
  • Academic capability forecasting (Epoch AI↗)

The METR Common Elements Report (December 2025)↗ describes how each major AI developer's policy uses capability thresholds for biological weapons development, cyberattacks, autonomous replication, and automated AI R&D, with commitments to conduct model evaluations assessing whether models are approaching thresholds that could enable severe harm.

Expert Survey Findings

An OECD-affiliated survey on AI thresholds↗ found that experts agreed if training compute thresholds are exceeded, AI companies should:

  • Conduct additional risk assessments (e.g., via model evaluations)
  • Notify an independent public body (e.g., EU AI Office, FTC, or AI Safety Institute)
  • Notify the government

Participants noted that risk assessment frameworks from safety-critical industries (nuclear, maritime, aviation, healthcare, finance, space) provide valuable precedent for AI governance.

Sources & Resources

Primary Research

Source | Type | Key Findings | Relevance
Anthropic Responsible Scaling Policy↗ | Industry Policy | Defines capability thresholds for safety measures | Framework implementation
OpenAI Preparedness Framework↗ | Industry Policy | Risk assessment methodology | Threshold identification
METR Dangerous Capability Evaluations↗ | Research | Systematic capability testing | Current capability baselines
Epoch AI Capability Forecasts↗ | Research | Timeline predictions for AI milestones | Forecasting methodology

Government & Policy

Resource | Organization | Focus
NIST AI Risk Management Framework↗ | US Government | Risk assessment standards
UK AISI Research↗ | UK Government | Model evaluation protocols
EU AI Office↗ | EU Government | Regulatory frameworks
RAND Corporation AI Studies↗ | Think Tank | National security implications

Technical Benchmarks & Evaluation

Benchmark | Domain | Current Frontier Score (Dec 2025) | Threshold Relevance
MMLU↗ | General Knowledge | 85-90% | Domain expertise baseline
ARC-AGI-1↗ | Abstract Reasoning | 75-87% (o3-preview) | Complex reasoning threshold
ARC-AGI-2↗ | Abstract Reasoning | 54-75% (GPT-5.2) | Next-gen reasoning threshold
SWE-bench Verified↗ | Software Engineering | 60-70% | Autonomous code execution
SWE-bench Pro↗ | Real-world Coding | 17-23% | Generalization to novel code
MATH↗ | Mathematical Reasoning | 60-80% | Multi-step reasoning

Risk Assessment Research

Research Area | Key Papers | Organizations
Bioweapons Risk | RAND Biosecurity Assessment↗ | RAND, Johns Hopkins CNAS
Economic Displacement | McKinsey AI Impact↗ | McKinsey, Brookings Institution
Authentication Collapse | Deepfake Detection Challenges↗ | UC Berkeley, MIT
Strategic Deception | Constitutional AI Research↗ | Anthropic, Redwood Research

Additional Sources

Source | Type | Key Finding
International AI Safety Report (Oct 2025)↗ | Government | Risk thresholds can be crossed between annual cycles due to post-training/inference advances
Future of Life Institute AI Safety Index 2025↗ | NGO | Industry fundamentally unprepared; Anthropic leads (C+) but none score above D in existential safety
Berkeley CLTC Intolerable Risk Thresholds↗ | Academic | Models 4x+ more capable require comprehensive risk assessment
METR Common Elements Report (Dec 2025)↗ | Research | All major labs use capability thresholds for bio, cyber, replication, AI R&D
ARC Prize 2025 Results↗ | Academic | First AI system (Poetiq/GPT-5.2) exceeds human average on ARC-AGI-2 reasoning
Epoch AI Compute Trends↗ | Research | Training compute grows 4-5x yearly; capability improvement doubled in 2024

Related Pages

Top Related Pages

Approaches

Prediction Markets (AI Forecasting) · AI-Augmented Forecasting · Dangerous Capability Evaluations · AI Evaluation

Analysis

AGI Development

People

Philip Tetlock (Forecasting Pioneer) · Toby Ord · Eli Lifland

Models

Epistemic Collapse Threshold Model · AI-Bioweapons Timeline Model

Concepts

Epoch AI · Future of Life Institute (FLI) · Authentication Collapse · AGI Timeline · Large Language Models · Autonomous Coding

Key Debates

AI Risk Critical Uncertainties Model · AI Misuse Risk Cruxes

Transition Model

Alignment Progress · AI Capabilities