
Capability-Alignment Race Model


Quantifies the capability-alignment race: capabilities are currently ~3 years ahead of alignment readiness, and the gap is widening at ~0.5 years per year, driven by 10²⁶ FLOP training runs outpacing ~15% interpretability coverage and ~30% scalable oversight maturity. The model projects the gap reaching 5-7 years by 2030 unless annual alignment research funding rises from $200M to $800M, and estimates a 60% chance of a warning shot before transformative AI that could trigger a governance response.


Overview

The Capability-Alignment Race Model quantifies the fundamental dynamic determining AI safety: the gap between advancing capabilities and our readiness to safely deploy them. Current analysis shows capabilities ~3 years ahead of alignment readiness, with this gap widening at 0.5 years annually.

The model tracks how frontier compute (currently ~10²⁶ FLOP for the largest training runs) and algorithmic improvements drive capability progress at roughly 10-15 percentage points per year on benchmarks, while alignment research advances more slowly: interpretability covers ~15% of model behavior (and less than 5% of frontier model computations are mechanistically understood), and scalable oversight sits at ~30% maturity. The result is deployment pressure worth an estimated $100B annually, racing against governance systems operating at ~25% effectiveness.
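As a rough check on the headline numbers, the sketch below extrapolates the gap linearly from the figures above (3 years in 2025, widening at 0.5 years/year). It is a minimal illustration of the stated arithmetic, not the model itself; the function and parameter names are ours.

```python
# Minimal sketch of the headline gap arithmetic (illustrative only).
# Assumes the gap is 3 years in 2025 and widens linearly at 0.5 years/year,
# as stated above; the model itself allows for accelerating divergence.

def projected_gap(year, base_year=2025, base_gap=3.0, widening_per_year=0.5):
    """Capability-alignment gap, in years, under linear extrapolation."""
    return base_gap + widening_per_year * (year - base_year)

for year in (2025, 2027, 2030):
    print(year, f"{projected_gap(year):.1f} years")
# Linear widening alone gives ~5.5 years by 2030, the low end of the
# 5-7 year range projected later on this page.
```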


Risk Assessment

Factor | Severity | Likelihood | Timeline | Trend
Gap widens to 5+ years | Catastrophic | 50% | 2027-2030 | Accelerating
Alignment breakthroughs | Critical (positive) | 20% | 2025-2027 | Uncertain
Governance catches up | High (positive) | 25% | 2026-2028 | Slow
Warning shots trigger response | Medium (positive) | 60% | 2025-2027 | Increasing

Key Dynamics & Evidence

Capability Acceleration

Component | Current State | Growth Rate | 2027 Projection | Source
Training compute | 10²⁶ FLOP | 4x/year | 10²⁸ FLOP | Epoch AI
Algorithmic efficiency | 2x 2024 baseline | 1.5x/year | 3.4x baseline | Erdil & Besiroglu (2023)
Performance (MMLU) | 89% | +8pp/year | >95% | Anthropic
Frontier lab lead | 6 months | Stable | 3-6 months | RAND

Alignment Lag

Component | Current Coverage | Improvement Rate | 2027 Projection | Critical Gap
Interpretability (behavior coverage) | 15% | +5pp/year | 30% | Need 80% for safety
Scalable oversight | 30% | +8pp/year | 54% | Need 90% for superhuman
Deception detection | 20% | +3pp/year | 29% | Need 95% for AGI
Alignment tax | 15% loss | -2pp/year | 9% loss | Target <5% for adoption
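
Taken at face value, these linear improvement rates imply long waits before each component reaches the threshold the table marks as critical. The sketch below works through that arithmetic; the numbers come from the table, while the assumption that progress stays linear is a simplification.

```python
# Years until each alignment component reaches its stated critical threshold,
# assuming the table's linear improvement rates hold (a simplification).

components = {
    # name: (current coverage %, improvement in pp/year, required %)
    "Interpretability (behavior coverage)": (15, 5, 80),
    "Scalable oversight": (30, 8, 90),
    "Deception detection": (20, 3, 95),
}

for name, (current, rate, required) in components.items():
    years = (required - current) / rate
    print(f"{name}: ~{years:.0f} years at +{rate}pp/year")

# Alignment tax improves by falling, so its target is a decrease:
current_tax, rate, target = 15, 2, 5
print(f"Alignment tax: ~{(current_tax - target) / rate:.0f} years to reach <{target}%")
```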

Deployment Pressure

Economic value drives rapid deployment, creating a mismatch between safety needs and market incentives.

Pressure Source | Current Impact | Annual Growth | 2027 Impact | Mitigation
Economic value | $500B/year | 40% | $1.5T/year | Regulation, liability
Military competition | 0.6/1.0 intensity | Increasing | 0.8/1.0 | Arms control treaties
Lab competition | 6 month lead | Shortening | 3 month lead | Industry coordination

As Paul Christiano has put it: "The core challenge is that capabilities are advancing faster than our ability to align them. If this gap continues to widen, we'll be in serious trouble."

Current State & Trajectory

2025 Snapshot

The race is in a critical phase with capabilities accelerating faster than alignment solutions:

  • Frontier models approaching human-level performance (70% expert-level)
  • Alignment research still in early stages with limited coverage
  • Governance systems lagging significantly behind technical progress
  • Economic incentives strongly favor rapid deployment over safety

5-Year Projections

Metric | Current | 2027 | 2030 | Risk Level
Capability-alignment gap | 3 years | 4-5 years | 5-7 years | Critical
Deployment pressure | 0.7/1.0 | 0.85/1.0 | 0.9/1.0 | High
Governance strength | 0.25/1.0 | 0.4/1.0 | 0.6/1.0 | Improving
Warning shot probability | 15%/year | 20%/year | 25%/year | Increasing

Based on Metaculus forecasts and expert surveys from AI Impacts.
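
The annual warning-shot probabilities in the table compound over time. The sketch below shows that compounding under the simplifying assumptions that years are independent and that per-year probabilities ramp between the table's anchor values; the intermediate-year figures are interpolations, not estimates from this page.

```python
# Cumulative probability of at least one warning shot, assuming independent
# annual probabilities interpolated between the table's anchors
# (15%/year in 2025, 20%/year in 2027, 25%/year in 2030).

annual_p = {
    2025: 0.15,
    2026: 0.175,  # interpolated
    2027: 0.20,
    2028: 0.217,  # interpolated
    2029: 0.233,  # interpolated
    2030: 0.25,
}

p_no_event = 1.0
for year in sorted(annual_p):
    p_no_event *= 1.0 - annual_p[year]
    print(f"By end of {year}: {1 - p_no_event:.0%} chance of at least one warning shot")
```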

Potential Turning Points

Critical junctures that could alter trajectories:

  • Major alignment breakthrough (20% chance by 2027): Interpretability or oversight advance that halves the gap
  • Capability plateau (15% chance): Scaling laws break down, slowing capability progress
  • Coordinated pause (10% chance): International agreement to pause frontier development
  • Warning shot incident (60% chance by 2027): Serious but recoverable AI accident that triggers policy response
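
Treating these junctures as roughly independent, which is a strong simplification since several of them overlap and interact, gives a quick estimate of the chance that at least one trajectory-altering event occurs in this window:

```python
# Chance that at least one trajectory-altering juncture occurs, assuming the
# four events are independent -- a strong simplification, since they interact.

junctures = {
    "Major alignment breakthrough": 0.20,
    "Capability plateau": 0.15,
    "Coordinated pause": 0.10,
    "Warning shot incident": 0.60,
}

p_none = 1.0
for p in junctures.values():
    p_none *= 1.0 - p
print(f"P(at least one turning point): {1 - p_none:.0%}")  # ~76% under independence
```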

Key Uncertainties & Research Cruxes

Technical Uncertainties

Question | Current Evidence | Expert Consensus | Implications
Can interpretability scale to frontier models? | Limited success on smaller models | 45% optimistic | Determines alignment feasibility
Will scaling laws continue? | Some evidence of slowdown | 70% continue to 2027 | Core driver of capability timeline
How much alignment tax is acceptable? | Currently 15% | Target <5% | Adoption vs. safety tradeoff

Governance Questions

  • Regulatory capture: Will AI labs co-opt government oversight? CNAS analysis suggests 40% risk
  • International coordination: Can major powers cooperate on AI safety? RAND assessment shows limited progress
  • Democratic response: Will public concern drive effective policy? Polling shows growing awareness but uncertain translation to action

Strategic Cruxes

Core disagreements among experts that shape strategic conclusions:

  1. Technical optimism: 35% believe alignment will prove tractable
  2. Governance solution: 25% think coordination/pause is the path forward
  3. Warning shots help: 60% expect helpful wake-up calls before catastrophe
  4. Timeline matters: 80% agree slower development improves outcomes

Timeline of Critical Events

Period | Capability Milestones | Alignment Progress | Governance Developments
2025 | GPT-5 level, 80% human tasks | Basic interpretability tools | EU AI Act implementation
2026 | Multimodal AGI claims | Scalable oversight demos | US federal AI legislation
2027 | Superhuman in most domains | Alignment tax <10% | International AI treaty
2028 | Recursive self-improvement | Deception detection tools | Compute governance regime
2030 | Transformative AI deployment | Mature alignment stack | Global coordination framework

Based on Metaculus community predictions and Future of Humanity Institute surveys.

Resource Requirements & Strategic Investments

Priority Funding Areas

Analysis suggests optimal resource allocation to narrow the gap:

Investment Area | Current Funding | Recommended | Gap Reduction | ROI
Alignment research | $200M/year | $800M/year | 0.8 years | High
Interpretability | $50M/year | $300M/year | 0.3 years | Very high
Governance capacity | $100M/year | $400M/year | Indirect (time) | Medium
Coordination/pause | $30M/year | $200M/year | Variable | High if successful
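
If the recommended funding levels were reached and the per-area gap reductions were roughly additive (an assumption this page does not state), the combined effect can be compared against the projected rate of widening, as in the sketch below.

```python
# Combined gap reduction if recommended funding levels are met, assuming the
# table's per-area reductions add up (an assumption, not stated on this page).
# Only the two areas with quantified reductions are included.

areas = {
    # name: (current $M/year, recommended $M/year, gap reduction in years)
    "Alignment research": (200, 800, 0.8),
    "Interpretability": (50, 300, 0.3),
}

extra_spend = sum(rec - cur for cur, rec, _ in areas.values())
total_reduction = sum(red for *_, red in areas.values())

print(f"Additional spend: ${extra_spend}M/year")
print(f"Combined gap reduction: {total_reduction:.1f} years, "
      f"against ~0.5 years/year of projected widening")
```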

Key Organizations & Initiatives

Leading efforts to address the capability-alignment gap:

Organization | Focus | Annual Budget | Approach
Anthropic | Constitutional AI | $500M | Constitutional training
DeepMind | Alignment team | $100M | Scalable oversight
MIRI | Agent foundations | $15M | Theoretical foundations
ARC | Alignment research | $20M | Empirical alignment

Related Models & Cross-References

This model connects to several other risk analyses:

  • Racing Dynamics: How competition accelerates capability development
  • Multipolar Trap: Coordination failures in competitive environments
  • Warning Signs: Indicators of dangerous capability-alignment gaps
  • Takeoff Dynamics: Speed of AI development and adaptation time

The model also informs key debates:

  • Pause vs. Proceed: Whether to slow capability development
  • Open vs. Closed: Model release policies and proliferation speed
  • Regulation Approaches: Government responses to the race dynamic

Sources & Resources

Academic Papers & Research

Study | Key Finding | Citation
Scaling Laws | Compute-capability relationship | Kaplan et al. (2020)
Alignment Tax Analysis | Safety overhead quantification | Kenton et al. (2021)
Governance Lag Study | Policy adaptation timelines | [D
