Summary

Quantifies the capability-alignment race: capabilities are currently ~3 years ahead of alignment readiness, and the gap is widening by about 0.5 years per year, driven by scaling to 10²⁶ FLOP training runs against 15% interpretability coverage and 30% scalable oversight maturity. Projects the gap reaching 5-7 years by 2030 unless annual alignment research funding rises from $200M to $800M, and estimates a 60% chance of a warning shot before TAI that could trigger a governance response.

Capability-Alignment Race Model

Overview

The Capability-Alignment Race Model quantifies the fundamental dynamic determining AI safety: the gap between advancing capabilities and our readiness to deploy them safely. Current analysis puts capabilities ~3 years ahead of alignment readiness, with the gap widening by about 0.5 years per year.

The model tracks how frontier compute (currently 10²⁶ FLOP for the largest training runs) and algorithmic improvements drive capability progress at ~10-15 percentage points per year, while alignment research advances more slowly. Interpretability stands at ~15% behavior coverage, although less than 5% of frontier model computations are mechanistically understood, and scalable oversight stands at ~30% maturity. The result is deployment pressure worth $100B annually, racing against governance systems operating at ~25% effectiveness.
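The gap dynamic described above can be sketched as a simple linear projection. This is a minimal illustration rather than the model's actual implementation: the function name `project_gap` is hypothetical, and treating 2025 as the base year is an assumption.

```python
def project_gap(base_gap_years: float, widening_per_year: float,
                base_year: int, target_year: int) -> float:
    """Linearly project the capability-alignment gap (in years) to target_year."""
    if target_year < base_year:
        raise ValueError("target_year must not precede base_year")
    return base_gap_years + widening_per_year * (target_year - base_year)


# Figures from the text: a ~3-year gap widening by 0.5 years per year.
# The 2025 base year is an assumption, not stated by the model.
for year in (2027, 2030):
    print(year, project_gap(3.0, 0.5, 2025, year))
```

Under these assumptions the projection gives 4.0 years for 2027 and 5.5 years for 2030.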

[Interactive diagram: causal graph of the model. Node types: leaf nodes, causes, intermediate, effects. Arrow strength: strong, medium, weak.]

Risk Assessment

Factor	Severity	Likelihood	Timeline	Trend
Gap widens to 5+ years	Catastrophic	50%	2027-2030	Accelerating
Alignment breakthroughs	Critical (positive)	20%	2025-2027	Uncertain
Governance catches up	High (positive)	25%	2026-2028	Slow
Warning shots trigger response	Medium (positive)	60%	2025-2027	Increasing

Key Dynamics & Evidence

Capability Acceleration

Component	Current State	Growth Rate	2027 Projection	Source
Training compute	10²⁶ FLOP	4x/year	10²⁸ FLOP	Epoch AI↗
Algorithmic efficiency	2x 2024 baseline	1.5x/year	3.4x baseline	Erdil & Besiroglu (2023)↗
Performance (MMLU)	89%	+8pp/year	>95%	Anthropic↗
Frontier lab lead	6 months	Stable	3-6 months	RAND↗
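The multiplicative growth rates in this table compound year over year. The sketch below shows that arithmetic under stated assumptions: the helper `compound_projection` is hypothetical, and because the table does not state the projection horizon, the number of years is left as a parameter.

```python
def compound_projection(current: float, factor_per_year: float, years: int) -> float:
    """Value after `years` of growth by a constant annual factor."""
    if years < 0:
        raise ValueError("years must be non-negative")
    return current * factor_per_year ** years


# Rates are taken from the table; the horizon lengths are assumptions.
for years in (2, 3):
    compute = compound_projection(1e26, 4.0, years)    # training compute in FLOP
    efficiency = compound_projection(1.0, 1.5, years)  # multiple of a 1x baseline
    print(f"{years} years: {compute:.1e} FLOP, {efficiency:.2f}x efficiency")
```

For instance, 1.5x per year compounded over three years from a 1x baseline gives about 3.4x, and 4x per year over three years turns 10²⁶ FLOP into 6.4 × 10²⁷ FLOP.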

Alignment Lag

Component	Current Coverage	Improvement Rate	2027 Projection	Critical Gap
Interpretability (behavior coverage)	15%	+5pp/year	30%	Need 80% for safety
Scalable oversight	30%	+8pp/year	54%	Need 90% for superhuman
Deception detection	20%	+3pp/year	29%	Need 95% for AGI
Alignment tax	15% loss	-2pp/year	9% loss	Target <5% for adoption
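To make the "Critical Gap" column concrete, the sketch below estimates how long each metric would take to reach its stated threshold if it kept changing at the listed rate. It assumes strictly linear progress, which is a simplification, and `years_to_target` is a hypothetical helper rather than part of the model.

```python
def years_to_target(current: float, rate_per_year: float, target: float) -> float:
    """Years until a linearly changing metric reaches `target`.

    Returns infinity when the metric is static or moving away from the target.
    """
    remaining = target - current
    if remaining == 0:
        return 0.0
    if rate_per_year == 0 or (remaining > 0) != (rate_per_year > 0):
        return float("inf")
    return remaining / rate_per_year


# (current %, change in pp/year, threshold %) taken from the table above.
rows = {
    "Interpretability": (15, 5, 80),
    "Scalable oversight": (30, 8, 90),
    "Deception detection": (20, 3, 95),
    "Alignment tax": (15, -2, 5),
}
for name, (current, rate, target) in rows.items():
    print(f"{name}: {years_to_target(current, rate, target):.1f} years")
```

At the listed rates this gives 13 years for interpretability, 7.5 for scalable oversight, 25 for deception detection, and 5 for the alignment tax.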

Deployment Pressure

Economic value drives rapid deployment, creating misalignment between safety needs and market incentives.

Pressure Source	Current Impact	Annual Growth	2027 Impact	Mitigation
Economic value	$500B/year	40%	$1.5T/year	Regulation, liability
Military competition	0.6/1.0 intensity	Increasing	0.8/1.0	Arms control treaties
Lab competition	6-month lead	Shortening	3-month lead	Industry coordination

Quote from Paul Christiano↗: "The core challenge is that capabilities are advancing faster than our ability to align them. If this gap continues to widen, we'll be in serious trouble."

Current State & Trajectory

2025 Snapshot

The race is in a critical phase with capabilities accelerating faster than alignment solutions:

Frontier models approaching human-level performance (70% expert-level)
Alignment research still in early stages with limited coverage
Governance systems lagging significantly behind technical progress
Economic incentives strongly favor rapid deployment over safety

5-Year Projections

Metric	Current	2027	2030	Risk Level
Capability-alignment gap	3 years	4-5 years	5-7 years	Critical
Deployment pressure	0.7/1.0	0.85/1.0	0.9/1.0	High
Governance strength	0.25/1.0	0.4/1.0	0.6/1.0	Improving
Warning shot probability	15%/year	20%/year	25%/year	Increasing

These projections draw on Metaculus forecasts↗ and expert surveys from AI Impacts↗.
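Annual warning-shot probabilities like those in the table can be combined into a cumulative probability over several years. The sketch below assumes each year is independent, which the model does not state, and `cumulative_probability` is a hypothetical helper.

```python
from math import prod


def cumulative_probability(annual_probs):
    """Probability of at least one event, assuming independent years."""
    probs = list(annual_probs)
    if any(p < 0.0 or p > 1.0 for p in probs):
        raise ValueError("each probability must lie in [0, 1]")
    # P(at least one) = 1 - P(none in any year)
    return 1.0 - prod(1.0 - p for p in probs)


# Example: the current 15%/year rate held constant for three years.
# Holding the rate constant is an assumption made for illustration.
print(f"{cumulative_probability([0.15, 0.15, 0.15]):.3f}")
```

Passing a rising sequence of annual rates instead of a constant one models the increasing trend shown in the table.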

Potential Turning Points

Critical junctures that could alter trajectories:

Major alignment breakthrough (20% chance by 2027): Interpretability or oversight advance that halves the gap
Capability plateau (15% chance): Scaling laws break down, slowing capability progress
Coordinated pause (10% chance): International agreement to pause frontier development
Warning shot incident (60% chance by 2027): Serious but recoverable AI accident that triggers policy response

Key Uncertainties & Research Cruxes

Technical Uncertainties

Question	Current Evidence	Expert Consensus	Implications
Can interpretability scale to frontier models?	Limited success on smaller models	45% optimistic	Determines alignment feasibility
Will scaling laws continue?	Some evidence of slowdown	70% continue to 2027	Core driver of capability timeline
How much alignment tax is acceptable?	Currently 15%	Target <5%	Adoption vs. safety tradeoff

Governance Questions

Regulatory capture: Will AI labs co-opt government oversight? CNAS analysis↗ suggests 40% risk
International coordination: Can major powers cooperate on AI safety? RAND assessment↗ shows limited progress
Democratic response: Will public concern drive effective policy? Polling shows growing awareness↗ but uncertain translation to action

Strategic Cruxes

Core disagreements among experts on alignment difficulty:

Technical optimism: 35% believe alignment will prove tractable
Governance solution: 25% think coordination/pause is the path forward
Warning shots help: 60% expect helpful wake-up calls before catastrophe
Timeline matters: 80% agree slower development improves outcomes

Timeline of Critical Events

Period	Capability Milestones	Alignment Progress	Governance Developments
2025	GPT-5 level, 80% human tasks	Basic interpretability tools	EU AI Act implementation
2026	Multimodal AGI claims	Scalable oversight demos	US federal AI legislation
2027	Superhuman in most domains	Alignment tax <10%	International AI treaty
2028	Recursive self-improvement	Deception detection tools	Compute governance regime
2030	Transformative AI deployment	Mature alignment stack	Global coordination framework

This timeline draws on Metaculus community predictions↗ and Future of Humanity Institute surveys↗.

Resource Requirements & Strategic Investments

Priority Funding Areas

The analysis suggests the following resource allocation to narrow the gap:

Investment Area	Current Funding	Recommended	Gap Reduction	ROI
Alignment research	$200M/year	$800M/year	0.8 years	High
Interpretability	$50M/year	$300M/year	0.3 years	Very high
Governance capacity	$100M/year	$400M/year	Indirect (time)	Medium
Coordination/pause	$30M/year	$200M/year	Variable	High if successful
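One way to compare the rows with a numeric gap reduction is the additional annual funding per year of gap closed. The sketch below assumes the stated reduction is attributable entirely to the funding increase, which the table does not confirm, and `cost_per_gap_year` is a hypothetical helper.

```python
def cost_per_gap_year(current_musd: float, recommended_musd: float,
                      gap_reduction_years: float) -> float:
    """Additional annual funding ($M) per year of gap reduction."""
    if gap_reduction_years <= 0:
        raise ValueError("gap_reduction_years must be positive")
    return (recommended_musd - current_musd) / gap_reduction_years


# (current $M/year, recommended $M/year, gap reduction in years) from the table.
# Only rows with a numeric gap reduction are included.
rows = {
    "Alignment research": (200, 800, 0.8),
    "Interpretability": (50, 300, 0.3),
}
for area, (current, recommended, reduction) in rows.items():
    cost = cost_per_gap_year(current, recommended, reduction)
    print(f"{area}: ${cost:.0f}M per gap-year")
```

The governance and coordination rows are omitted because their gap reductions are listed as indirect or variable.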

Key Organizations & Initiatives

Leading efforts to address the capability-alignment gap:

Organization	Focus	Annual Budget	Approach
Anthropic	Constitutional AI	$500M	Constitutional training
DeepMind	Alignment team	$100M	Scalable oversight
MIRI	Agent foundations	$15M	Theoretical foundations
ARC	Alignment research	$20M	Empirical alignment

Related Models & Cross-References

This model connects to several other risk analyses:

Racing Dynamics: How competition accelerates capability development
Multipolar Trap: Coordination failures in competitive environments
Warning Signs: Indicators of dangerous capability-alignment gaps
Takeoff Dynamics: Speed of AI development and adaptation time

The model also informs key debates:

Pause vs. Proceed: Whether to slow capability development
Open vs. Closed: Model release policies and proliferation speed
Regulation Approaches: Government responses to the race dynamic

Sources & Resources

Academic Papers & Research

Study	Key Finding	Citation
Scaling Laws	Compute-capability relationship	Kaplan et al. (2020)↗
Alignment Tax Analysis	Safety overhead quantification	Kenton et al. (2021)↗
Governance Lag Study	Policy adaptation timelines	[D
