Summary

Economic model analyzing AI safety research returns, recommending 3-10x funding increases from current ~$500M/year to $2-5B, with highest marginal returns (5-10x) in alignment theory and governance research currently receiving only 10% of funding each. Provides specific allocation recommendations across philanthropic ($600M-1B), industry ($600M), and government ($1B) sources with concrete investment priorities and timelines.


Expected Value of AI Safety Research

Model

AI Safety Research Value Model

Model Type: Cost-Effectiveness Analysis
Scope: Safety Research ROI
Key Insight: Safety research value depends critically on timing relative to capability progress

Overview

This economic model quantifies the expected value of marginal investments in AI safety research. Current global spending of roughly $500-700M annually on safety research appears significantly below optimal levels, with the analysis suggesting 2-5x returns available in neglected areas.

Key findings: Safety research could reduce AI catastrophic risk by 20-40% over the next decade, with particularly high returns in alignment theory and governance research. Current 100:1 ratio of capabilities to safety spending creates systematic underinvestment in risk mitigation.

The model incorporates deep uncertainty about AI risk probabilities (1-20% existential risk this century), tractability of safety problems, and optimal resource allocation across different research approaches.

Risk/Impact Assessment

| Factor | Assessment | Evidence | Source |
|---|---|---|---|
| Current Underinvestment | High | 100:1 capabilities vs. safety ratio | Epoch AI (2024) |
| Marginal Returns | Medium-High | 2-5x potential in neglected areas | Coefficient Giving |
| Timeline Sensitivity | High | Value drops 50%+ if timelines <5 years | AI Impacts Survey |
| Research Direction Risk | Medium | 10-100x variance between approaches | Analysis based on expert interviews |

Strategic Framework

Core Expected Value Equation

EV = P(AI catastrophe) × R(research impact) × V(prevented harm) - C(research costs)

Where:
- P ∈ [0.01, 0.20]: Probability of catastrophic AI outcome
- R ∈ [0.05, 0.40]: Fractional risk reduction from research
- V ≈ $10¹⁵-$10¹⁷: Value of prevented catastrophic harm
- C ≈ $10⁹: Annual research investment
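
As a rough illustration, the sketch below propagates these parameter ranges through the equation with a simple Monte Carlo draw; the choice of uniform and log-uniform sampling distributions is an assumption for illustration, not part of the model as stated.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Parameter ranges from the model above; uniform / log-uniform sampling
# is an illustrative assumption.
P = rng.uniform(0.01, 0.20, n)     # P(AI catastrophe)
R = rng.uniform(0.05, 0.40, n)     # fractional risk reduction from research
V = 10 ** rng.uniform(15, 17, n)   # value of prevented harm, $
C = 1e9                            # annual research cost, $

EV = P * R * V - C                 # expected value per the equation above

print(f"Median EV: ${np.median(EV):.2e}")
print(f"5th-95th percentile: ${np.percentile(EV, 5):.2e} to ${np.percentile(EV, 95):.2e}")
print(f"Median EV per dollar of cost: {np.median(EV) / C:,.0f}x")
```

Even at the low end of the ranges, the prevented-harm term dominates the cost term by several orders of magnitude, which is what drives the funding-increase recommendations below.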

Investment Priority Matrix

| Research Area | Current Annual Funding | Marginal Returns | Evidence Quality |
|---|---|---|---|
| Alignment Theory | $50M | High (5-10x) | Low |
| Interpretability | $175M | Medium (2-3x) | Medium |
| Evaluations | $100M | High (3-5x) | High |
| Governance Research | $50M | High (4-8x) | Medium |
| RLHF/Fine-tuning | $125M | Low (1-2x) | High |

Source: Author estimates based on Anthropic, OpenAI, DeepMind public reporting

Resource Allocation Analysis

Current vs. Optimal Distribution

[Diagram unavailable: current vs. optimal funding distribution across research areas; see the reallocation table below.]

Recommended Reallocation

| Area | Current Share | Recommended | Change | Rationale |
|---|---|---|---|---|
| Alignment Theory | 10% | 20% | +$50M | High theoretical returns, underinvested |
| Governance Research | 10% | 15% | +$25M | Policy leverage, regulatory preparation |
| Evaluations | 20% | 25% | +$25M | Near-term safety, measurable progress |
| Interpretability | 35% | 30% | -$25M | Well-funded, diminishing returns |
| RLHF/Fine-tuning | 25% | 10% | -$75M | May accelerate capabilities |
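
The dollar changes in the table follow from applying the recommended shares to the ≈$500M current total implied by the Investment Priority Matrix; a minimal sketch of that arithmetic:

```python
# Reallocation arithmetic behind the table above; the $500M total and the
# share percentages are taken from the tables in this section.
total = 500  # current annual funding, $M

shares = {
    # area: (current share, recommended share)
    "Alignment Theory":    (0.10, 0.20),
    "Governance Research": (0.10, 0.15),
    "Evaluations":         (0.20, 0.25),
    "Interpretability":    (0.35, 0.30),
    "RLHF/Fine-tuning":    (0.25, 0.10),
}

for area, (cur, rec) in shares.items():
    change = (rec - cur) * total
    print(f"{area:20s} {cur:>4.0%} -> {rec:>4.0%}  change: {change:+5.0f}M")
```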

Actor-Specific Investment Strategies

Philanthropic Funders ($200M/year current)

Recommended increase: 3-5x to $600M-1B/year

| Priority | Investment | Expected Return | Timeline |
|---|---|---|---|
| Talent pipeline | $100M/year | 3-10x over 5 years | Long-term |
| Exploratory research | $200M/year | High variance | Medium-term |
| Policy research | $100M/year | High if timelines short | Near-term |
| Field building | $50M/year | Network effects | Long-term |

Key organizations: Coefficient Giving, Future of Humanity Institute, Long-Term Future Fund

AI Labs ($300M/year current)

Recommended increase: 2x to $600M/year

  • Internal safety teams: Expand from 5-10% to 15-20% of research staff
  • External collaboration: Fund academic partnerships, open source safety tools
  • Evaluation infrastructure: Invest in red-teaming, safety benchmarks

Source: Analysis of Anthropic, OpenAI, DeepMind public commitments

Government Funding ($100M/year current)

Recommended increase: 10x to $1B/year

| Agency | Current | Recommended | Focus Area |
|---|---|---|---|
| NSF | $20M | $200M | Basic research, academic capacity |
| NIST | $30M | $300M | Standards, evaluation frameworks |
| DARPA | $50M | $500M | High-risk research, novel approaches |

Comparative Investment Analysis

Returns vs. Other Interventions

| Intervention | Cost per QALY | Probability Adjustment | Adjusted Cost |
|---|---|---|---|
| AI Safety (optimistic) | $0.01 | P(success) = 0.3 | $0.03 |
| AI Safety (pessimistic) | $1,000 | P(success) = 0.1 | $10,000 |
| Global health (GiveWell) | $100 | P(success) = 0.9 | $111 |
| Climate change mitigation | $50-500 | P(success) = 0.7 | $71-714 |

QALY = Quality-Adjusted Life Year. Analysis based on GiveWell methodology
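
The adjusted figures appear to divide the raw cost per QALY by the probability of success; a short sketch of that adjustment, assuming this is the intended formula:

```python
# Probability-adjusted cost per QALY: raw cost divided by P(success).
# This reproduces the table above, assuming that is the adjustment used.
interventions = [
    ("AI Safety (optimistic)",           0.01,   0.3),
    ("AI Safety (pessimistic)",          1000.0, 0.1),
    ("Global health (GiveWell)",         100.0,  0.9),
    ("Climate mitigation (low end)",     50.0,   0.7),
    ("Climate mitigation (high end)",    500.0,  0.7),
]

for name, cost_per_qaly, p_success in interventions:
    adjusted = cost_per_qaly / p_success
    print(f"{name:32s} ${cost_per_qaly:>8,.2f} / {p_success:.1f} = ${adjusted:,.2f}")
```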

Risk-Adjusted Portfolio

| Risk Tolerance | AI Safety Allocation | Other Cause Areas | Rationale |
|---|---|---|---|
| Risk-neutral | 80-90% | 10-20% | Expected value dominance |
| Risk-averse | 40-60% | 40-60% | Hedge against model uncertainty |
| Very risk-averse | 20-30% | 70-80% | Prefer proven interventions |
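
One way to see why risk tolerance shifts the allocation so sharply: with a hit-based payoff profile, a risk-neutral (linear) objective concentrates in AI safety while a concave utility hedges toward steadier causes. The sketch below uses purely illustrative payoff numbers, not figures from this model.

```python
import numpy as np

# Illustrative only: a hit-based cause (rare, huge payoff) vs. a steady cause.
# None of these payoff numbers come from the model on this page.
rng = np.random.default_rng(0)
n = 100_000

def impact_per_dollar(ai_share):
    """Simulated impact of a $1 budget split between the two causes."""
    hit = rng.binomial(1, 0.1, n) * 1000.0   # 10% chance of a 1000x payoff
    steady = np.full(n, 10.0)                # reliable 10x payoff
    return ai_share * hit + (1 - ai_share) * steady

for share in (0.2, 0.5, 0.9):
    impact = impact_per_dollar(share)
    ev = impact.mean()               # risk-neutral objective favors high shares
    log_u = np.log1p(impact).mean()  # concave (risk-averse) utility favors hedging
    print(f"AI-safety share {share:.0%}: expected impact {ev:6.1f}, mean log-utility {log_u:.2f}")
```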

Current State & Trajectory

2024 Funding Landscape

Total AI safety funding: ≈$500-700M globally

| Source | Amount | Growth Rate | Key Players |
|---|---|---|---|
| Tech companies | $300M | +50%/year | Anthropic, OpenAI, DeepMind |
| Philanthropy | $200M | +30%/year | Coefficient Giving, FTX regrants |
| Government | $100M | +100%/year | NIST, UK AISI, EU |
| Academia | $50M | +20%/year | Stanford HAI, MIT, Berkeley |

2025-2030 Projections

Scenario: Moderate scaling

  • Total funding grows to $2-5B by 2030
  • Government share increases from 15% to 40%
  • Industry maintains 50-60% share

Bottlenecks limiting growth:

  1. Talent pipeline: ~1,000 qualified researchers globally
  2. Research direction clarity: Uncertainty about most valuable approaches
  3. Access to frontier models: Safety research requires cutting-edge systems

Source: Future of Humanity Institute talent survey, author projections
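
For reference, the scenario above (growing from roughly $0.6B in 2024 to $2-5B by 2030) implies sustained annual growth of about 20-45%; a minimal sketch of the compound-growth arithmetic, with the start value and year span assumed from the 2024 landscape figures:

```python
# Implied compound annual growth rate (CAGR) for the funding projections above.
# Start value and year span are assumptions based on the 2024 landscape figures.
start = 0.6   # ~$0.6B total funding in 2024 (midpoint of $500-700M)
years = 6     # 2024 -> 2030

for target in (2.0, 5.0):  # $2B and $5B scenarios
    cagr = (target / start) ** (1 / years) - 1
    print(f"${target:.0f}B by 2030 implies ~{cagr:.0%} annual growth")
```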

Key Uncertainties & Research Cruxes

Fundamental Disagreements

| Dimension | Optimistic View | Pessimistic View | Current Evidence |
|---|---|---|---|
| AI Risk Level | 2-5% x-risk probability | 15-20% x-risk probability | Expert surveys show 5-10% median |
| Alignment Tractability | Solvable with sufficient research | Fundamentally intractable | Mixed signals from early work |
| Timeline Sensitivity | Decades to solve problems | Need solutions in 3-7 years | Acceleration in capabilities suggests shorter timelines |
| Research Transferability | Insights transfer across architectures | Approach-specific solutions | Limited evidence either way |

Critical Research Questions

Empirical questions that would change investment priorities:

  1. Interpretability scaling: Do current techniques work on 100B+ parameter models?
  2. Alignment tax: What performance cost do safety measures impose?
  3. Adversarial robustness: Can safety measures withstand optimization pressure?
  4. Governance effectiveness: Do AI safety standards actually get implemented?

Information Value Estimates

Value of resolving key uncertainties:

| Question | Value of Information | Timeline to Resolution |
|---|---|---|
| Alignment difficulty | $1-10B | 3-7 years |
| Interpretability scaling | $500M-5B | 2-5 years |
| Governance effectiveness | $100M-1B | 5-10 years |
| Risk probability | $10-100B | Uncertain |
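
One way such value-of-information figures can be derived: if resolving a question would redirect some spending to a better use with some probability, the value is roughly that probability times the resulting gain. The toy decision model and numbers below are illustrative assumptions, not estimates from this page.

```python
# Illustrative value-of-information calculation (simple two-branch model).
# All inputs here are assumptions for illustration, not figures from this page.
p_changes_decision = 0.3   # chance the answer changes how funds are allocated
budget_redirected = 1e9    # $/year of spending that would be reallocated
years_affected = 5         # horizon over which the better allocation applies
ev_gain_per_dollar = 2.0   # extra expected value per redirected dollar

voi = p_changes_decision * budget_redirected * years_affected * ev_gain_per_dollar
print(f"Rough value of information: ${voi:.1e}")  # ~$3B in this toy example
```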

Implementation Roadmap

2025-2026: Foundation Building

Year 1 Priorities ($1B investment)

  • Talent: 50% increase in safety researchers through fellowships, PhD programs
  • Infrastructure: Safety evaluation platforms, model access protocols
  • Research: Focus on near-term measurable progress

2027-2029: Scaling Phase

Years 2-4 Priorities ($2-3B/year)

  • International coordination on safety research standards
  • Large-scale alignment experiments on frontier models
  • Policy research integration with regulatory development

2030+: Deployment Phase

Long-term integration

  • Safety research embedded in all major AI development
  • International safety research collaboration infrastructure
  • Automated safety evaluation and monitoring systems

Sources & Resources

Academic Literature

| Paper | Key Finding | Relevance |
|---|---|---|
| Ord (2020) | 10% x-risk this century | Risk probability estimates |
| Amodei et al. (2016) | Safety research agenda | Research direction framework |
| Russell (2019) | Control problem formulation | Alignment problem definition |
| Christiano (2018) | IDA proposal | Specific alignment approach |

Research Organizations

| Organization | Focus | Annual Budget | Key Publications |
|---|---|---|---|
| Anthropic | Constitutional AI, interpretability | $100M+ | Constitutional AI paper |
| MIRI | Agent foundations | $5M | Logical induction |
| CHAI | Human-compatible AI | $10M | CIRL framework |
| ARC | Alignment research | $15M | Eliciting latent knowledge |

Policy Resources

| Source | Type | Key Insights |
|---|---|---|
| NIST AI Risk Management Framework | Standards | Risk assessment methodology |
| UK AI Safety Institute | Government research | Evaluation frameworks |
| EU AI Act | Regulation | Compliance requirements |
| RAND AI Strategy | Analysis | Military AI implications |

Funding Sources

| Funder | Focus Area | Annual AI Safety | Application Process |
|---|---|---|---|
| Coefficient Giving | Technical research, policy | $100M+ | LOI system |
| Future Fund | Longtermism, x-risk | $50M+ | Grant applications |
| NSF | Academic research | $20M | Standard grants |
| Survival and Flourishing Fund | Existential risk | $10M | Quarterly rounds |

Related Pages

Analysis

  • Anthropic Founder Pledges: Interventions to Increase Follow-Through
  • Capability-Alignment Race Model

Labs

  • Center for AI Safety

Models

  • AI Risk Activation Timeline Model
  • AI Risk Portfolio Analysis
  • Worldview-Intervention Mapping

Concepts

  • Epoch AI
  • AI Impacts
  • Open Philanthropy