
Expected Value of AI Safety Research


AI Safety Research Value Model

Economic model analyzing the returns to AI safety research, recommending a 3-10x funding increase from the current ~$500M/year to $2-5B/year, with the highest marginal returns in alignment theory (5-10x) and governance research (4-8x), each of which currently receives only about 10% of funding. Provides specific allocation recommendations across philanthropic ($600M-1B), industry ($600M), and government ($1B) sources, with concrete investment priorities and timelines.

Model Type: Cost-Effectiveness Analysis
Scope: Safety Research ROI
Key Insight: Safety research value depends critically on timing relative to capability progress

Overview

This economic model quantifies the expected value of marginal investments in AI safety research. Current global spending of ≈$500-700M annually on safety research appears significantly below optimal levels, with the analysis suggesting 2-5x returns available in neglected areas.

Key findings: Safety research could reduce AI catastrophic risk by 20-40% over the next decade, with particularly high returns in alignment theory and governance research. Current 100:1 ratio of capabilities to safety spending creates systematic underinvestment in risk mitigation.

The model incorporates deep uncertainty about AI risk probabilities (1-20% existential risk this century), tractability of safety problems, and optimal resource allocation across different research approaches.

Risk/Impact Assessment

| Factor | Assessment | Evidence | Source |
|---|---|---|---|
| Current Underinvestment | High | 100:1 capabilities vs. safety ratio | Epoch AI (2024) |
| Marginal Returns | Medium-High | 2-5x potential in neglected areas | Coefficient Giving |
| Timeline Sensitivity | High | Value drops 50%+ if timelines <5 years | AI Impacts Survey |
| Research Direction Risk | Medium | 10-100x variance between approaches | Analysis based on expert interviews |

Strategic Framework

Core Expected Value Equation

EV = P(AI catastrophe) × R(research impact) × V(prevented harm) − C(research costs)

Where:
- P ∈ [0.01, 0.20]: Probability of catastrophic AI outcome
- R ∈ [0.05, 0.40]: Fractional risk reduction from research
- V ≈ $10¹⁵-$10¹⁷: Value of prevented catastrophic harm
- C ≈ $10⁹: Annual research investment
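
A minimal sketch (Python) of how these ranges bound the expected value; the parameter values below are simply the endpoints of the ranges above, not new estimates:

```python
# Illustrative only: evaluates EV = P * R * V - C at the corners of the
# parameter ranges stated above. All figures are dollars per year.

def expected_value(p_catastrophe, risk_reduction, prevented_harm, research_cost):
    """EV of safety research: P(catastrophe) * R(risk reduction) * V(harm averted) - C(cost)."""
    return p_catastrophe * risk_reduction * prevented_harm - research_cost

low = expected_value(p_catastrophe=0.01, risk_reduction=0.05,
                     prevented_harm=1e15, research_cost=1e9)
high = expected_value(p_catastrophe=0.20, risk_reduction=0.40,
                      prevented_harm=1e17, research_cost=1e9)

print(f"EV range: ${low:,.0f} to ${high:,.0f} per year")
```

Even at the pessimistic corner of these ranges, the expected benefit (~$5×10¹¹/year) exceeds the ~$10⁹ annual cost by orders of magnitude, which is the arithmetic behind the model's underinvestment claim.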

Investment Priority Matrix

| Research Area | Current Annual Funding | Marginal Returns | Evidence Quality |
|---|---|---|---|
| Alignment Theory | $50M | High (5-10x) | Low |
| Interpretability | $175M | Medium (2-3x) | Medium |
| Evaluations | $100M | High (3-5x) | High |
| Governance Research | $50M | High (4-8x) | Medium |
| RLHF/Fine-tuning | $125M | Low (1-2x) | High |

Source: Author estimates based on Anthropic, OpenAI, DeepMind public reporting

Resource Allocation Analysis

Current vs. Optimal Distribution

```mermaid
pie title Current Safety Research Allocation ($500M)
  "Interpretability" : 35
  "RLHF/Fine-tuning" : 25
  "Evaluations" : 20
  "Alignment Theory" : 10
  "Governance Research" : 10
```

| Area | Current Share | Recommended | Change | Rationale |
|---|---|---|---|---|
| Alignment Theory | 10% | 20% | +$50M | High theoretical returns, underinvested |
| Governance Research | 10% | 15% | +$25M | Policy leverage, regulatory preparation |
| Evaluations | 20% | 25% | +$25M | Near-term safety, measurable progress |
| Interpretability | 35% | 30% | -$25M | Well-funded, diminishing returns |
| RLHF/Fine-tuning | 25% | 10% | -$75M | May accelerate capabilities |
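
The "Change" column follows directly from applying the share shift to the ~$500M portfolio; a small sketch (Python, figures taken from the table above) makes the arithmetic explicit:

```python
# Recomputes the "Change" column: (recommended share - current share) * $500M.
TOTAL_M = 500  # current annual safety research portfolio, in $M

shares = {  # area: (current share, recommended share), from the table above
    "Alignment Theory":    (0.10, 0.20),
    "Governance Research": (0.10, 0.15),
    "Evaluations":         (0.20, 0.25),
    "Interpretability":    (0.35, 0.30),
    "RLHF/Fine-tuning":    (0.25, 0.10),
}

for area, (current, recommended) in shares.items():
    change_m = (recommended - current) * TOTAL_M
    print(f"{area}: {change_m:+.0f}M")  # e.g. "Alignment Theory: +50M"
```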

Actor-Specific Investment Strategies

Philanthropic Funders ($200M/year current)

Recommended increase: 3-5x to $600M-1B/year

| Priority | Investment | Expected Return | Timeline |
|---|---|---|---|
| Talent pipeline | $100M/year | 3-10x over 5 years | Long-term |
| Exploratory research | $200M/year | High variance | Medium-term |
| Policy research | $100M/year | High if timelines short | Near-term |
| Field building | $50M/year | Network effects | Long-term |

Key organizations: Coefficient Giving, Future of Humanity Institute, Long-Term Future Fund

AI Labs ($300M/year current)

Recommended increase: 2x to $600M/year

  • Internal safety teams: Expand from 5-10% to 15-20% of research staff
  • External collaboration: Fund academic partnerships, open source safety tools
  • Evaluation infrastructure: Invest in red-teaming, safety benchmarks

Source: Analysis of Anthropic, OpenAI, and DeepMind public commitments

Government Funding ($100M/year current)

Recommended increase: 10x to $1B/year

| Agency | Current | Recommended | Focus Area |
|---|---|---|---|
| NSF | $20M | $200M | Basic research, academic capacity |
| NIST | $30M | $300M | Standards, evaluation frameworks |
| DARPA | $50M | $500M | High-risk research, novel approaches |

Comparative Investment Analysis

Returns vs. Other Interventions

| Intervention | Cost per QALY | Probability Adjustment | Adjusted Cost |
|---|---|---|---|
| AI Safety (optimistic) | $0.01 | P(success) = 0.3 | $0.03 |
| AI Safety (pessimistic) | $1,000 | P(success) = 0.1 | $10,000 |
| Global health (GiveWell) | $100 | P(success) = 0.9 | $111 |
| Climate change mitigation | $50-500 | P(success) = 0.7 | $71-714 |

QALY = Quality-Adjusted Life Year. Analysis based on GiveWell methodology
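
The adjusted-cost column is simply the nominal cost per QALY divided by the assumed probability of success; a minimal sketch using the table's own figures:

```python
# Probability-adjusted cost per QALY: nominal cost / P(success).
def adjusted_cost(cost_per_qaly, p_success):
    return cost_per_qaly / p_success

print(adjusted_cost(0.01, 0.3))   # AI safety, optimistic:    ~$0.03
print(adjusted_cost(1000, 0.1))   # AI safety, pessimistic:   ~$10,000
print(adjusted_cost(100, 0.9))    # Global health (GiveWell): ~$111
```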

Risk-Adjusted Portfolio

| Risk Tolerance | AI Safety Allocation | Other Cause Areas | Rationale |
|---|---|---|---|
| Risk-neutral | 80-90% | 10-20% | Expected value dominance |
| Risk-averse | 40-60% | 40-60% | Hedge against model uncertainty |
| Very risk-averse | 20-30% | 70-80% | Prefer proven interventions |

Current State & Trajectory

2024 Funding Landscape

Total AI safety funding: ≈$500-700M globally

| Source | Amount | Growth Rate | Key Players |
|---|---|---|---|
| Tech companies | $300M | +50%/year | Anthropic, OpenAI, DeepMind |
| Philanthropy | $200M | +30%/year | Coefficient Giving, FTX regrants |
| Government | $100M | +100%/year | NIST, UK AISI, EU |
| Academia | $50M | +20%/year | Stanford HAI, MIT, Berkeley |

2025-2030 Projections

Scenario: Moderate scaling

  • Total funding grows to $2-5B by 2030
  • Government share increases from 15% to 40%
  • Industry maintains 50-60% share

Bottlenecks limiting growth:

  1. Talent pipeline: ~1,000 qualified researchers globally
  2. Research direction clarity: Uncertainty about most valuable approaches
  3. Access to frontier models: Safety research requires cutting-edge systems

Source: Future of Humanity Institute talent survey, author projections

Key Uncertainties & Research Cruxes

Fundamental Disagreements

| Dimension | Optimistic View | Pessimistic View | Current Evidence |
|---|---|---|---|
| AI Risk Level | 2-5% x-risk probability | 15-20% x-risk probability | Expert surveys show 5-10% median |
| Alignment Tractability | Solvable with sufficient research | Fundamentally intractable | Mixed signals from early work |
| Timeline Sensitivity | Decades to solve problems | Need solutions in 3-7 years | Acceleration in capabilities suggests shorter timelines |
| Research Transferability | Insights transfer across architectures | Approach-specific solutions | Limited evidence either way |

Critical Research Questions

Empirical questions that would change investment priorities:

  1. Interpretability scaling: Do current techniques work on 100B+ parameter models?
  2. Alignment tax: What performance cost do safety measures impose?
  3. Adversarial robustness: Can safety measures withstand optimization pressure?
  4. Governance effectiveness: Do AI safety standards actually get implemented?

Information Value Estimates

Value of resolving key uncertainties:

| Question | Value of Information | Timeline to Resolution |
|---|---|---|
| Alignment difficulty | $1-10B | 3-7 years |
| Interpretability scaling | $500M-5B | 2-5 years |
| Governance effectiveness | $100M-1B | 5-10 years |
| Risk probability | $10-100B | Uncertain |
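
These figures can be read as rough value-of-information estimates: how much better funding could be allocated if the question were resolved. A toy sketch of the underlying logic follows; the scenario, probabilities, and payoffs are invented purely for illustration and are not taken from the model:

```python
# Toy expected value of perfect information (EVPI) calculation.
# All worlds, probabilities, and payoffs are hypothetical placeholders.

p_tractable = 0.5  # illustrative credence that alignment is tractable
actions = ["scale alignment funding", "keep current allocation"]

# Payoff ($B of risk-reduction value) of each strategy in each world
payoff = {
    ("scale alignment funding", True):  100,
    ("scale alignment funding", False):  -5,
    ("keep current allocation", True):   30,
    ("keep current allocation", False):   0,
}

def ev(action):
    return p_tractable * payoff[(action, True)] + (1 - p_tractable) * payoff[(action, False)]

best_without_info = max(ev(a) for a in actions)
best_with_info = (p_tractable * max(payoff[(a, True)] for a in actions)
                  + (1 - p_tractable) * max(payoff[(a, False)] for a in actions))

print(f"EVPI: ${best_with_info - best_without_info:.1f}B")  # value of resolving the question first
```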

Implementation Roadmap

2025-2026: Foundation Building

Year 1 Priorities ($1B investment)

  • Talent: 50% increase in safety researchers through fellowships, PhD programs
  • Infrastructure: Safety evaluation platforms, model access protocols
  • Research: Focus on near-term measurable progress

2027-2029: Scaling Phase

Years 2-4 Priorities ($2-3B/year)

  • International coordination on safety research standards
  • Large-scale alignment experiments on frontier models
  • Policy research integration with regulatory development

2030+: Deployment Phase

Long-term integration

  • Safety research embedded in all major AI development
  • International safety research collaboration infrastructure
  • Automated safety evaluation and monitoring systems

See Also

  • Pre-TAI Capital Deployment — How $100-300B+ gets allocated across the AI industry before transformative AI
  • Safety Spending at Scale — Analysis of safety budgets as AI labs scale to billions in annual spending
  • Frontier Lab Cost Structure — Breakdown of where frontier lab budgets go (compute, talent, safety, overhead)
  • AI Talent Market Dynamics — Competition for scarce AI researchers and its effect on safety capacity

Sources & Resources

Academic Literature

| Paper | Key Finding | Relevance |
|---|---|---|
| Ord (2020) | 10% x-risk this century | Risk probability estimates |
| Amodei et al. (2016) | Safety research agenda | Research direction framework |
| Russell (2019) | Control problem formulation | Alignment problem definition |
| Christiano (2018) | IDA proposal | Specific alignment approach |

Research Organizations

| Organization | Focus | Annual Budget | Key Publications |
|---|---|---|---|
| Anthropic | Constitutional AI, interpretability | $100M+ | Constitutional AI paper |
| MIRI | Agent foundations | $5M | Logical induction |
| CHAI | Human-compatible AI | $10M | CIRL framework |
| ARC | Alignment research | $15M | Eliciting latent knowledge |

Policy Resources

| Source | Type | Key Insights |
|---|---|---|
| NIST AI Risk Management Framework | Standards | Risk assessment methodology |
| UK AI Safety Institute | Government research | Evaluation frameworks |
| EU AI Act | Regulation | Compliance requirements |
| RAND AI Strategy | Analysis | Military AI implications |

Funding Sources

| Funder | Focus Area | Annual AI Safety Funding | Application Process |
|---|---|---|---|
| Coefficient Giving | Technical research, policy | $100M+ | LOI system |
| Future Fund | Longtermism, x-risk | $50M+ | Grant applications |
| NSF | Academic research | $20M | Standard grants |
| Survival and Flourishing Fund | Existential risk | $10M | Quarterly rounds |

References

OpenAI is a leading AI research and deployment company focused on building advanced AI systems, including GPT and o-series models, with a stated mission of ensuring artificial general intelligence (AGI) benefits all of humanity. The homepage serves as a gateway to their research, products, and policy work spanning capabilities and safety.

★★★★☆

Google DeepMind is a leading AI research laboratory combining the former DeepMind and Google Brain teams, focused on developing advanced AI systems and conducting research across capabilities, safety, and applications. The organization is one of the most influential labs in AI development, working on frontier models including Gemini and publishing widely-cited safety and capabilities research.

★★★★☆

This page outlines the European Commission's comprehensive policy framework for AI, centered on promoting trustworthy, human-centric AI through the AI Act, AI Continent Action Plan, and Apply AI Strategy. It aims to balance Europe's global AI competitiveness with safety, fundamental rights, and democratic values. Key initiatives include AI Factories, the InvestAI Facility, GenAI4EU, and the Apply AI Alliance.

★★★★☆
4. Future of Humanity Institute

The official website of the Future of Humanity Institute (FHI), an Oxford University research center that was foundational in establishing the fields of existential risk research and AI safety. FHI closed on 16 April 2024 after approximately two decades of influential work. The site now serves as an archived record of the institution's history, research agenda, and legacy.

★★★★☆

DARPA is the U.S. Department of Defense's primary research agency focused on creating transformative technologies for national security. The homepage highlights current programs including autonomous systems (RACER mine-clearing), battlefield casualty care (Live Chain), and biosecurity challenges. DARPA funds high-risk, high-reward research across AI, autonomy, biotechnology, and other emerging domains relevant to AI safety and governance.

This page represents Open Philanthropy's technical AI safety research funding hub, now rebranded as Coefficient Giving. The organization has directed over $4 billion in grants since 2014, with a dedicated 'Navigating Transformative AI' fund focused on ensuring AI is safe and well-governed. It serves as a major philanthropic funder for AI safety research and related existential risk work.

★★★★☆

This Epoch AI resource appears to analyze funding trends in machine learning, but the page is no longer accessible at the given URL, returning a 404 error. The content has either been moved or removed from the Epoch AI website.

★★★★☆

The 2022 ESPAI surveyed 738 machine learning researchers (NeurIPS/ICML authors) about AI progress timelines and risks, serving as a replication and update of the 2016 survey. Key findings include an aggregate forecast of 50% chance of HLMI by 2059 (37 years from 2022), with significant disagreement among experts about timelines and risks.

★★★☆☆
9. AI Impacts

AI Impacts is a research organization that investigates empirical questions relevant to AI forecasting and safety, including AI timelines, discontinuous progress risks, and existential risk arguments. It maintains a wiki and blog featuring expert surveys, historical analyses, and structured arguments about transformative AI development. Notable outputs include periodic expert surveys on AI progress timelines.

★★★☆☆
10. NSF Funding Opportunities · nsf.gov · Government

The National Science Foundation (NSF) funding portal provides information on grants, fellowships, and research funding opportunities across scientific disciplines. As a major U.S. federal research funder, NSF supports basic and applied research relevant to AI safety and related fields. The page content was inaccessible due to JavaScript requirements.

11. FTX Future Fund · ftxfuturefund.org

The FTX Future Fund was a major philanthropic initiative backed by FTX and Sam Bankman-Fried, focused on funding projects addressing humanity's most pressing long-term risks, including AI safety, biosecurity, and existential risk reduction. It represented one of the largest EA-aligned grantmaking organizations before FTX's collapse in November 2022 forced the fund to shut down. This is an archived version of its website.

The NIST AI RMF is a voluntary, consensus-driven framework released in January 2023 to help organizations identify, assess, and manage risks associated with AI systems while promoting trustworthiness across design, development, deployment, and evaluation. It provides structured guidance organized around core functions and is accompanied by a Playbook, Roadmap, and a Generative AI Profile (2024) addressing risks specific to generative AI systems.

★★★★★

A curated index of DeepMind/Google DeepMind research publications filtered by the 'safety' tag, covering 240 papers spanning topics such as AI consciousness, existential safety, human-AI alignment, AI personhood, and technical safety research. The listing spans multiple years and reflects the breadth of safety-related work coming out of one of the world's leading AI labs.

★★★★☆

This guest post by Ajeya Cotra summarizes Paul Christiano's IDA scheme for training ML systems robustly aligned to complex human values. IDA alternates between amplification (using humans plus AI tools to handle harder tasks) and distillation (training a new AI to imitate that augmented human), iteratively bootstrapping capability while preserving alignment. The approach draws analogies to AlphaGo Zero and expert iteration.

OpenAI's central safety page providing updates on their approach to AI safety research, deployment practices, and ongoing safety commitments. It serves as a hub for information on OpenAI's safety-related initiatives, policies, and technical work aimed at ensuring their AI systems are safe and beneficial.

★★★★☆
16. Guidelines and standards · NIST · Government

NIST's AI hub provides foundational guidelines, standards, and governance frameworks for responsible AI development, centered on the AI Risk Management Framework (AI RMF). As a nonregulatory federal agency, NIST promotes trustworthy AI through measurement science, voluntary technical standards, and stakeholder collaboration to balance innovation with risk mitigation.

★★★★★

GiveWell is a nonprofit charity evaluator that researches and recommends highly effective giving opportunities, focusing on evidence-based interventions with strong cost-effectiveness. It conducts in-depth analysis of charities to identify where donations can do the most good, primarily in global health and poverty. GiveWell exemplifies the effective altruism methodology of rigorous expected-value reasoning applied to philanthropic decisions.

18. Long-Term Future Fund · Centre for Effective Altruism

The Long-Term Future Fund is an Effective Altruism-affiliated grantmaking fund focused on improving humanity's prospects over the long run, particularly by supporting work on reducing existential and catastrophic risks. It funds research, advocacy, and capacity-building projects related to AI safety, biosecurity, and other global priorities. The fund is managed by a committee of EA community members and operates on a rolling grants basis.

★★★☆☆

CHAI is a UC Berkeley research center dedicated to reorienting AI development toward systems that are provably beneficial and aligned with human values. It conducts technical and conceptual research on problems including value alignment, corrigibility, and AI safety, and serves as a major hub for academic AI safety work.

20. Survival and Flourishing Fund · survivalandflourishing.fund

SFF is a philanthropic organization that coordinates grant recommendations for existential risk reduction and AI safety work, having distributed over $152 million since 2019. It uses a distinctive 'S-Process' for collaborative grant evaluation among multiple donors and advisors. SFF is a significant funding source for many leading AI safety organizations and researchers.

Anthropic is an AI safety company focused on building reliable, interpretable, and steerable AI systems. The company conducts frontier AI research and develops Claude, its family of AI assistants, with a stated mission of responsible development and maintenance of advanced AI for long-term human benefit.

★★★★☆

Toby Ord's 'The Precipice' argues that humanity stands at a critical juncture where existential risks—particularly from emerging technologies like AI—could permanently curtail our long-term potential. The book estimates probabilities of various catastrophic risks, makes the case for prioritizing existential risk reduction, and outlines a research and policy agenda for safeguarding humanity's future.

23. Concrete Problems in AI Safety · arXiv · Dario Amodei et al. · 2016 · Paper

This foundational paper by Amodei et al. identifies five practical AI safety research problems: avoiding side effects, avoiding reward hacking, scalable oversight, safe exploration, and robustness to distributional shift. It frames these as concrete technical challenges arising from real-world ML system design, providing a research agenda that has significantly shaped the field of AI safety.

★★★☆☆

RAND Corporation's AI research hub covers policy, national security, and governance implications of artificial intelligence. It aggregates reports, analyses, and commentary on AI risks, military applications, and regulatory frameworks from one of the leading U.S. defense and policy think tanks.

★★★★☆

The NSF is the primary U.S. federal agency funding basic research and education across all non-medical fields of science and engineering. It supports a broad portfolio of research including computer science, AI, and emerging technologies critical to national competitiveness. NSF funding decisions shape the direction of academic research relevant to AI safety and alignment.

Open Philanthropy is a major philanthropic organization that funds work across global health, AI safety, biosecurity, and other cause areas. Their grants database provides transparency into which organizations and research directions receive funding. They are one of the largest funders of AI safety and existential risk research.

★★★★☆

Anthropic's research page aggregates their work across AI alignment, mechanistic interpretability, and societal impact assessment, all oriented toward understanding and mitigating risks from increasingly capable AI systems. It serves as a central hub for their published findings and ongoing safety-focused investigations.

★★★★☆
28. UK AI Safety Institute (AISI) · UK AI Safety Institute · Government

The UK AI Safety Institute (AISI) is the UK government's dedicated body for evaluating and mitigating risks from advanced AI systems. It conducts technical safety research, develops evaluation frameworks for frontier AI models, and works with international partners to inform global AI governance and policy.

★★★★☆

Related Wiki Pages

Analysis

  • Frontier Lab Cost Structure
  • AI Talent Market Dynamics
  • AI Risk Portfolio Analysis
  • AI Risk Activation Timeline Model
  • AI Compounding Risks Analysis Model
  • Anthropic Founder Pledges: Interventions to Increase Follow-Through

Organizations

  • Epoch AI
  • Coefficient Giving
  • Machine Intelligence Research Institute
  • Center for Human-Compatible AI

Policy

  • Singapore Consensus on AI Safety Research Priorities