Expected Value of AI Safety Research
AI Safety Research Value Model
Economic model analyzing AI safety research returns, recommending 3-10x funding increases from current ~$500M/year to $2-5B, with highest marginal returns (5-10x) in alignment theory and governance research currently receiving only 10% of funding each. Provides specific allocation recommendations across philanthropic ($600M-1B), industry ($600M), and government ($1B) sources with concrete investment priorities and timelines.
Overview
This economic model quantifies the expected value of marginal investments in AI safety research. Current global spending of ≈$500-700M annually on safety research appears significantly below optimal levels, with analysis suggesting 2-5x returns available in neglected areas.
Key findings: Safety research could reduce AI catastrophic risk by 20-40% over the next decade, with particularly high returns in alignment theory and governance research. Current 100:1 ratio of capabilities to safety spending creates systematic underinvestment in risk mitigation.
The model incorporates deep uncertainty about AI risk probabilities (1-20% existential risk this century), tractability of safety problems, and optimal resource allocation across different research approaches.
Risk/Impact Assessment
| Factor | Assessment | Evidence | Source |
|---|---|---|---|
| Current Underinvestment | High | ~100:1 capabilities-to-safety spending ratio | Epoch AI (2024) |
| Marginal Returns | Medium-High | 2-5x potential in neglected areas | Coefficient Giving |
| Timeline Sensitivity | High | Value drops 50%+ if timelines <5 years | AI Impacts Survey |
| Research Direction Risk | Medium | 10-100x variance between approaches | Analysis based on expert interviews |
Strategic Framework
Core Expected Value Equation
EV = P(AI catastrophe) × R(research impact) × V(prevented harm) - C(research costs)
Where:
- P ∈ [0.01, 0.20]: Probability of catastrophic AI outcome
- R ∈ [0.05, 0.40]: Fractional risk reduction from research
- V ≈ \$10¹⁵-10¹⁷: Value of prevented catastrophic harm
- C ≈ \$10⁹: Annual research investment
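A minimal Monte Carlo sketch of this equation, assuming uniform sampling for P and R, log-uniform sampling for V, independence between parameters, and a fixed $1B annual cost; the distributional choices are illustrative assumptions of this sketch, not part of the model's specification.

```python
import random

def sample_ev():
    # Parameter ranges from the definitions above; uniform / log-uniform
    # sampling and independence are assumptions of this sketch.
    p = random.uniform(0.01, 0.20)    # P: probability of catastrophic AI outcome
    r = random.uniform(0.05, 0.40)    # R: fractional risk reduction from research
    v = 10 ** random.uniform(15, 17)  # V: value of prevented harm, $ (log-uniform)
    c = 1e9                           # C: annual research investment, $
    return p * r * v - c

samples = sorted(sample_ev() for _ in range(100_000))
print(f"median EV:   ${samples[50_000]:.2e}")
print(f"5th pctile:  ${samples[5_000]:.2e}")
print(f"95th pctile: ${samples[95_000]:.2e}")
```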
Investment Priority Matrix
| Research Area | Current Annual Funding | Marginal Returns | Evidence Quality |
|---|---|---|---|
| Alignment Theory | $50M | High (5-10x) | Low |
| Interpretability | $175M | Medium (2-3x) | Medium |
| Evaluations | $100M | High (3-5x) | High |
| Governance Research | $50M | High (4-8x) | Medium |
| RLHF/Fine-tuning | $125M | Low (1-2x) | High |
Source: Author estimates based on Anthropic, OpenAI, and DeepMind public reporting
Resource Allocation Analysis
Current vs. Optimal Distribution
```mermaid
pie title Current Safety Research Allocation ($500M)
    "Interpretability" : 35
    "RLHF/Fine-tuning" : 25
    "Evaluations" : 20
    "Alignment Theory" : 10
    "Governance Research" : 10
```
Recommended Reallocation
| Area | Current Share | Recommended | Change | Rationale |
|---|---|---|---|---|
| Alignment Theory | 10% | 20% | +$50M | High theoretical returns, underinvested |
| Governance Research | 10% | 15% | +$25M | Policy leverage, regulatory preparation |
| Evaluations | 20% | 25% | +$25M | Near-term safety, measurable progress |
| Interpretability | 35% | 30% | -$25M | Well-funded, diminishing returns |
| RLHF/Fine-tuning | 25% | 10% | -$75M | May accelerate capabilities |
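The dollar figures in the Change column follow mechanically from the share changes. A minimal sketch of that arithmetic, assuming the $500M annual total is held fixed during reallocation:

```python
# Convert share changes into $M/year changes on a fixed $500M budget.
TOTAL_M = 500  # total annual safety funding, $M (assumed constant)

current = {"Alignment Theory": 0.10, "Governance Research": 0.10,
           "Evaluations": 0.20, "Interpretability": 0.35,
           "RLHF/Fine-tuning": 0.25}
recommended = {"Alignment Theory": 0.20, "Governance Research": 0.15,
               "Evaluations": 0.25, "Interpretability": 0.30,
               "RLHF/Fine-tuning": 0.10}

assert abs(sum(recommended.values()) - 1.0) < 1e-9  # shares still sum to 100%

for area in current:
    delta = (recommended[area] - current[area]) * TOTAL_M
    print(f"{area:20s} {delta:+.0f} $M/yr")
```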
Actor-Specific Investment Strategies
Philanthropic Funders ($200M/year current)
Recommended increase: 3-5x to $600M-1B/year
| Priority | Investment | Expected Return | Timeline |
|---|---|---|---|
| Talent pipeline | $100M/year | 3-10x over 5 years | Long-term |
| Exploratory research | $200M/year | High variance | Medium-term |
| Policy research | $100M/year | High if timelines short | Near-term |
| Field building | $50M/year | Network effects | Long-term |
Key organizations: Coefficient Giving, Future of Humanity Institute, Long-Term Future Fund
AI Labs ($300M/year current)
Recommended increase: 2x to $600M/year
- Internal safety teams: Expand from 5-10% to 15-20% of research staff
- External collaboration: Fund academic partnerships, open source safety tools
- Evaluation infrastructure: Invest in red-teaming, safety benchmarks
Analysis of Anthropic, OpenAI, and DeepMind public commitments
Government Funding ($100M/year current)
Recommended increase: 10x to $1B/year
| Agency | Current | Recommended | Focus Area |
|---|---|---|---|
| NSF | $20M | $200M | Basic research, academic capacity |
| NIST | $30M | $300M | Standards, evaluation frameworks |
| DARPA | $50M | $500M | High-risk research, novel approaches |
Comparative Investment Analysis
Returns vs. Other Interventions
| Intervention | Cost per QALY | Probability Adjustment | Adjusted Cost |
|---|---|---|---|
| AI Safety (optimistic) | $0.01 | P(success) = 0.3 | $0.03 |
| AI Safety (pessimistic) | $1,000 | P(success) = 0.1 | $10,000 |
| Global health (GiveWell) | $100 | P(success) = 0.9 | $111 |
| Climate change mitigation | $50-500 | P(success) = 0.7 | $71-714 |
QALY = Quality-Adjusted Life Year. Analysis based on GiveWell methodology.
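The probability adjustment above is the nominal cost per QALY divided by the probability that the intervention succeeds at all. A short sketch reproducing the table's figures (which are themselves rough, illustrative estimates):

```python
# (cost per QALY if the intervention works) / P(success) = expected cost per QALY
interventions = {
    "AI safety (optimistic)":    (0.01, 0.3),
    "AI safety (pessimistic)":   (1_000, 0.1),
    "Global health (GiveWell)":  (100, 0.9),
    "Climate mitigation (low)":  (50, 0.7),
    "Climate mitigation (high)": (500, 0.7),
}

for name, (cost, p_success) in interventions.items():
    print(f"{name:26s} adjusted cost ≈ ${cost / p_success:,.2f} per QALY")
```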
Risk-Adjusted Portfolio
| Risk Tolerance | AI Safety Allocation | Other Cause Areas | Rationale |
|---|---|---|---|
| Risk-neutral | 80-90% | 10-20% | Expected value dominance |
| Risk-averse | 40-60% | 40-60% | Hedge against model uncertainty |
| Very risk-averse | 20-30% | 70-80% | Prefer proven interventions |
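One way to see why the recommended allocation falls with risk aversion is to compare certainty equivalents under a concave utility function. The sketch below uses a hypothetical CRRA utility and made-up payoffs for a long-shot intervention versus a proven one; it illustrates the mechanism behind the table, not its actual numbers.

```python
import math

def crra_utility(x, gamma):
    """Constant relative risk aversion utility; gamma = 0 is risk-neutral."""
    return math.log(x) if gamma == 1.0 else x ** (1 - gamma) / (1 - gamma)

def certainty_equivalent(p, win, lose, gamma):
    """Sure payoff an agent would accept instead of the (p: win, 1-p: lose) gamble."""
    eu = p * crra_utility(win, gamma) + (1 - p) * crra_utility(lose, gamma)
    if gamma == 1.0:
        return math.exp(eu)
    return (eu * (1 - gamma)) ** (1 / (1 - gamma))

long_shot  = (0.10, 1_000, 1)  # e.g. AI safety: small chance of enormous benefit
sure_thing = (0.90, 10, 1)     # e.g. proven intervention: likely modest benefit

for gamma in (0.0, 1.0, 3.0):  # risk-neutral -> strongly risk-averse
    print(f"gamma={gamma}: long-shot CE={certainty_equivalent(*long_shot, gamma):7.2f}, "
          f"sure-thing CE={certainty_equivalent(*sure_thing, gamma):7.2f}")
```

At gamma = 0 the long shot dominates (matching the risk-neutral row); as gamma rises the ordering flips, which is the intuition behind shifting weight toward proven interventions.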
Current State & Trajectory
2024 Funding Landscape
Total AI safety funding: ≈$500-700M globally
| Source | Amount | Growth Rate | Key Players |
|---|---|---|---|
| Tech companies | $300M | +50%/year | Anthropic, OpenAI, DeepMind |
| Philanthropy | $200M | +30%/year | Coefficient Giving, Survival and Flourishing Fund |
| Government | $100M | +100%/year | NIST, UK AISI, EU |
| Academia | $50M | +20%/year | Stanford HAI, MIT, Berkeley |
2025-2030 Projections
Scenario: Moderate scaling
- Total funding grows to $2-5B by 2030
- Government share increases from 15% to 40%
- Industry maintains 50-60% share
Bottlenecks limiting growth:
- Talent pipeline: ~1,000 qualified researchers globally
- Research direction clarity: Uncertainty about most valuable approaches
- Access to frontier models: Safety research requires cutting-edge systems
Source: Future of Humanity Institute talent survey, author projections
Key Uncertainties & Research Cruxes
Fundamental Disagreements
| Dimension | Optimistic View | Pessimistic View | Current Evidence |
|---|---|---|---|
| AI Risk Level | 2-5% x-risk probability | 15-20% x-risk probability | Expert surveys show 5-10% median |
| Alignment Tractability | Solvable with sufficient research | Fundamentally intractable | Mixed signals from early work |
| Timeline Sensitivity | Decades to solve problems | Need solutions in 3-7 years | Acceleration in capabilities suggests shorter timelines |
| Research Transferability | Insights transfer across architectures | Approach-specific solutions | Limited evidence either way |
Critical Research Questions
Empirical questions that would change investment priorities:
- Interpretability scaling: Do current techniques work on 100B+ parameter models?
- Alignment tax: What performance cost do safety measures impose?
- Adversarial robustness: Can safety measures withstand optimization pressure?
- Governance effectiveness: Do AI safety standards actually get implemented?
Information Value Estimates
Value of resolving key uncertainties:
| Question | Value of Information | Timeline to Resolution |
|---|---|---|
| Alignment difficulty | $1-10B | 3-7 years |
| Interpretability scaling | $500M-5B | 2-5 years |
| Governance effectiveness | $100M-1B | 5-10 years |
| Risk probability | $10-100B | Uncertain |
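Figures like these can be framed as the expected value of perfect information (EVPI): how much better the funding decision would be if the uncertainty were resolved before allocating. A hedged sketch for a single binary uncertainty, using placeholder payoffs and probabilities rather than the table's estimates:

```python
# EVPI = E[value of best action given the answer] - value of best action today.
p_tractable = 0.5  # placeholder prior that alignment is tractable

# Payoffs in $B of risk-reduction value for each (action, state) pair (made up).
payoffs = {
    ("fund alignment theory", "tractable"):   50,
    ("fund alignment theory", "intractable"):  2,
    ("fund governance",       "tractable"):   20,
    ("fund governance",       "intractable"): 15,
}
actions = ["fund alignment theory", "fund governance"]
states = {"tractable": p_tractable, "intractable": 1 - p_tractable}

def expected_value(action):
    return sum(p * payoffs[(action, s)] for s, p in states.items())

best_without_info = max(expected_value(a) for a in actions)
best_with_info = sum(p * max(payoffs[(a, s)] for a in actions)
                     for s, p in states.items())
print(f"EVPI ≈ ${best_with_info - best_without_info:.1f}B")
```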
Implementation Roadmap
2025-2026: Foundation Building
Year 1 Priorities ($1B investment)
- Talent: 50% increase in safety researchers through fellowships, PhD programs
- Infrastructure: Safety evaluation platforms, model access protocols
- Research: Focus on near-term measurable progress
2027-2029: Scaling Phase
Years 2-4 Priorities ($2-3B/year)
- International coordination on safety research standards
- Large-scale alignment experiments on frontier models
- Policy research integration with regulatory development
2030+: Deployment Phase
Long-term integration
- Safety research embedded in all major AI development
- International safety research collaboration infrastructure
- Automated safety evaluation and monitoring systems
See Also
- Pre-TAI Capital Deployment — How $100-300B+ gets allocated across the AI industry before transformative AI
- Safety Spending at Scale — Analysis of safety budgets as AI labs scale to billions in annual spending
- Frontier Lab Cost Structure — Breakdown of where frontier lab budgets go (compute, talent, safety, overhead)
- AI Talent Market Dynamics — Competition for scarce AI researchers and its effect on safety capacity
Sources & Resources
Academic Literature
| Paper | Key Finding | Relevance |
|---|---|---|
| Ord (2020), The Precipice | ~10% existential risk from unaligned AI this century | Risk probability estimates |
| Amodei et al. (2016) | Safety research agenda | Research direction framework |
| Russell (2019) | Control problem formulation | Alignment problem definition |
| Christiano (2018) | IDA proposal | Specific alignment approach |
Research Organizations
| Organization | Focus | Annual Budget | Key Publications |
|---|---|---|---|
| Anthropic | Constitutional AI, interpretability | $100M+ | Constitutional AI paper |
| MIRI | Agent foundations | $5M | Logical induction |
| CHAI | Human-compatible AI | $10M | CIRL framework |
| ARC | Alignment research | $15M | Eliciting latent knowledge |
Policy Resources
| Source | Type | Key Insights |
|---|---|---|
| NIST AI Risk Management Framework | Standards | Risk assessment methodology |
| UK AI Safety Institute | Government research | Evaluation frameworks |
| EU AI Act | Regulation | Compliance requirements |
| RAND AI Strategy | Analysis | Military AI implications |
Funding Sources
| Funder | Focus Area | Annual AI Safety | Application Process |
|---|---|---|---|
| Coefficient Giving | Technical research, policy | $100M+ | LOI system |
| Future Fund (dissolved November 2022) | Longtermism, x-risk | $50M+ | Grant applications |
| NSF | Academic research | $20M | Standard grants |
| Survival and Flourishing Fund | Existential risk | $10M | Quarterly rounds |