Capabilities-to-Safety Pipeline Model
A quantitative pipeline model finds that only 200-400 ML researchers transition to safety work annually (far below the 1,000-2,000 needed), with 60-75% of would-be transitioners blocked at the consideration-to-action stage. Training programs such as MATS achieve 60-80% conversion rates at $20-40K per researcher, while fellowships cost $50-100K per transition; a coordinated $50M annual investment could plausibly double transition rates within 2-3 years.
Overview
The capabilities-to-safety pipeline model analyzes how ML researchers transition from capabilities work to AI safety research. Given severe talent constraints in safety work, the pool of 50,000-100,000 experienced ML researchers globally represents the most promising source of qualified safety researchers. Understanding transition dynamics is critical because the field needs researchers with deep ML expertise who can address the alignment challenges posed by rapidly advancing capabilities.
The model reveals severe pipeline bottlenecks. Only 20-30% of ML researchers are aware of safety concerns, and just 10-15% of aware researchers seriously consider transitioning. Among those considering transition, 60-75% are blocked by practical barriers. This yields current annual transition rates of only 50-400 researchers, far below what is needed on plausible AGI timelines.
The analysis identifies high-leverage intervention points. Training programs like MATS achieve 60-80% conversion rates at $10-40K per transition. Fellowship programs targeting financial barriers show promise at $50-100K per transition. Internal advocacy within frontier AI labs could unlock 50-100 additional transitions annually. These interventions could plausibly double transition rates within 2-3 years with coordinated investment.
Risk/Impact Assessment
| Dimension | Assessment | Evidence | Timeline | Trend |
|---|---|---|---|---|
| Talent Shortage | High | Safety field needs 2-5x current researcher count | 2-5 years | Worsening |
| Pipeline Efficiency | Very Low | <1% of ML researchers transition annually | Current | Flat |
| Quality Dilution Risk | Medium | Rapid scaling may reduce average researcher quality | 3-5 years | Uncertain |
| Intervention Leverage | Very High | 5-10x transition rate increases possible | 1-3 years | Improving |
| Funding Constraints | Medium | $50M annually could transform pipeline | 1-2 years | Improving |
Pipeline Architecture
The transition pipeline follows a sequential funnel structure with four critical stages. Each stage exhibits characteristic conversion rates and barriers, with overall flow determined by the most constrained bottleneck.
flowchart TD
subgraph Source["Source Pool"]
A[ML Researchers<br/>50,000-100,000]
end
subgraph Awareness["Awareness Stage"]
B[Aware of Safety Concerns<br/>10,000-30,000]
end
subgraph Interest["Interest Stage"]
C[Considering Transition<br/>1,000-5,000]
end
subgraph Exploration["Exploration Stage"]
D[Actively Exploring<br/>200-1,000]
end
subgraph Conversion["Conversion Stage"]
E[Safety Researchers<br/>50-400/year inflow]
end
A -->|"Exposure Rate: 20-30%"| B
B -->|"Consideration Rate: 10-15%"| C
C -->|"Exploration Rate: 20-30%"| D
D -->|"Transition Rate: 25-40%"| E
style A fill:#e8f4f8
style B fill:#d4edda
style C fill:#fff3cd
style D fill:#f8d7da
style E fill:#d1ecf1
Mathematical Framework
Annual safety researcher inflow follows a cascade model:

$$\text{Inflow} = N_{\text{pool}} \times P(\text{aware}) \times P(\text{consider}) \times P(\text{explore}) \times P(\text{transition})$$

Current estimates yield: 75,000 × 0.25 × 0.125 × 0.25 × 0.325 ≈ 190 transitions/year.
The key insight: doubling any single conversion probability doubles overall flow, but interventions have different costs and feasibility profiles.
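A minimal sketch of this cascade, assuming midpoint values from the funnel stages above (variable names are illustrative, not part of the published model):

```python
# Cascade model of annual capabilities-to-safety transitions.
# Parameters are midpoints of the ranges shown in the funnel above.
POOL = 75_000          # ML researchers (50,000-100,000)
P_AWARE = 0.25         # exposure rate (20-30%)
P_CONSIDER = 0.125     # consideration rate (10-15%)
P_EXPLORE = 0.25       # exploration rate (20-30%)
P_TRANSITION = 0.325   # transition rate (25-40%)

def annual_inflow(pool, p_aware, p_consider, p_explore, p_transition):
    """Expected annual number of researchers completing the full funnel."""
    return pool * p_aware * p_consider * p_explore * p_transition

print(annual_inflow(POOL, P_AWARE, P_CONSIDER, P_EXPLORE, P_TRANSITION))      # ~190

# Because the model is multiplicative, doubling any single stage rate
# doubles total flow (until that rate saturates near 1.0).
print(annual_inflow(POOL, P_AWARE, 2 * P_CONSIDER, P_EXPLORE, P_TRANSITION))  # ~381
```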
Source Population Analysis
Researcher Demographics by Organization
The ML research community exhibits substantial heterogeneity in safety awareness, transition propensity, and barrier profiles. Frontier AI labs show highest awareness but face competing incentives, while academic researchers have lower awareness but fewer organizational constraints.
| Population Segment | Size | Awareness Rate | Safety-Receptive | Effective Pool | Key Characteristics |
|---|---|---|---|---|---|
| Frontier Labs | 5,000-10,000 | 70% | 60% | 2,100-4,200 | High capability, aware of risks |
| Academic ML | 15,000-30,000 | 30% | 50% | 2,250-4,500 | Research freedom, funding constraints |
| Tech Companies | 20,000-40,000 | 15% | 30% | 900-1,800 | Product focus, limited exposure |
| AI Startups | 10,000-20,000 | 20% | 35% | 700-1,400 | Entrepreneurial, mixed motivations |
Source: 80,000 Hours surveys, MIRI researcher tracking, academic publication analysis.
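The effective-pool column follows directly from size × awareness rate × safety-receptive share; a quick check using the table's own figures (treating the columns as independent multiplicative factors is an interpretive assumption):

```python
# Effective pool = segment size x awareness rate x safety-receptive share.
# Ranges are taken from the population table above.
segments = {
    "Frontier Labs":  {"size": (5_000, 10_000),  "aware": 0.70, "receptive": 0.60},
    "Academic ML":    {"size": (15_000, 30_000), "aware": 0.30, "receptive": 0.50},
    "Tech Companies": {"size": (20_000, 40_000), "aware": 0.15, "receptive": 0.30},
    "AI Startups":    {"size": (10_000, 20_000), "aware": 0.20, "receptive": 0.35},
}

for name, s in segments.items():
    low = s["size"][0] * s["aware"] * s["receptive"]
    high = s["size"][1] * s["aware"] * s["receptive"]
    print(f"{name:>14}: effective pool {low:,.0f}-{high:,.0f}")
# Frontier Labs: 2,100-4,200; Academic ML: 2,250-4,500;
# Tech Companies: 900-1,800; AI Startups: 700-1,400
```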
Quality Distribution
Research quality varies substantially across the pipeline, with implications for intervention targeting and field impact. A-tier researchers (top 10%) generate disproportionate research value but require tailored recruitment approaches.
| Quality Tier | Share of Population | Transition Rate | Impact Multiplier | Best Intervention |
|---|---|---|---|---|
| A-Tier | 5-10% | 2-5% | 5-10x | Personal outreach, elite fellowships |
| B-Tier | 15-25% | 1-3% | 2-3x | Training programs, medium fellowships |
| C-Tier | 40-50% | 0.5-2% | 1-1.5x | Mass outreach, basic training |
| D-Tier | 20-30% | 0.2-1% | 0.5-1x | Generally not cost-effective |
Conversion Funnel Analysis
Stage-by-Stage Bottlenecks
The pipeline exhibits severe attrition at two critical junctions: awareness-to-consideration (85-90% drop-off) and consideration-to-action (60-75% drop-off). These represent distinct intervention opportunities requiring different strategies.
| Transition | Baseline Rate | Best-Case Rate | Bottleneck Type | Primary Intervention |
|---|---|---|---|---|
| Unaware → Aware | 20-30% | 40-60% | Information | Outreach, community building |
| Aware → Considering | 10-15% | 25-40% | Motivation | Compelling narratives, peer influence |
| Considering → Exploring | 20-30% | 50-70% | Barriers | Financial support, skill development |
| Exploring → Transitioning | 25-40% | 60-80% | Support | Mentorship, placement assistance |
Source: MATS program data, Anthropic internal transitions, researcher interviews.
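To see how much headroom each stage offers, one can compare overall flow at baseline rates against flow when a single stage is lifted to its best-case rate. A rough sketch using midpoints of the ranges in the table above (the "best-case" figures are the table's optimistic scenario, not a validated forecast):

```python
# Compare baseline vs best-case funnel throughput, one stage at a time.
POOL = 75_000  # midpoint of the 50,000-100,000 source pool

# (baseline midpoint, best-case midpoint) for each stage, from the table above
stages = {
    "unaware -> aware":            (0.25, 0.50),
    "aware -> considering":        (0.125, 0.325),
    "considering -> exploring":    (0.25, 0.60),
    "exploring -> transitioning":  (0.325, 0.70),
}

def flow(rates):
    total = POOL
    for r in rates:
        total *= r
    return total

baseline_rates = [b for b, _ in stages.values()]
print(f"Baseline flow: ~{flow(baseline_rates):.0f}/year")

# Improve one stage to its best-case rate, holding the others at baseline.
for i, (name, (_, best)) in enumerate(stages.items()):
    rates = list(baseline_rates)
    rates[i] = best
    print(f"Best-case {name}: ~{flow(rates):.0f}/year")
```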
Transition Pathways
Researchers transition through multiple distinct pathways with different success rates, timelines, and intervention requirements.
| Transition Type | Annual Volume | Timeline | Success Rate | 3-Year Retention | Cost per Transition |
|---|---|---|---|---|---|
| Full Career Switch | 20-100 | 3-12 months | 70-85% | 60-75% | $50-100K |
| Internal Reallocation | 50-200 | 1-6 months | 85-95% | 70-85% | $20-50K |
| Hybrid Role | 100-300 | 6-24 months | 60-75% | 40-60% | $30-70K |
| Part-Time Contribution | 200-500 | Ongoing | 40-60% | 30-50% | $10-30K |
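Combining the volume, success-rate, and retention columns gives a rough sense of how many durable researchers each pathway yields after three years. An illustrative calculation using midpoints from the table above (it assumes "Annual Volume" counts attempted transitions, which is an interpretation of the table rather than a stated definition):

```python
# Expected researchers still in safety work after 3 years, per pathway.
# Midpoint values taken from the transition-pathways table above.
pathways = {
    "Full Career Switch":     {"volume": 60,  "success": 0.775, "retention_3y": 0.675},
    "Internal Reallocation":  {"volume": 125, "success": 0.90,  "retention_3y": 0.775},
    "Hybrid Role":            {"volume": 200, "success": 0.675, "retention_3y": 0.50},
    "Part-Time Contribution": {"volume": 350, "success": 0.50,  "retention_3y": 0.40},
}

for name, p in pathways.items():
    durable = p["volume"] * p["success"] * p["retention_3y"]
    print(f"{name:>22}: ~{durable:.0f} researchers retained at 3 years")
```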
Barrier Analysis
Financial Obstacles
Salary differentials create the largest single barrier to transition, particularly affecting mid-career researchers with financial obligations. The barrier varies substantially by transition pathway and source organization.
| Pathway | Typical Salary Impact | Population Affected | Barrier Strength | Addressability |
|---|---|---|---|---|
| Capabilities → Safety Lab | -20% to -40% | 40-60% | High | Fellowship programs |
| Industry → Nonprofit | -40% to -60% | 60-80% | Very High | Major fellowships |
| Academia Switch | -30% to -50% | 50-70% | High | Postdoc support |
| Internal Transfer | -10% to -30% | 20-40% | Medium | Negotiation, gradual transition |
Source: OpenAI researcher surveys, Redwood Research hiring data.
Skill Gap Assessment
Safety research requires partially different skills than capabilities work, creating uncertainty and learning curves. The largest gaps appear in alignment theory and safety evaluation methodologies.
| Capability Skill | Safety Relevance | Gap Size | Reskilling Time | Training Availability |
|---|---|---|---|---|
| ML Engineering | High | Small | 1-2 months | Abundant |
| Model Training | Very High | Small | 1-2 months | Good |
| Interpretability | Very High | Medium | 2-4 months | Growing |
| Alignment Theory | Critical | Large | 4-8 months | Limited |
| Safety Evaluation | Critical | Large | 3-6 months | Limited |
Social and Professional Costs
Researchers systematically overestimate reputational and career risks of transitioning to safety work. Actual costs are typically lower than perceived, suggesting information interventions could reduce this barrier.
| Perceived Cost | Perceived Severity | Actual Severity | Addressable Through |
|---|---|---|---|
| Reputation Damage | Medium | Low | Success stories, prestige signals |
| Network Loss | High | Medium | Community building, hybrid roles |
| Skill Atrophy | High | Medium | Continued learning, return guarantees |
| Career Ceiling | Medium | Low | Senior role examples, career paths |
Intervention Effectiveness
High-Impact Programs
Training programs demonstrate the highest conversion rates and most durable transitions. MATS represents the gold standard with 60-80% conversion rates and strong retention.
| Intervention | Target Barrier | Conversion Rate | Cost per Transition | Implementation Difficulty | Annual Capacity |
|---|---|---|---|---|---|
| Intensive Training (MATS) | Skills, community | 60-80% | $20-40K | Medium | 50-100 |
| Fellowship Programs | Financial | 20-40% | $50-100K | Low | 200-500 |
| Part-time Courses | Skills, exposure | 30-50% | $5-15K | Low | 500-1,000 |
| Personal Outreach | Awareness, motivation | 10-30% | $50-150K | High | 20-50 |
Source: MATS program outcomes, 80,000 Hours coaching data, fellowship program evaluations.
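A back-of-the-envelope budget comparison derived from the intervention table above, using midpoint cost and capacity figures and treating "annual capacity" as completed transitions (an interpretive assumption; if capacity counts participants, multiply by the conversion rate):

```python
# Rough annual budget required to run each intervention at full capacity.
# cost_k = midpoint cost per transition ($K); capacity = midpoint annual capacity.
interventions = {
    "Intensive Training (MATS)": {"cost_k": 30,  "capacity": 75},
    "Fellowship Programs":       {"cost_k": 75,  "capacity": 350},
    "Part-time Courses":         {"cost_k": 10,  "capacity": 750},
    "Personal Outreach":         {"cost_k": 100, "capacity": 35},
}

for name, i in interventions.items():
    annual_budget_m = i["cost_k"] * i["capacity"] / 1_000  # $M per year at full capacity
    print(f"{name:>26}: ~${annual_budget_m:.1f}M/year for up to {i['capacity']} transitions")
```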
Organizational Interventions
Internal advocacy within AI labs offers high leverage by creating institutional pathways that reduce barriers for multiple researchers. Anthropic's internal transfer program demonstrates feasibility.
| Organization Type | Intervention | Expected Annual Yield | Implementation Barriers |
|---|---|---|---|
| Frontier Labs | Internal transfer programs | 50-100 | Competing priorities, talent retention |
| Tech Companies | Safety team creation | 20-50 | Business case, executive buy-in |
| Universities | Safety faculty hiring | 10-30 | Academic incentives, funding |
| Startups | Mission pivoting | 5-20 | Business model, investor concerns |
Financial Support Models
Fellowship programs directly address salary differential barriers. Effectiveness varies by target population and support level, with medium-sized fellowships showing optimal cost-effectiveness.
| Fellowship Type | Target Population | Support Level | Success Rate | Cost-Effectiveness |
|---|---|---|---|---|
| Elite Fellowships | A-tier researchers | Full salary (>$200K) | 80-95% | Medium |
| Standard Fellowships | B-tier researchers | Partial support ($50-100K) | 60-80% | High |
| Bridge Grants | Transitioning researchers | Living expenses ($30-50K) | 40-60% | Very High |
| Exploration Stipends | Exploring researchers | Part-time support (<$30K) | 20-40% | High |
Current State & Trajectory
2024 Baseline Metrics
Current transition rates remain far below field needs, with the safety researcher community requiring 2-5x growth to match capability development pace. Existing programs operate at small scale relative to potential impact.
| Metric | Current Value | Target Value | Gap | Trend |
|---|---|---|---|---|
| Annual Transitions | 200-400 | 1,000-2,000 | 3-5x | Slowly improving |
| Awareness Rate | 20-30% | 50-70% | 2-3x | Improving |
| Training Capacity | 100-200/year | 500-1,000/year | 5x | Growing |
| Fellowship Funding | $5-10M/year | $30-50M/year | 5x | Growing |
Source: Field surveys, program tracking, funding analysis.
Near-Term Projections (2025-2027)
Under current trajectories, modest improvements in transition rates are expected through program scaling and increased awareness. Major breakthroughs require coordinated intervention scaling.
| Scenario | 2025 Flow | 2027 Flow | 2027 Stock | Key Drivers |
|---|---|---|---|---|
| Status Quo | 250 | 300 | 2,000 | Natural growth, modest scaling |
| Moderate Investment | 350 | 600 | 2,600 | 2x program capacity, fellowship expansion |
| Major Mobilization | 600 | 1,200 | 3,800 | Coordinated field-wide intervention |
| Intervention Failure | 200 | 150 | 1,700 | Funding cuts, interest decline |
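A toy stock-and-flow projection in the spirit of the scenario table above; the 2024 starting stock (~1,500 researchers) and the 5% annual attrition rate are illustrative assumptions not taken from the model, so the outputs only roughly track the table's figures:

```python
# Toy stock/flow projection: each year the stock loses a fraction to attrition
# and gains that year's inflow of new safety researchers.
# ASSUMPTIONS (not from the source model): starting stock and attrition rate.
START_STOCK_2024 = 1_500   # assumed safety-researcher stock at end of 2024
ATTRITION = 0.05           # assumed annual attrition rate

def project(flows_by_year, stock=START_STOCK_2024, attrition=ATTRITION):
    """Return the stock after applying each year's attrition and inflow."""
    for inflow in flows_by_year:
        stock = stock * (1 - attrition) + inflow
    return round(stock)

# Status quo: flow rises roughly linearly from 250 (2025) to 300 (2027).
print("Status quo 2027 stock:    ", project([250, 275, 300]))   # ~2,070
# Moderate investment: flow rises from 350 to 600 over the same window.
print("Moderate investment 2027: ", project([350, 475, 600]))   # ~2,650
```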
Quality-Weighted Analysis
Raw transition numbers understate impact differences because researcher productivity varies substantially. A-tier researchers generate 5-10x more research value than C-tier researchers.
| Scenario | A-Tier Annual | B-Tier Annual | Quality-Weighted Total |
|---|---|---|---|
| Status Quo | 30-40 | 100-150 | 300-450 equivalent |
| Targeted Quality | 80-120 | 120-180 | 650-900 equivalent |
| Broad Mobilization | 60-100 | 300-500 | 750-1,200 equivalent |
Feedback Dynamics
The pipeline exhibits multiple feedback loops affecting long-term stability and growth potential. Positive loops include network effects and legitimacy spillovers, while negative loops involve quality dilution and position saturation.
flowchart TD
A[Transition Volume] -->|"Network effects"| B[Field Attractiveness]
B -->|"More interest"| A
A -->|"Quality dilution"| C[Average Quality]
C -->|"Reputation effects"| B
D[Success Stories] -->|"Reduced perceived risk"| B
E[Position Availability] -->|"Opportunity clarity"| B
A -->|"Position filling"| F[Job Market Tightness]
F -->|"Reduced opportunity"| B
style A fill:#e1f5ff
style B fill:#ccffcc
style C fill:#ffddcc
style D fill:#cceeff
style E fill:#e6ffe6
style F fill:#ffe6e6
Network Effect Analysis
Network effects create self-reinforcing dynamics where successful transitions catalyze additional transitions through peer influence and social proof. Each transitioner expands the safety community's reach into capabilities networks.
| Network Metric | Current Value | Projected 2027 | Impact |
|---|---|---|---|
| Safety-Capabilities Connections | 500-1,000 | 2,000-4,000 | Higher awareness, recruitment |
| Transition Success Stories | 50-100 | 200-400 | Reduced perceived risk |
| Cross-Community Events | 10-20/year | 30-50/year | Increased exposure |
Case Studies
MATS Program Impact Analysis
The ML Alignment Theory Scholars (MATS) program provides the strongest empirical evidence for training-intervention effectiveness, having trained 300+ participants since 2021.
| MATS Metric | Value | Benchmark | Implication |
|---|---|---|---|
| Conversion Rate | 65-75% | 30-50% (other programs) | Superior model effectiveness |
| 2-Year Retention | ≈70% | ≈60% (field average) | Durable career changes |
| Research Output | 2-4 papers/participant | 1-2 (typical) | Immediate productivity |
| Placement Success | 80-90% | 50-70% (unstructured) | Strong institutional connections |
| Cost Effectiveness | $25K per transition | $75K (fellowship baseline) | Highly efficient |
Key success factors: Full-time immersion, mentorship quality, cohort peer effects, research output requirements, placement support.
Scaling constraints: Mentor availability (limiting factor), quality control, funding concentration risk.
Anthropic Internal Pipeline
Anthropic's capabilities-to-safety transfer program demonstrates organizational best practices for internal reallocation.
| Anthropic Metric | Value | External Benchmark | Advantage |
|---|---|---|---|
| Annual Transfers | 20-35 | 5-10 (typical lab) | 3-5x higher rate |
| Transfer Timeline | 2-4 months | 6-12 months | Faster execution |
| Retention Rate | 90-95% | 70-80% | Lower switching costs |
| Salary Impact | -5% to -15% | -20% to -40% | Reduced financial barrier |
Enabling factors: Constitutional AI mission alignment, internal mobility culture, safety team prestige, reduced bureaucracy.
Key Uncertainties & Cruxes
Key Questions
- What is the maximum sustainable transition rate before quality dilution becomes problematic?
- How sensitive are transition decisions to financial incentives versus mission alignment?
- Can training programs scale 5-10x while maintaining conversion rates and quality?
- What fraction of top-tier capabilities researchers are realistically convertible?
- How will pipeline dynamics change as AI development accelerates and safety becomes more urgent?
- Would high transition rates harm capabilities progress in ways that reduce overall safety?
Critical Decision Points
Several key uncertainties affect optimal intervention strategy and resource allocation:
Quality vs. Quantity Tradeoff: Whether to focus on converting small numbers of A-tier researchers or larger numbers of B/C-tier researchers remains contentious. A-tier focus maximizes research impact but may miss opportunities for field growth.
Timing Considerations: Whether to invest heavily in transitions now or wait for AI capabilities to advance further, potentially increasing motivation naturally.
Organizational Capture Risk: Whether high transition rates from specific organizations (OpenAI, DeepMind) could reduce safety-conscious voices within those organizations.
Future Research Directions
Priority research questions for improving pipeline understanding and intervention effectiveness:
Empirical Studies Needed
| Research Question | Methodology | Timeline | Priority |
|---|---|---|---|
| Transition Success Predictors | Longitudinal tracking, regression analysis | 2-3 years | High |
| Intervention Effectiveness RCT | Randomized fellowship/training assignment | 1-2 years | Very High |
| Quality Metrics Validation | Peer assessment, impact tracking | 3-5 years | Medium |
| Organizational Barrier Analysis | Ethnographic study, insider interviews | 1 year | High |
Model Improvements
Current model limitations suggest several enhancement priorities:
- Dynamic Conversion Rates: Model how conversion probabilities change over time with field growth and external events
- Quality Interactions: Better understanding of how researcher quality affects transition success and field impact
- Intervention Synergies: Analysis of how multiple interventions interact rather than simple additive effects
- Reverse Flow Analysis: Study of safety-to-capabilities transitions and their implications
Sources & Resources
Academic Literature
| Paper/Source | Key Findings | Relevance |
|---|---|---|
| Russell (2019) - Human Compatible | Career transition motivations | Philosophical foundations |
| Grace et al. (2017) - Survey of AI researchers | Risk awareness levels | Population baseline |
| Labor economics career switching studies | Transition barriers and success factors | Methodological framework |
Program Data Sources
| Organization | Data Type | Access Level | Quality |
|---|---|---|---|
| MATS Program | Participant outcomes, retention | Public reports | High |
| 80,000 Hours | Career coaching data | Aggregate only | Medium |
| Future of Humanity Institute | Researcher surveys | Limited | Medium |
| Centre for AI Safety | Fellowship outcomes | Private | High |
Policy and Governance Resources
| Organization | Resource Type | Focus Area |
|---|---|---|
| RAND Corporation | Policy analysis | Talent pipeline governance |
| Center for a New American Security | Strategic reports | National security implications |
| Future of Life Institute | Advocacy research | Field building strategy |
Community and Network Resources
| Platform | Type | Purpose |
|---|---|---|
| AI Alignment Forum | Discussion forum | Research communication |
| EA Forum Career Posts | Career advice | Transition guidance |
| Safety researcher Slack communities | Professional network | Peer support |