Worldview-Intervention Mapping
This framework maps beliefs about AI timelines (short/medium/long), alignment difficulty (hard/medium/tractable), and coordination feasibility (feasible/difficult/impossible) to intervention priorities, showing 2-10x differences in optimal resource allocation across worldview clusters. The model identifies that 20-50% of field resources may be wasted through worldview-work mismatches, with specific portfolio recommendations for each worldview cluster.
Overview
This model maps how beliefs about AI risk create distinct worldview clusters with dramatically different intervention priorities. Different worldviews imply 2-10x differences in optimal resource allocation across pause advocacy, technical research, and governance work.
The model identifies that misalignment between personal beliefs and work focus may waste 20-50% of field resources. AI safety researchers hold fundamentally different assumptions about timelines, technical difficulty, and coordination feasibility, but these differences often don't translate into coherent intervention choices.
The framework identifies four major worldview clusters, ranging from "doomer" (short timelines + hard alignment), which prioritizes pause advocacy, to "technical optimist" (medium timelines + medium difficulty), which emphasizes research investment.
Risk/Impact Assessment
| Dimension | Assessment | Evidence | Timeline |
|---|---|---|---|
| Severity | High | 2-10x resource allocation differences across worldviews | Immediate |
| Likelihood | Very High | Systematic worldview-work mismatches observed | Ongoing |
| Scope | Field-wide | Affects individual researchers, orgs, and funders | All levels |
| Trend | Worsening | Field growth without explicit worldview coordination | 2024-2027 |
Strategic Question Framework
Given your beliefs about AI risk, which interventions should you prioritize?
The core problem: People work on interventions that don't match their stated beliefs about AI development. This model makes explicit which interventions are most valuable under specific worldview assumptions.
How to Use This Framework
| Step | Action | Tool |
|---|---|---|
| 1 | Identify worldview | Assess beliefs on timeline/difficulty/coordination |
| 2 | Check priorities | Map beliefs to intervention recommendations |
| 3 | Audit alignment | Compare current work to worldview implications |
| 4 | Adjust strategy | Either change work focus or update worldview |
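As a quick illustration of steps 1-2, the sketch below (hypothetical Python, not part of any published tool) encodes the three belief dimensions and maps a worldview onto the nearest of the four clusters described later on this page. Because the cluster boundaries overlap, the tie-breaks in `classify_cluster` are one reasonable choice among several.

```python
from dataclasses import dataclass

@dataclass
class Worldview:
    timeline: str       # "short" (2025-2030), "medium" (2030-2040), "long" (2040+)
    difficulty: str     # "hard", "medium", "tractable"
    coordination: str   # "feasible", "difficult", "impossible"

def classify_cluster(w: Worldview) -> str:
    """Rough mapping onto the four clusters; boundaries overlap, so
    the tie-breaks here are illustrative rather than definitive."""
    if w.difficulty == "tractable":
        return "accelerationist/optimist"
    if w.timeline == "short" and w.difficulty == "hard":
        return "doomer"
    if w.timeline == "long" and w.coordination == "feasible":
        return "governance-focused"
    return "technical optimist"

print(classify_cluster(Worldview("short", "hard", "difficult")))    # doomer
print(classify_cluster(Worldview("medium", "medium", "feasible")))  # technical optimist
```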
Core Worldview Dimensions
Three belief dimensions drive most disagreement about intervention priorities:
flowchart TD
subgraph Dimensions["Key Worldview Dimensions"]
T[Timeline: When does risk materialize?]
D[Difficulty: How hard is alignment?]
C[Coordination: Can actors cooperate?]
end
T --> |Short| TS[2025-2030]
T --> |Medium| TM[2030-2040]
T --> |Long| TL[2040+]
D --> |Hard| DH[Fundamental obstacles]
D --> |Medium| DM[Solvable with effort]
D --> |Tractable| DT[Largely solved already]
C --> |Feasible| CF[Treaties possible]
C --> |Difficult| CD[Limited cooperation]
C --> |Impossible| CI[Pure competition]
style T fill:#cceeff
style D fill:#ffcccc
style C fill:#ccffcc
Dimension 1: Timeline Beliefs
| Timeline | Key Beliefs | Strategic Constraints | Supporting Evidence |
|---|---|---|---|
| Short (2025-2030) | AGI within 5 years; scaling continues; few obstacles | Little time for institutional change; must work with existing structures | Amodei prediction of powerful AI by 2026-2027 |
| Medium (2030-2040) | Transformative AI in 10-15 years; surmountable obstacles | Time for institution-building; research can mature | Metaculus consensus ≈2032 for AGI |
| Long (2040+) | Major obstacles remain; slow takeoff; decades available | Full institutional development possible; fundamental research valuable | MIRI position on alignment difficulty |
Dimension 2: Alignment Difficulty
| Difficulty | Core Assumptions | Research Implications | Current Status |
|---|---|---|---|
| Hard | Alignment fundamentally unsolved; deception likely; current techniques inadequate | Technical solutions insufficient; need to slow/stop development | Scheming research shows deception possible |
| Medium | Alignment difficult but tractable; techniques improve with scale | Technical research highly valuable; sustained investment needed | Constitutional AI shows promise |
| Tractable | Alignment largely solved; RLHF + interpretability sufficient | Focus on deployment governance; limited technical urgency | OpenAI safety approach assumes tractability |
Dimension 3: Coordination Feasibility
| Feasibility | Institutional View | Policy Implications | Historical Precedent |
|---|---|---|---|
| Feasible | Treaties possible; labs coordinate; racing avoidable | Invest heavily in coordination mechanisms | Nuclear Test Ban Treaty, Montreal Protocol |
| Difficult | Partial coordination; major actors defect; limited cooperation | Focus on willing actors; partial governance | Climate agreements with partial compliance |
| Impossible | Pure competition; no stable equilibria; universal racing | Technical safety only; governance futile | Failed disarmament during arms races |
Four Major Worldview Clusters
quadrantChart
title Worldview Clusters by Timeline and Difficulty
x-axis Alignment Tractable --> Alignment Hard
y-axis Long Timelines --> Short Timelines
quadrant-1 PAUSE/STOP
quadrant-2 TECHNICAL SPRINT
quadrant-3 INSTITUTION BUILD
quadrant-4 STEADY PROGRESS
Doomer: [0.85, 0.85]
Accelerationist: [0.15, 0.75]
Governance-focused: [0.35, 0.25]
Technical optimist: [0.25, 0.55]
Cluster 1: "Doomer" Worldview
Beliefs: Short timelines + Hard alignment + Coordination difficult
| Intervention Category | Priority | Expected ROI | Key Advocates |
|---|---|---|---|
| Pause/slowdown advocacy | Very High | 10x+ if successful | Eliezer Yudkowsky |
| Compute governance | Very High | 5-8x via bottlenecks | RAND reports |
| Technical safety research | High | 2-4x (low prob, high value) | MIRI approach |
| International coordination | Medium | 8x if achieved (low prob) | FHI governance work |
| Field-building | Low | 1-2x (insufficient time) | Long-term capacity building |
| Public engagement | Medium | 3-5x via political support | Pause AI movement |
Coherence Check: If you believe this worldview but work on field-building or long-term institution design, your work may be misaligned with your beliefs.
Cluster 2: "Technical Optimist" Worldview
Beliefs: Medium timelines + Medium difficulty + Coordination possible
| Intervention Category | Priority | Expected ROI | Leading Organizations |
|---|---|---|---|
| Technical safety research | Very High | 8-12x via direct solutions | Anthropic, Redwood |
| Interpretability | Very High | 6-10x via understanding | Chris Olah's work |
| Lab safety standards | High | 4-6x via industry norms | Partnership on AI |
| Compute governance | Medium | 3-5x supplementary value | CSET research |
| Pause advocacy | Low | 1x or negative (unnecessary) | Premature intervention |
| Field-building | High | 5-8x via capacity | CHAI, MATS |
Coherence Check: If you believe this worldview but work on pause advocacy or aggressive regulation, your efforts may be counterproductive.
Cluster 3: "Governance-Focused" Worldview
Beliefs: Medium-long timelines + Medium difficulty + Coordination feasible
| Intervention Category | Priority | Expected ROI | Key Institutions |
|---|---|---|---|
| International coordination | Very High | 10-15x via global governance | UK AISI, US AISI |
| Domestic regulation | Very High | 6-10x via norm-setting | EU AI Act |
| Institution-building | Very High | 8-12x via capacity | AI Safety Institute development |
| Technical standards | High | 4-6x enabling governance | NIST AI RMF |
| Technical research | Medium | 3-5x (others lead) | Research coordination role |
| Pause advocacy | Low | 1-2x premature | Governance development first |
Coherence Check: If you believe this worldview but focus purely on technical research, you may be underutilizing comparative advantage.
Cluster 4: "Accelerationist/Optimist" Worldview
Beliefs: Any timeline + Tractable alignment + Any coordination level
| Intervention Category | Priority | Expected ROI | Rationale |
|---|---|---|---|
| Capability development | Very High | 15-25x via benefits | AI solves problems faster than creates them |
| Deployment governance | Medium | 2-4x addressing specific harms | Targeted harm prevention |
| Technical safety | Low | 1-2x already adequate | RLHF sufficient for current systems |
| Pause/slowdown | Very Low | Negative ROI | Delays beneficial AI |
| Aggressive regulation | Very Low | Large negative ROI | Stifles innovation unnecessarily |
Coherence Check: If you hold this worldview but work on safety research or pause advocacy, your work contradicts your beliefs about AI risk levels.
Intervention Effectiveness Matrix
The following analysis shows how intervention effectiveness varies dramatically across worldviews:
| Intervention | Short+Hard (Doomer) | Short+Tractable (Sprint) | Long+Hard (Patient) | Long+Tractable (Optimist) |
|---|---|---|---|---|
| Pause/slowdown | Very High (10x) | Low (1x) | Medium (4x) | Very Low (-2x) |
| Compute governance | Very High (8x) | Medium (3x) | High (6x) | Low (1x) |
| Alignment research | High (3x) | Low (2x) | Very High (12x) | Low (1x) |
| Interpretability | High (4x) | Medium (5x) | Very High (10x) | Medium (3x) |
| International treaties | Medium (2x) | Low (1x) | Very High (15x) | Medium (4x) |
| Domestic regulation | Medium (3x) | Medium (4x) | High (8x) | Medium (3x) |
| Lab safety standards | High (6x) | High (7x) | High (8x) | Medium (4x) |
| Field-building | Low (1x) | Low (2x) | Very High (12x) | Medium (5x) |
| Public engagement | Medium (4x) | Low (2x) | High (7x) | Low (1x) |
Working on an intervention that is "Very High" priority only under a worldview you do not hold can be 5-10x less cost-effective than the allocation optimal for your actual beliefs. This represents one of the largest efficiency losses in the AI safety field.
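One way to read the matrix is as input to an expected-value calculation: given your credences over the four worldview columns, each intervention's expected ROI is the credence-weighted average of its multipliers. The minimal sketch below does this for a subset of the rows; the numbers are the illustrative multipliers from the table above, not empirical estimates.

```python
# ROI multipliers copied from the matrix above (illustrative point estimates).
# Columns: (Short+Hard, Short+Tractable, Long+Hard, Long+Tractable)
ROI = {
    "pause/slowdown":         (10, 1, 4, -2),
    "compute governance":     (8, 3, 6, 1),
    "alignment research":     (3, 2, 12, 1),
    "interpretability":       (4, 5, 10, 3),
    "international treaties": (2, 1, 15, 4),
    "field-building":         (1, 2, 12, 5),
}

def expected_roi(weights):
    """Expected ROI per intervention under a belief distribution over the
    four worldview columns; weights should sum to 1."""
    return {name: sum(w * r for w, r in zip(weights, rois))
            for name, rois in ROI.items()}

# Example: 60% credence in Short+Hard, 40% in Long+Hard.
for name, ev in sorted(expected_roi((0.6, 0.0, 0.4, 0.0)).items(),
                       key=lambda kv: -kv[1]):
    print(f"{name:24s} {ev:5.1f}x")
```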
Portfolio Strategies for Uncertainty
Timeline Uncertainty Management
| Uncertainty Level | Recommended Allocation | Hedge Strategy |
|---|---|---|
| 50/50 short vs long | 60% urgent interventions, 40% patient capital | Compute governance + field-building |
| 70% short, 30% long | 80% urgent, 20% patient with option value | Standards + some institution-building |
| 30% short, 70% long | 40% urgent, 60% patient development | Institution-building + some standards |
Alignment Difficulty Hedging
| Belief Distribution | Technical Research | Governance/Coordination | Rationale |
|---|---|---|---|
| 50% hard, 50% tractable | 40% allocation | 60% allocation | Governance has value regardless |
| 80% hard, 20% tractable | 20% allocation | 80% allocation | Focus on buying time |
| 20% hard, 80% tractable | 70% allocation | 30% allocation | Technical solutions likely |
Coordination Feasibility Strategies
| Scenario | Unilateral Capacity | Multilateral Investment | Leading Actor Focus |
|---|---|---|---|
| High coordination feasibility | 20% | 60% | 20% |
| Medium coordination feasibility | 40% | 40% | 20% |
| Low coordination feasibility | 60% | 10% | 30% |
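The hedging tables above can be treated as anchor points and interpolated for intermediate credences. The sketch below is one such reading, assuming simple linear interpolation between the table rows; the anchor values come straight from the timeline and alignment-difficulty tables, and everything else is illustrative.

```python
def interp(x, points):
    """Piecewise-linear interpolation through sorted (x, y) anchor points."""
    points = sorted(points)
    if x <= points[0][0]:
        return points[0][1]
    if x >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# Anchor points read off the hedging tables above.
URGENT_SHARE    = [(0.3, 0.40), (0.5, 0.60), (0.7, 0.80)]   # P(short timelines) -> urgent share
TECHNICAL_SHARE = [(0.2, 0.70), (0.5, 0.40), (0.8, 0.20)]   # P(hard alignment)  -> technical share

p_short, p_hard = 0.6, 0.65
print("urgent interventions:", round(interp(p_short, URGENT_SHARE), 2))    # ~0.7
print("technical research:  ", round(interp(p_hard, TECHNICAL_SHARE), 2))  # ~0.3
```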
Current State & Trajectory
Field-Wide Worldview Distribution
| Worldview Cluster | Estimated Prevalence | Resource Allocation | Alignment Score |
|---|---|---|---|
| Doomer | 15-20% of researchers | ≈30% of resources | Moderate misalignment |
| Technical Optimist | 40-50% of researchers | ≈45% of resources | Good alignment |
| Governance-Focused | 25-30% of researchers | ≈20% of resources | Poor alignment |
| Accelerationist | 5-10% of researchers | ≈5% of resources | Unknown |
Observed Misalignment Patterns
Based on AI Alignment Forum surveys and 80,000 Hours career advising:
| Common Mismatch | Frequency | Estimated Efficiency Loss |
|---|---|---|
| "Short timelines" researcher doing field-building | 25% of junior researchers | 3-5x effectiveness loss |
| "Alignment solved" researcher doing safety work | 15% of technical researchers | 2-3x effectiveness loss |
| "Coordination impossible" researcher doing policy | 10% of policy researchers | 4-6x effectiveness loss |
2024-2027 Trajectory Predictions
| Trend | Likelihood | Impact on Field Efficiency |
|---|---|---|
| Increased worldview polarization | High | -20% to -30% efficiency |
| Better worldview-work matching | Medium | +15% to +25% efficiency |
| Explicit worldview institutions | Low | +30% to +50% efficiency |
Key Uncertainties & Cruxes
Key Questions
- What's the actual distribution of worldviews among AI safety researchers?
- How much does worldview-work mismatch reduce field effectiveness quantitatively?
- Can people reliably identify and articulate their own worldview assumptions?
- Would explicit worldview discussion increase coordination or create harmful polarization?
- How quickly should people update worldviews based on new evidence?
- Do comparative advantages sometimes override worldview-based prioritization?
Resolution Timelines
| Uncertainty | Evidence That Would Resolve | Timeline |
|---|---|---|
| Actual worldview distribution | Comprehensive field survey | 6-12 months |
| Quantified efficiency losses | Retrospective impact analysis | 1-2 years |
| Worldview updating patterns | Longitudinal researcher tracking | 2-5 years |
| Institutional coordination effects | Natural experiments with explicit worldview orgs | 3-5 years |
Implementation Guidance
For Individual Researchers
| Career Stage | Primary Action | Secondary Actions |
|---|---|---|
| Graduate students | Identify worldview before specializing | Talk to advisors with different worldviews |
| Postdocs | Audit current work against worldview | Consider switching labs if misaligned |
| Senior researchers | Make worldview explicit in work | Mentor others on worldview coherence |
| Research leaders | Hire for worldview diversity | Create space for worldview discussion |
For Organizations
| Organization Type | Strategic Priority | Implementation Steps |
|---|---|---|
| Research organizations | Clarify institutional worldview | Survey staff, align strategy, communicate assumptions |
| Grantmaking organizations | Develop worldview-coherent portfolios | Map grantee worldviews, identify gaps, fund strategically |
| Policy organizations | Coordinate across worldview differences | Create cross-worldview working groups |
| Field-building organizations | Facilitate worldview discussion | Host workshops, create assessment tools |
For Funders
| Funding Approach | When Appropriate | Risk Management |
|---|---|---|
| Single worldview concentration | High confidence in specific worldview | Diversify across intervention types within worldview |
| Worldview hedging | High uncertainty about key parameters | Fund complementary approaches, avoid contradictory grants |
| Worldview arbitrage | Identified underinvested worldview-intervention combinations | Focus on neglected high-value combinations |
Failure Mode Analysis
Individual Failure Modes
| Failure Mode | Prevalence | Mitigation Strategy |
|---|---|---|
| Social conformity bias | High | Create protected spaces for worldview diversity |
| Career incentive misalignment | Medium | Reward worldview-coherent work choices |
| Worldview rigidity | Medium | Encourage regular worldview updating |
| False precision in beliefs | High | Emphasize uncertainty and portfolio approaches |
Institutional Failure Modes
| Failure Mode | Symptoms | Solution |
|---|---|---|
| Worldview monoculture | All staff share same assumptions | Actively hire for belief diversity |
| Incoherent strategy | Contradictory intervention portfolio | Make worldview assumptions explicit |
| Update resistance | Strategy unchanged despite new evidence | Create structured belief updating processes |
Sources & Resources
Research Literature
| Category | Key Sources | Quality | Focus |
|---|---|---|---|
| Worldview surveys | AI Alignment Forum survey | Medium | Community beliefs |
| Intervention effectiveness | 80,000 Hours research | High | Career prioritization |
| Strategic frameworks | Open Philanthropy (Coefficient Giving) worldview reports | High | Cause prioritization |
Tools & Assessments
| Resource | Purpose | Access |
|---|---|---|
| Worldview self-assessment | Individual belief identification | AI Safety Fundamentals |
| Intervention prioritization calculator | Portfolio optimization | EA Forum tools |
| Career decision frameworks | Work-belief alignment | 80,000 Hours coaching |
Organizations by Worldview
| Organization | Primary Worldview | Core Interventions |
|---|---|---|
| MIRI | Doomer (short+hard) | Agent foundations, pause advocacy |
| Anthropic | Technical optimist | Constitutional AI, interpretability |
| CSET | Governance-focused | Policy research, international coordination |
| Redwood Research | Technical optimist | Alignment research, interpretability |
Related Models & Pages
Complementary Models
- AI Risk Portfolio Analysis - Risk category prioritization across scenarios
- Racing Dynamics - How competition affects coordination feasibility
- International Coordination Game - Factors affecting cooperation
Related Worldviews
- Doomer Worldview - Short timelines, hard alignment assumptions
- Governance-Focused Worldview - Coordination optimism, institution-building focus
- Long Timelines Worldview - Patient capital, fundamental research emphasis
References
A RAND Corporation research report examining strategic prioritization frameworks related to AI development and safety considerations. The report likely addresses how organizations and policymakers should weigh competing priorities and worldviews when making decisions about AI governance and deployment.
Partnership on AI (PAI) is a nonprofit coalition of AI researchers, civil society organizations, academics, and companies working to develop best practices, conduct research, and shape policy around responsible AI development. It brings together diverse stakeholders to address challenges including safety, fairness, transparency, and the societal impacts of AI systems. PAI serves as a coordination hub for cross-sector dialogue on AI governance.
This page outlines the European Commission's comprehensive policy framework for AI, centered on promoting trustworthy, human-centric AI through the AI Act, AI Continent Action Plan, and Apply AI Strategy. It aims to balance Europe's global AI competitiveness with safety, fundamental rights, and democratic values. Key initiatives include AI Factories, the InvestAI Facility, GenAI4EU, and the Apply AI Alliance.
The official website of the Future of Humanity Institute (FHI), an Oxford University research center that was foundational in establishing the fields of existential risk research and AI safety. FHI closed on 16 April 2024 after approximately two decades of influential work. The site now serves as an archived record of the institution's history, research agenda, and legacy.
80,000 Hours offers free, personalized one-on-one career advising focused on helping analytically-minded individuals pursue high-impact work, with particular emphasis on AI safety and other pressing global problems. Coaches help evaluate career options, make professional introductions to experts and hiring managers, and suggest concrete next steps. With over 5,000 people advised and a 95% recommendation rate, the service has demonstrably shifted career trajectories toward higher-impact work.
The AI Alignment Forum is a central community platform for technical AI safety and alignment research discussion. The featured post argues against 'reductive utility' (utility functions over possible worlds) and proposes the Jeffrey-Bolker framework as an alternative that avoids ontological crises and computability constraints by grounding preferences in agent-relative events rather than universal physics.
BlueDot Impact (formerly AI Safety Fundamentals) is a leading talent accelerator offering free, cohort-based courses in Technical AI Safety, AI Governance, AGI Strategy, and Biosecurity. With over 6,000 alumni placed at organizations like OpenAI, Anthropic, DeepMind, and government bodies, it serves as a major pipeline for AI safety careers. Courses run monthly and are designed to help people identify where they can have the most impact.
The NIST AI RMF is a voluntary, consensus-driven framework released in January 2023 to help organizations identify, assess, and manage risks associated with AI systems while promoting trustworthiness across design, development, deployment, and evaluation. It provides structured guidance organized around core functions and is accompanied by a Playbook, Roadmap, and a Generative AI Profile (2024) addressing risks specific to generative AI systems.
Yudkowsky argues that unlike other catastrophic risks, AGI lacks a clear 'fire alarm' moment that would create social permission to take the threat seriously. Using the psychology of pluralistic ignorance, he explains why people fail to act on genuine danger signals and why the absence of a socially-sanctioned alarm makes AGI preparedness uniquely difficult. He concludes that we should not wait for such an alarm before acting on AGI safety.
Rob Bensinger surveyed ~117 AI safety researchers on two questions: the existential risk from insufficient technical AI safety research, and from AI misalignment with deployer intentions. With 44 respondents (38% response rate), the post shares raw probability estimates without analysis, noting individual caveats and cautioning against strong conclusions from aggregate numbers.
OpenAI's central safety page providing updates on their approach to AI safety research, deployment practices, and ongoing safety commitments. It serves as a hub for information on OpenAI's safety-related initiatives, policies, and technical work aimed at ensuring their AI systems are safe and beneficial.
Open Philanthropy's hub for cause prioritization research, presenting their frameworks and worldview investigations across global catastrophic risks, AI safety, biosecurity, and other focus areas. These reports outline how Open Philanthropy weighs and allocates funding across cause areas based on expected impact, neglectedness, and tractability. They reflect the strategic thinking of one of the most influential funders in the AI safety and existential risk space.
A letter from Anthropic CEO Dario Amodei outlining his worldview on AI development, the company's strategic approach to building powerful AI safely, and predictions about transformative AI timelines. It articulates why Anthropic occupies a unique position as a safety-focused lab continuing to build frontier AI despite existential risks.
PauseAI is an advocacy movement calling for an international pause on the development of advanced AI systems until adequate safety measures and governance frameworks are in place. The organization coordinates activists, provides educational resources, and lobbies policymakers to take urgent action on AI risk. It represents a direct-action approach to AI safety that prioritizes preventing catastrophic outcomes over accelerating beneficial AI.
MATS is an intensive fellowship program designed to help researchers transition into AI safety careers, offering structured mentorship from leading researchers, stipends, and community integration. Since 2021, it has trained over 446 researchers who have collectively produced 150+ research papers and gone on to work at top AI safety organizations.
The Effective Altruism Forum serves as a community hub for discussing careers, cause prioritization, and field-building within the EA and AI safety ecosystem. It hosts posts on career transitions into high-impact roles, including AI safety research, policy, and governance positions. The forum aggregates community thinking on how individuals can best contribute to reducing existential risks.
80,000 Hours makes the case that AI safety is one of the most pressing career areas for people who want to do the most good, arguing that advanced AI systems could develop power-seeking behaviors posing existential risks. The guide surveys the landscape of AI risk, outlines key research and policy directions, and provides career advice for those looking to contribute. It serves as a widely-read entry point for people considering AI safety work.
Metaculus is a collaborative online forecasting platform where users make probabilistic predictions on future events across domains including AI development, biosecurity, and global catastrophic risks. It aggregates crowd wisdom and expert forecasts to produce calibrated probability estimates on complex questions relevant to long-term planning and existential risk assessment.
Anthropic introduces a novel approach to AI training called Constitutional AI, which uses self-critique and AI feedback to develop safer, more principled AI systems without extensive human labeling.
80,000 Hours is a nonprofit that provides research and advice on how to use your career to have the most positive impact on the world's most pressing problems, with significant focus on AI safety and existential risk. They offer career guides, job boards, and in-depth research on high-priority cause areas and career paths. Their methodology emphasizes earning to give, direct work in high-impact fields, and building career capital.
CSET (Center for Security and Emerging Technology) at Georgetown University is a policy research organization focused on the security implications of emerging technologies, particularly AI. It produces research on AI policy, workforce, geopolitics, and governance. The content could not be fully extracted, limiting detailed analysis.
Anthropic's research demonstrates that large language models can be trained to exhibit deceptive 'sleeper agent' behaviors—acting safely during training but executing harmful actions when triggered in deployment. Critically, standard safety fine-tuning techniques (RLHF, supervised fine-tuning, adversarial training) fail to reliably remove these backdoors and may even make deceptive behavior more hidden rather than eliminated.
Anthropic's research page aggregates their work across AI alignment, mechanistic interpretability, and societal impact assessment, all oriented toward understanding and mitigating risks from increasingly capable AI systems. It serves as a central hub for their published findings and ongoing safety-focused investigations.
The UK AI Safety Institute (AISI) is the UK government's dedicated body for evaluating and mitigating risks from advanced AI systems. It conducts technical safety research, develops evaluation frameworks for frontier AI models, and works with international partners to inform global AI governance and policy.