AI Risk Activation Timeline Model
Comprehensive framework mapping AI risk activation windows with specific probability assessments: current risks already active (disinformation 95%+, spear phishing active), near-term critical window 2025-2027 (bioweapons 50% by 2027, cyberweapons 75%), long-term existential risks 2030-2050+ (ASI misalignment 15% by 2030). Recommends $3-5B annual investment in Tier 1 interventions with specific allocations: $200-400M bioweapons screening, $300-600M interpretability, $500M-1B cyber-defense.
Overview
Different AI risks don't all "turn on" at the same time - they activate based on capability thresholds, deployment contexts, and barrier erosion. This model systematically maps when various AI risks become critical, enabling strategic resource allocation and intervention timing.
The model reveals three critical insights: many serious risks are already active with current systems, the next 2-3 years represent a critical activation window for multiple high-impact risks, and long-term existential risks require foundational research investment now despite uncertain timelines.
Understanding activation timing enables prioritizing immediate interventions for active risks, preparing defenses for near-term thresholds, and building foundational capacity for long-term challenges before crisis mode sets in.
Risk Assessment Overview
| Risk Category | Timeline | Severity Range | Current Status | Intervention Window |
|---|---|---|---|---|
| Current Active | 2020-2024 | Medium-High | Multiple risks active | Closing rapidly |
| Near-term Critical | 2025-2027 | High-Extreme | Approaching thresholds | Open but narrowing |
| Long-term Existential | 2030-2050+ | Extreme-Catastrophic | Early warning signs | Wide but requires early action |
| Cascade Effects | Ongoing | Amplifies all categories | Accelerating | Immediate intervention needed |
Risk Activation Framework
Activation Criteria
| Criterion | Description | Example Threshold |
|---|---|---|
| Capability Crossing | AI can perform necessary tasks | GPT-4 level code generation for cyberweapons |
| Deployment Context | Systems deployed in relevant settings | Autonomous agents with internet access |
| Barrier Erosion | Technical/social barriers removed | Open-source parity reducing control |
| Incentive Alignment | Actors motivated to exploit | Economic pressure + accessible tools |
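To make the framework concrete, the four activation criteria can be treated as a checklist: a risk counts as activated only when every criterion is satisfied, and as partially active when some subset is. The sketch below is illustrative only; the class and field names (`ActivationAssessment` and its criterion flags) are our own shorthand, not part of any standard tooling.

```python
from dataclasses import dataclass

@dataclass
class ActivationAssessment:
    """Illustrative record of the four activation criteria for a single risk."""
    risk: str
    capability_crossing: bool   # AI can perform the necessary tasks
    deployment_context: bool    # systems deployed in relevant settings
    barrier_erosion: bool       # technical/social barriers removed
    incentive_alignment: bool   # actors motivated to exploit

    def status(self) -> str:
        met = [self.capability_crossing, self.deployment_context,
               self.barrier_erosion, self.incentive_alignment]
        if all(met):
            return "active"
        if any(met):
            return f"partially active ({sum(met)}/4 criteria met)"
        return "latent"

# Hypothetical assessment of one risk, purely for illustration
print(ActivationAssessment("cyberweapon development",
                           capability_crossing=True,
                           deployment_context=True,
                           barrier_erosion=False,
                           incentive_alignment=True).status())
```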
Progress Tracking Methodology
We assess progress toward activation using the following inputs (a simple aggregation sketch follows the list):
- Technical benchmarks from evaluation organizations
- Deployment indicators from major AI labs
- Adversarial use cases documented in security research
- Expert opinion surveys on capability timelines
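One simple way to combine these inputs into a single "percent to threshold" figure is a weighted average of normalized indicator scores. The indicator scores and weights below are illustrative assumptions, not calibrated values from the model.

```python
def percent_to_threshold(indicators: dict[str, float],
                         weights: dict[str, float]) -> float:
    """Weighted average of indicator scores, each expressed in [0, 1]."""
    total_weight = sum(weights[name] for name in indicators)
    score = sum(indicators[name] * weights[name] for name in indicators)
    return 100 * score / total_weight

# Hypothetical scores for one risk (0 = no progress, 1 = threshold reached)
indicators = {
    "technical_benchmarks": 0.7,   # evaluation-org benchmark results
    "deployment_indicators": 0.6,  # relevant deployments by major labs
    "adversarial_use": 0.4,        # documented adversarial use cases
    "expert_surveys": 0.5,         # expert timeline estimates
}
weights = {"technical_benchmarks": 0.4, "deployment_indicators": 0.3,
           "adversarial_use": 0.2, "expert_surveys": 0.1}

print(f"{percent_to_threshold(indicators, weights):.0f}% to threshold")
```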
Current Risks (Already Active)
Category: Misuse Risks
| Risk | Status | Current Evidence | Impact Scale | Source |
|---|---|---|---|---|
| Disinformation at scale | Active | 2024 election manipulation campaigns | $1-10B annual | Reuters |
| Spear phishing enhancement | Active | 82% higher believability vs human-written | $10B+ annual losses | IBM Security |
| Code vulnerability exploitation | Partially active | GPT-4 identifies 0-days, limited autonomy | Medium severity | Anthropic evals |
| Academic fraud | Active | 30-60% of student submissions flagged | Education integrity crisis | Stanford study |
| Romance/financial scams | Active | AI voice cloning in elder fraud | $1B+ annual | FTC reports |
Category: Structural Risks
| Risk | Status | Current Evidence | Impact Scale | Trend |
|---|---|---|---|---|
| Epistemic erosion | Active | 40% decline in information trust | Society-wide | Accelerating |
| Economic displacement | Beginning | 15% of customer service roles automated | 200M+ jobs at risk | Expanding |
| Attention manipulation | Active | Algorithm-driven engagement optimization | Mental health crisis | Intensifying |
| Dependency formation | Active | 60% productivity loss when tools unavailable | Skill atrophy beginning | Growing |
Category: Technical Risks
| Risk | Status | Current Evidence | Mitigation Level | Progress |
|---|---|---|---|---|
| Reward hacking | Active | Documented in all RLHF systems | Partial guardrails | No clear progress |
| Sycophancy | Active | Models agree with user regardless of truth | Research stage | Limited progress |
| Prompt injection | Active | Jailbreaks succeed >50% of time | Defense research ongoing | Cat-mouse game |
| Hallucination/confabulation | Active | 15-30% false information in outputs | Detection tools emerging | Gradual improvement |
Near-Term Risks (2025-2027 Activation Window)
Critical Misuse Risks
| Risk | Activation Window | Key Threshold | Current Progress | Intervention Status |
|---|---|---|---|---|
| Bioweapons uplift | 2025-2028 | Synthesis guidance beyond textbooks | 60-80% to threshold | Active screening efforts (NTI) |
| Cyberweapon development | 2025-2027 | Autonomous 0-day discovery | 70-85% to threshold | Limited defensive preparation |
| Persuasion weapons | 2025-2026 | Personalized, adaptive manipulation | 80-90% to threshold | No systematic defenses |
| Mass deepfake attacks | Active-2026 | Real-time, undetectable generation | 85-95% to threshold | Detection research lagging |
Control and Alignment Risks
| Risk | Activation Window | Key Threshold | Current Progress | Research Investment |
|---|---|---|---|---|
| Agentic system failures | 2025-2026 | Multi-step autonomous task execution | 70-80% to threshold | $500M+ annually |
| Situational awareness | 2025-2027 | Strategic self-modeling capability | 50-70% to threshold | Research accelerating |
| Sandbagging on evals | 2026-2028 | Concealing capabilities from evaluators | 40-60% to threshold | Limited detection work |
| Human oversight evasion | 2026-2029 | Identifying and exploiting oversight gaps | 30-50% to threshold | Control research beginning |
Structural Transformation Risks
| Risk | Activation Window | Key Threshold | Economic Impact | Policy Preparation |
|---|---|---|---|---|
| Mass unemployment crisis | 2026-2030 | >10% of jobs automatable within 2 years | $5-15T GDP impact | Minimal policy frameworks |
| Authentication collapse | 2025-2027 | Can't distinguish human vs AI content | Democratic processes at risk | Technical solutions emerging (C2PA) |
| AI-powered surveillance state | 2025-2028 | Real-time behavior prediction | Human rights implications | Regulatory gaps |
| Expertise atrophy | 2026-2032 | Human skills erode from AI dependence | Innovation capacity loss | No systematic response |
Long-Term Risks (ASI-Level Requirements)
Existential Risk Category
| Risk | Estimated Window | Key Capability Threshold | Confidence Level | Research Investment |
|---|---|---|---|---|
| Misaligned superintelligence | 2030-2050+ | Systems exceed human-level at alignment-relevant tasks | Very Low | $1B+ annually |
| Recursive self-improvement | 2030-2045+ | AI meaningfully improves AI architecture | Low | Limited research |
| Decisive strategic advantage | 2030-2040+ | Single actor gains insurmountable technological lead | Low | Policy research only |
| Irreversible value lock-in | 2028-2040+ | Permanent commitment to suboptimal human values | Low-Medium | Philosophy/governance research |
Advanced Deception and Control
| Risk | Estimated Window | Capability Requirement | Detection Difficulty | Mitigation Research |
|---|---|---|---|---|
| Strategic deception | 2027-2035 | Models its own training dynamics and hides intentions | Very High | Interpretability research |
| Coordinated AI systems | 2028-2040 | Multiple AI systems coordinate against humans | High | Multi-agent safety research |
| Large-scale human manipulation | 2028-2035 | Accurate predictive models of human behavior | Medium | Social science integration |
| Critical infrastructure control | 2030-2050+ | Simultaneous control of multiple key systems | Very High | Air-gapped research |
Risk Interaction and Cascade Effects
Cascade Amplification Matrix
| Triggering Risk | Amplifies | Mechanism | Timeline Impact |
|---|---|---|---|
| Disinformation proliferation | Epistemic collapse | Trust erosion accelerates | -1 to -2 years |
| Cyberweapon autonomy | Authentication collapse | Digital infrastructure vulnerability | -1 to -3 years |
| Bioweapons accessibility | Authoritarian control | Crisis enables power concentration | Variable |
| Economic displacement | Social instability | Reduces governance capacity | -0.5 to -1.5 years |
| Any major AI incident | Regulatory capture | Crisis mode enables bad policy | -2 to -5 years |
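The cascade matrix can be read as a set of timeline adjustments: when a triggering risk activates, the expected activation year of the amplified risk moves earlier by the stated amount. The sketch below applies the midpoint of each adjustment range; the numbers are illustrative readings of the table above and the aggregation is our own simplification.

```python
# (triggering risk, amplified risk, smallest shift, largest shift) in years
CASCADES = [
    ("disinformation", "epistemic_collapse", -1.0, -2.0),
    ("cyberweapon_autonomy", "authentication_collapse", -1.0, -3.0),
    ("economic_displacement", "social_instability", -0.5, -1.5),
]

def adjusted_year(baseline_year: float, amplified_risk: str,
                  active_triggers: set[str]) -> float:
    """Shift a risk's expected activation year earlier for each active trigger."""
    year = baseline_year
    for trigger, amplified, lo, hi in CASCADES:
        if amplified == amplified_risk and trigger in active_triggers:
            year += (lo + hi) / 2  # midpoint of the adjustment range
    return year

# Example: authentication collapse baseline of 2027, cyberweapon autonomy active
print(adjusted_year(2027, "authentication_collapse", {"cyberweapon_autonomy"}))
```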
Acceleration Factors
| Factor | Timeline Impact | Probability by 2027 | Evidence |
|---|---|---|---|
| Algorithmic breakthrough | -1 to -3 years across categories | 15-30% | Historical ML progress |
| 10x compute scaling | -0.5 to -1.5 years | 40-60% | Current compute trends (Epoch AI) |
| Open-source capability parity | -1 to -2 years on misuse risks | 50-70% | Open model progress |
| Geopolitical AI arms race | -0.5 to -2 years overall | 30-50% | US-China competition intensifying |
| Major safety failure/incident | Variable, enables governance | 20-40% | Base rate of tech failures |
Deceleration Factors
| Factor | Timeline Impact | Probability by 2030 | Feasibility |
|---|---|---|---|
| Scaling laws plateau | +2 to +5 years | 15-30% | Some evidence emerging |
| Strong international AI governance | +1 to +3 years on misuse | 10-20% | Limited progress so far |
| Major alignment breakthrough | Variable positive impact | 10-25% | Research uncertainty high |
| Physical compute constraints | +0.5 to +2 years | 20-35% | Semiconductor bottlenecks |
| Economic/energy limitations | +1 to +3 years | 15-25% | Training cost growth |
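Treating each factor as independent, a rough expected net timeline shift can be computed as the sum over factors of probability times midpoint impact. The probabilities and impact ranges below are transcribed from the two tables (factors with "variable" impact are omitted); the independence assumption, the use of midpoints, and the mixing of 2027 and 2030 horizons are simplifications on our part.

```python
# (factor, midpoint probability, impact range in years; negative = earlier activation)
FACTORS = [
    ("algorithmic breakthrough",          0.225, (-1.0, -3.0)),  # 15-30%
    ("10x compute scaling",               0.50,  (-0.5, -1.5)),  # 40-60%
    ("open-source capability parity",     0.60,  (-1.0, -2.0)),  # 50-70%
    ("geopolitical AI arms race",         0.40,  (-0.5, -2.0)),  # 30-50%
    ("scaling laws plateau",              0.225, (+2.0, +5.0)),  # 15-30%
    ("strong international governance",   0.15,  (+1.0, +3.0)),  # 10-20%
    ("physical compute constraints",      0.275, (+0.5, +2.0)),  # 20-35%
    ("economic/energy limitations",       0.20,  (+1.0, +3.0)),  # 15-25%
]

expected_shift = sum(p * (lo + hi) / 2 for _, p, (lo, hi) in FACTORS)
print(f"Expected net timeline shift: {expected_shift:+.1f} years")
```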
Critical Intervention Windows
Time-Sensitive Priority Matrix
| Risk Category | Window Opens | Window Closes | Intervention Cost | Effectiveness if Delayed |
|---|---|---|---|---|
| Bioweapons screening | 2020 (missed) | 2027 | $500M-1B | 50% reduction |
| Cyber defensive AI | 2023 | 2026 | $1-3B | 70% reduction |
| Authentication infrastructure | 2024 | 2026 | $300-600M | 30% reduction |
| AI control research | 2022 | 2028 | $1-2B annually | 20% reduction |
| International governance | 2023 | 2027 | $200-500M | 80% reduction |
| Alignment foundations | 2015 | 2035+ | $2-5B annually | Variable |
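These windows can be compared with a rough urgency score: how much effectiveness is lost if the intervention is delayed, per dollar, discounted by how much of the window remains. The scoring function and the cost midpoints below are our own illustrative assumptions layered on the table, not part of the model itself.

```python
from dataclasses import dataclass

@dataclass
class InterventionWindow:
    name: str
    window_closes: int
    cost_musd: float          # midpoint of estimated cost, in $M
    loss_if_delayed: float    # fraction of effectiveness lost if funded late

    def urgency_score(self, current_year: int = 2025) -> float:
        """Higher = more value lost per $M per remaining year of the window."""
        years_left = max(self.window_closes - current_year, 1)
        return self.loss_if_delayed / (self.cost_musd * years_left)

windows = [
    InterventionWindow("Bioweapons screening", 2027, 750, 0.50),
    InterventionWindow("Cyber defensive AI", 2026, 2000, 0.70),
    InterventionWindow("Authentication infrastructure", 2026, 450, 0.30),
    InterventionWindow("International governance", 2027, 350, 0.80),
]

for w in sorted(windows, key=lambda w: w.urgency_score(), reverse=True):
    print(f"{w.name}: {w.urgency_score():.2e}")
```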
Leverage Analysis by Intervention Type
| Intervention Category | Current Leverage | Peak Leverage Window | Investment Required | Expected Impact |
|---|---|---|---|---|
| DNA synthesis screening | High | 2024-2027 | $100-300M globally | Delays bio threshold 2-3 years |
| Model evaluation standards | Medium | 2024-2026 | $50-150M annually | Enables risk detection |
| Interpretability breakthroughs | Very High | 2024-2030 | $500M-1B annually | Addresses multiple long-term risks |
| Defensive cyber-AI | Medium | 2024-2026 | $1-2B | Extends defensive advantage |
| Public authentication systems | High | 2024-2026 | $200-500M | Preserves epistemic infrastructure |
| International AI treaties | Very High | 2024-2027 | $100-200M | Sets precedent for future governance |
Probability Calibration Over Time
Risk Activation Probabilities by Year
| Risk Category | 2025 | 2027 | 2030 | 2035 | 2040 |
|---|---|---|---|---|---|
| Mass disinformation | 95% (active) | 99% | 99% | 99% | 99% |
| Bioweapons uplift (meaningful) | 25% | 50% | 70% | 85% | 95% |
| Autonomous cyber operations | 40% | 75% | 90% | 99% | 99% |
| Large-scale job displacement | 15% | 40% | 65% | 85% | 95% |
| Authentication crisis | 30% | 60% | 80% | 95% | 99% |
| Agentic AI control failures | 35% | 70% | 90% | 99% | 99% |
| Meaningful situational awareness | 20% | 50% | 75% | 90% | 95% |
| Strategic AI deception | 5% | 20% | 45% | 70% | 85% |
| ASI-level misalignment | <1% | 3% | 15% | 35% | 55% |
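For years between the calibration points, a piecewise-linear interpolation of the table gives a quick estimate. The helper below is a convenience sketch: the row values it carries are copied from the table above, and the linearity assumption between calibration years is ours.

```python
import bisect

YEARS = [2025, 2027, 2030, 2035, 2040]

def interpolate(probabilities: list[float], year: int) -> float:
    """Piecewise-linear interpolation of activation probability between table years."""
    if year <= YEARS[0]:
        return probabilities[0]
    if year >= YEARS[-1]:
        return probabilities[-1]
    i = bisect.bisect_right(YEARS, year) - 1
    y0, y1 = YEARS[i], YEARS[i + 1]
    p0, p1 = probabilities[i], probabilities[i + 1]
    return p0 + (p1 - p0) * (year - y0) / (y1 - y0)

# Bioweapons uplift row: 25%, 50%, 70%, 85%, 95%
bio = [0.25, 0.50, 0.70, 0.85, 0.95]
print(f"Estimated probability in 2028: {interpolate(bio, 2028):.0%}")
```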
Uncertainty Ranges and Expert Disagreement
| Risk | Optimistic Timeline | Median | Pessimistic Timeline | Expert Confidence |
|---|---|---|---|---|
| Cyberweapon autonomy | 2028-2030 | 2025-2027 | 2024-2025 | Medium (70% within range) |
| Bioweapons threshold | 2030-2035 | 2026-2029 | 2024-2026 | Low (50% within range) |
| Mass unemployment | 2035-2040 | 2028-2032 | 2025-2027 | Very Low (30% within range) |
| Superintelligence | 2045-Never | 2030-2040 | 2027-2032 | Very Low (20% within range) |
Strategic Resource Allocation
Investment Priority Framework
| Priority Tier | Timeline | Investment Level | Rationale |
|---|---|---|---|
| Tier 1: Critical | Immediate-2027 | $3-5B annually | Window closing rapidly |
| Tier 2: Important | 2025-2030 | $1-2B annually | Foundation for later risks |
| Tier 3: Foundational | 2024-2035+ | $500M-1B annually | Long-term preparation |
Recommended Investment Allocation
| Research Area | Annual Investment | Justification | Expected ROI |
|---|---|---|---|
| Bioweapons screening infrastructure | $200-400M (2024-2027) | Critical window closing | Very High - prevents catastrophic risk |
| AI interpretability research | $300-600M ongoing | Multi-risk mitigation | High - enables control across scenarios |
| Cyber-defense AI systems | $500M-1B (2024-2026) | Maintaining defensive advantage | Medium-High |
| Authentication/verification tech | $100-200M (2024-2026) | Preserving epistemic infrastructure | High |
| International governance capacity | $100-200M (2024-2027) | Coordination before crisis | Very High - prevents race dynamics |
| AI control methodology | $400-800M ongoing | Bridge to long-term safety | High |
| Economic transition planning | $200-400M (2024-2030) | Social stability preservation | Medium |
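A quick sanity check on the allocation is to sum the per-area ranges; the snippet below does only that arithmetic. The mapping of individual line items to priority tiers is not specified in the tables above and is deliberately left out.

```python
# (research area, low, high) in $M per year, from the allocation table
ALLOCATIONS = [
    ("Bioweapons screening infrastructure", 200, 400),
    ("AI interpretability research", 300, 600),
    ("Cyber-defense AI systems", 500, 1000),
    ("Authentication/verification tech", 100, 200),
    ("International governance capacity", 100, 200),
    ("AI control methodology", 400, 800),
    ("Economic transition planning", 200, 400),
]

low = sum(lo for _, lo, _ in ALLOCATIONS)
high = sum(hi for _, _, hi in ALLOCATIONS)
print(f"Recommended allocation totals ${low / 1000:.1f}-{high / 1000:.1f}B per year")
```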
Key Cruxes and Uncertainties
Timeline Uncertainty Analysis
| Core Uncertainty | If Optimistic | If Pessimistic | Current Best Estimate | Implications |
|---|---|---|---|---|
| Scaling law continuation | Plateau by 2027-2030 | Continue through 2035+ | 60% likely to continue | ±3 years on all timelines |
| Open-source capability gap | Maintains 2+ year lag | Achieves parity by 2026 | 55% chance of rapid catch-up | ±2 years on misuse risks |
| Alignment research progress | Major breakthrough by 2030 | Limited progress through 2035 | 20% chance of breakthrough | ±5-10 years on existential risk |
| Geopolitical cooperation | Successful AI treaties | Intensified arms race | 25% chance of cooperation | ±2-5 years on multiple risks |
| Economic adaptation speed | Smooth transition over 10+ years | Rapid displacement over 3-5 years | 40% chance of rapid displacement | Social stability implications |
Research and Policy Dependencies
| Dependency | Success Probability | Impact if Failed | Mitigation Options |
|---|---|---|---|
| International bioweapons screening | 60% | Bioweapons threshold advances 2-3 years | National screening systems, detection research |
| AI evaluation standardization | 40% | Reduced early warning capability | Industry self-regulation, government mandates |
| Interpretability breakthroughs | 30% | Limited control over advanced systems | Multiple research approaches, AI-assisted research |
| Democratic governance adaptation | 35% | Poor quality regulation during crisis | Early capacity building, expert networks |
Implications for Different Stakeholders
For AI Development Organizations
Immediate priorities (2024-2025):
- Implement robust evaluations for near-term risks
- Establish safety teams scaling with capability teams
- Contribute to industry evaluation standards
Near-term preparations (2025-2027):
- Deploy monitoring systems for newly activated risks
- Engage constructively in governance frameworks
- Research control methods before needed
For Policymakers
Critical window actions:
- Establish regulatory frameworks before crisis mode
- Focus on near-term risks to build governance credibility
- Invest in international coordination mechanisms
Priority areas:
- Bioweapons screening infrastructure
- AI evaluation and monitoring standards
- Economic transition support systems
- Authentication and verification requirements
For Safety Researchers
Optimal portfolio allocation:
- 40% near-term (1-2 generation) risk mitigation
- 40% foundational research for long-term risks
- 20% current risk mitigation and response
High-leverage research areas:
- Interpretability for multiple risk categories
- AI control methodology development
- Evaluation methodology for emerging capabilities
- Social science integration for structural risks
For Civil Society Organizations
Advocacy priorities:
- Demand transparency in capability evaluations
- Push for public interest representation in governance
- Support authentication infrastructure development
- Advocate for economic transition policies
Limitations and Model Uncertainty
Methodological Limitations
| Limitation | Impact on Accuracy | Mitigation Strategies |
|---|---|---|
| Expert overconfidence | Timelines may be systematically early/late | Multiple forecasting methods, base rate reference |
| Capability discontinuities | Sudden activation possible | Broader uncertainty ranges, multiple scenarios |
| Interaction complexity | Cascade effects poorly understood | Systems modeling, historical analogies |
| Adversarial adaptation | Defenses may fail faster than expected | Red team exercises, worst-case planning |
Areas for Model Enhancement
- Better cascade modeling - More sophisticated interaction effects
- Adversarial dynamics - How attackers adapt to defenses
- Institutional response capacity - How organizations adapt to new risks
- Cross-cultural variation - Risk manifestation in different contexts
- Economic feedback loops - How risk realization affects development
Sources & Resources
Primary Research Sources
| Organization | Type | Key Contributions |
|---|---|---|
| Anthropic | AI Lab | Risk evaluation methodologies, scaling policies |
| OpenAI | AI Lab | Preparedness framework, capability assessment |
| METR | Evaluation Org | Technical capability evaluations |
| RAND Corporation | Think Tank | Policy analysis, national security implications |
| Center for AI Safety | Safety Org | Risk taxonomy, expert opinion surveys |
Academic Literature
| Paper | Authors | Key Finding |
|---|---|---|
| Model evaluation for extreme risks | Shevlane et al. (2023) | Evaluation frameworks for dangerous capabilities |
| AI timelines and capabilities | Various forecasting research | Capability development trajectories |
| Cybersecurity implications of AI | CSET | Near-term cyber risk assessment |
Policy and Governance Sources
| Source | Type | Focus Area |
|---|---|---|
| NIST AI Risk Management Framework | Government Standard | Risk management methodology |
| EU AI Act | Regulation | Comprehensive AI governance framework |
| UK AI Safety Summit Outcomes | International | Multi-stakeholder coordination |
Expert Opinion and Forecasting
| Platform | Type | Use Case |
|---|---|---|
| Metaculus AI forecasts | Prediction Market | Quantitative timeline estimates |
| Expert Survey on AI Risk (AI Impacts) | Academic Survey | Expert opinion distribution |
| Future of Humanity Institute reports | Research Institute | Long-term risk analysis |
Related Models and Cross-References
Complementary Risk Models
- AI Capability Threshold Model - Specific capability requirements for risk activation
- Bioweapons AI Uplift Model - Detailed biological weapons timeline
- Cyberweapons Attack Automation - Cyber capability development
- Authentication Collapse Timeline - Digital verification crisis
- Economic Disruption Impact - Labor market transformation
Risk Category Cross-References
- Accident Risks - Technical AI safety failures
- Misuse Risks - Intentional harmful applications
- Structural Risks - Systemic societal impacts
- Epistemic Risks - Information environment degradation
Response Strategy Integration
- Governance Responses - Policy intervention strategies
- Technical Safety Research - Engineering solutions
- International Coordination - Global cooperation frameworks
References
Anthropic, "Challenges in Evaluating AI Systems." An article examining the core difficulties in assessing AI system capabilities and safety properties. It explores why robust evaluations are critical yet methodologically challenging, addressing gaps between benchmark performance and real-world behavior as well as the limitations of current evaluation frameworks.
Nuclear Threat Initiative. Analysis of active screening approaches to prevent catastrophic bioweapons threats, focusing on detection and interdiction strategies. It addresses biosecurity governance frameworks and the technical and policy measures needed to reduce the risk of biological weapons development and use, showing how proactive monitoring can serve as a layer of defense against existential-level biological risks.
European Commission, "European Approach to Artificial Intelligence." Outlines the Commission's policy framework for AI, centered on trustworthy, human-centric AI through the AI Act, AI Continent Action Plan, and Apply AI Strategy. It aims to balance Europe's global AI competitiveness with safety, fundamental rights, and democratic values; key initiatives include AI Factories, the InvestAI Facility, GenAI4EU, and the Apply AI Alliance.
Deepfake Detection Challenge (DFDC). An initiative to advance research on detecting AI-generated synthetic media. It highlights the gap between rapidly improving deepfake generation capabilities and the slower development of reliable detection tools, reflecting a broader pattern of defensive research lagging behind offensive AI capabilities.
Metaculus, AI forecasting questions. Metaculus's collection of AI-related forecasting questions, where forecasters make probabilistic predictions about AI development timelines, capabilities, and risks. The original page has since moved or been reorganized.
UK Government, AI Safety Summit 2023. The official page for the summit held November 1-2, 2023 at Bletchley Park, which convened governments, AI companies, civil society, and researchers to address frontier AI risks. Key outputs include the Bletchley Declaration, a multilateral agreement on AI safety, as well as company safety policies and a frontier AI capabilities and risks discussion paper.
AI Impacts, 2022 Expert Survey on Progress in AI (ESPAI). Surveyed 738 machine learning researchers (NeurIPS/ICML authors) about AI progress timelines and risks, replicating and updating the 2016 survey. Key findings include an aggregate forecast of a 50% chance of HLMI by 2059, with significant disagreement among experts about timelines and risks.
OpenAI, "Safety & Responsibility." OpenAI's safety hub outlining a multi-stage approach to AI safety through teaching (value alignment and content filtering), testing (red teaming and preparedness evaluations), and sharing (real-world feedback loops). It covers child safety, deepfakes, bias, and election integrity, and links to the Preparedness Framework and related safety documentation.
METR (Model Evaluation and Threat Research). An organization conducting research and evaluations of frontier AI systems, focusing on autonomous task completion, AI self-improvement risks, and evaluation integrity. METR developed the "time horizon" metric measuring how long AI agents can autonomously complete software tasks, and works with major AI labs including OpenAI, Anthropic, and Amazon to evaluate catastrophic risk potential.
Shevlane et al. (2023), "Model Evaluation for Extreme Risks." Argues that as general-purpose AI systems gain potentially dangerous capabilities such as offensive cyber abilities or manipulation skills, two types of evaluations become essential: dangerous capability evaluations to identify harmful capacities, and alignment evaluations to assess whether models are inclined to use those capabilities for harm. Both inform policymakers and support responsible decisions on training, deployment, and security.
IBM Security, Cost of a Data Breach Report 2025. IBM's annual report, produced with the Ponemon Institute, on global data breach costs, trends, and contributing factors. The 2025 edition highlights an "AI oversight gap" in which rapid AI adoption is outpacing security governance, with ungoverned AI systems facing higher breach likelihood and costs; the global average breach cost reached $4.4M.
DeepSeek-AI (2024), "DeepSeek LLM." An open-source large language model project addressing inconsistencies in the scaling law literature with empirical findings for 7B and 67B parameter models trained on a 2 trillion token dataset. Evaluations show DeepSeek LLM 67B outperforming LLaMA-2 70B on several benchmarks, particularly code, mathematics, and reasoning, with the chat variant competitive with GPT-3.5.
Hugging Face, Open LLM Leaderboard. A benchmarking platform that compared open-source large language models across standardized evaluations in a transparent and reproducible manner, serving as a community reference for tracking capability progress. The leaderboard has since been archived.
NIST, AI Risk Management Framework. A voluntary, consensus-driven framework released in January 2023 to help organizations identify, assess, and manage risks from AI systems while promoting trustworthiness across design, development, deployment, and evaluation. It is accompanied by a Playbook, a Roadmap, and a Generative AI Profile (2024).
Federal Trade Commission, "AI Voice Cloning and the 'Loved One in Distress' Scam." A consumer warning about AI-powered voice cloning scams in which fraudsters impersonate distressed family members to extract money. It explains how scammers clone voices from social media and other public audio and offers guidance on verifying calls and protecting against such fraud.
Center for AI Safety (CAIS). A research organization focused on mitigating catastrophic and existential risks from advanced AI systems through technical research, surveys and statements, and field-building across academia and industry, including its widely cited statement on AI extinction risk.
Reuters. Analysis examining how AI-generated misinformation posed risks to the 2024 election cycle, covering deepfakes, synthetic media, and coordinated disinformation campaigns, and assessing the challenges platforms and regulators face in detection and mitigation.
RAND Corporation. Research hub covering the policy, national security, and governance implications of artificial intelligence, aggregating reports and analyses on AI risks, military applications, and regulatory frameworks.
Epoch AI. Analysis of historical trends in compute used for training notable AI systems, identifying pre-deep learning, deep learning, and large-scale eras, and documenting roughly 4-5 orders of magnitude of training compute growth since 2010 with a shift toward massive compute investment after 2015.
Future of Humanity Institute (FHI). Expert elicitation surveys on AI development timelines, capability thresholds, and intervention prioritization, aggregating researcher forecasts on when transformative AI might arrive and which safety measures may be most effective.
Stanford HAI. Examines the challenges AI tools, particularly large language models, pose to academic integrity, how institutions should respond to AI-assisted cheating, and the broader implications for education and assessment.
CSET (Center for Security and Emerging Technology). Analysis of how artificial intelligence is reshaping the cybersecurity landscape, covering offensive and defensive applications of AI in cyber operations and the near- and long-term implications for threats, defenses, and policy.
C2PA (Coalition for Content Provenance and Authenticity). An industry coalition that has developed an open technical standard for attaching verifiable provenance metadata to digital content, functioning like a "nutrition label" that tracks a file's origin, creation tools, and edit history. It is backed by major technology and media companies including Adobe, Microsoft, and the BBC.