AI Misuse Risk Cruxes
AI Misuse Risk Cruxes
Comprehensive analysis of AI misuse cruxes with quantified evidence across bioweapons (RAND bio study found no significant difference; novice uplift studies show modest gains on in silico tasks), cyber (CTF scores improved 27%→76%; state actors confirmed using AI for lateral movement), disinformation (8M deepfakes projected 2025; human detection at 24.5%), agentic/prompt injection risks (new vector), and open-weight model worst-case analysis. Covers preparedness frameworks, instruction hierarchy improvements, and the commercial-pressure erosion problem. Framework maps uncertainties to policy responses with probability ranges.
Quick Assessment
| Dimension | Assessment | Evidence |
|---|---|---|
| Overall Severity | High | AI misuse incidents rose 8x since 2022; deepfakes responsible for 6.5% of all fraud (AI Incident Database) |
| Current Uplift Evidence | Mixed | RAND Corporation 2024 bioweapons study found no significant uplift; OpenAI cyber CTF scores improved 27% to 76% in 3 months (RAND, OpenAI) |
| Bioweapons Risk | Contested | 13/57 AI bio-tools rated "Red" risk; OpenAI o3 at 94th percentile virology; wet-lab bottleneck may dominate; novice uplift on in silico tasks documented |
| Cyber Risk | Escalating | 68% of analysts say AI phishing harder to detect; 703% increase in credential phishing H2 2024 (Deepstrike); prompt injection emerging as distinct attack surface |
| Disinformation Risk | High | Deepfake fraud up 2,137% since 2022; human detection accuracy only 24.5% (UNESCO) |
| Agentic Misuse Risk | Emerging | Prompt injection attacks demonstrated; malicious agent skills catalogued in the wild; "unhinged" configurations identified as under-studied threat vector |
| Mitigation Effectiveness | Partial | Guardrails reduce casual misuse; open-source models bypass restrictions; DNA screening at 97% after 2024 patch; instruction hierarchy improvements ongoing |
| Trend | Worsening | Q1 2025 deepfake incidents exceeded all of 2024 by 19%; AI cyber capabilities accelerating faster than defenses; commercial pressure documented as eroding safety boundaries |
Key Links
| Source | Link |
|---|---|
| Wikipedia | en.wikipedia.org |
| LessWrong | lesswrong.com |
Overview
Misuse risk cruxes are the fundamental uncertainties that shape how policymakers, researchers, and organizations prioritize AI safety responses. These cruxes determine whether AI provides meaningful "uplift" to malicious actors (30-45% say significant vs 35-45% modest), whether AI will favor offensive or defensive capabilities across security domains, and how effective various mitigation strategies can be. According to TIME's analysis of AI harm data, reports of AI-related incidents rose 50% year-over-year from 2022 to 2024, with malicious uses growing 8-fold since 2022.
Current evidence remains mixed across domains. The RAND biological uplift study↗🔗 web★★★★☆RAND CorporationRAND Corporation studyRAND research reports on AI and bioweapons risk are directly relevant to frontier AI evaluation policy, particularly debates around capability thresholds used in safety frameworks like Anthropic's RSP or OpenAI's preparedness framework.This RAND Corporation research report examines the risk of AI systems providing meaningful uplift to actors seeking to develop biological weapons, focusing on how to assess capa...existential-riskevaluationred-teamingcapabilities+6Source ↗ (January 2024) tested 15 red teams with and without LLM access, finding no statistically significant difference in bioweapon attack plan viability. However, RAND's subsequent Global Risk Index for AI-enabled Biological Tools (2024) evaluated 57 state-of-the-art tools and indexed 13 as "Red" (action required), with one tool reaching the highest level of critical misuse-relevant capabilities. Meanwhile, CNAS analyses↗🔗 web★★★★☆CNASAI and the Evolution of Biological National Security RisksA CNAS policy report providing a broad overview of AI-biosecurity intersection for policymakers; useful for understanding governance challenges around dual-use AI capabilities in the biological domain.This CNAS report examines how AI advancements intersect with biosecurity risks, analyzing threats from state actors, nonstate actors, and accidental releases. It assesses whethe...biosecuritydual-use-researchexistential-riskgovernance+5Source ↗ and Georgetown CSET research emphasize that rapid capability improvements require ongoing reassessment.
In cybersecurity, OpenAI's threat assessment (2025) notes that AI cyber capabilities improved from 27% to 76% on capture-the-flag benchmarks between August and November 2025, with GPT-4.2-Codex achieving the highest scores. According to Deepstrike's 2025 analysis, 68% of cyber threat analysts report AI-generated phishing is harder to detect than ever, with a 703% increase in credential phishing attacks in H2 2024. Deepfake incidents grew from 500,000 files in 2023 to a projected 8 million in 2025 (Keepnet Labs), with businesses losing an average of $100,000 per deepfake-related fraud incident and the $25.6 million Hong Kong deepfake fraud case serving as a landmark incident.
Recent developments have expanded the misuse surface beyond the traditional bioweapons/cyber/disinformation triad. Agentic AI operating with real-world tool access introduce prompt injection as a novel attack vector. DeepSeek-R1 and other open-weight releases have prompted empirical analysis of worst-case frontier risks from models that cannot be restricted after release. Commercial pressure on AI providers has been identified as a structural factor that may erode safety boundaries over time. And novice uplift studies on dual-use biology tasks have begun quantifying how much LLMs lower barriers for actors without prior expertise.
The stakes are substantial: if AI provides significant capability uplift to malicious actors, urgent restrictions on model access and compute governance become critical. If defenses can keep pace with offensive capabilities, investment priorities shift toward detection and response systems rather than prevention.
Misuse Risk Decision Framework
Diagram (loading…)
flowchart TD
subgraph Cruxes["Key Uncertainties"]
UPLIFT[AI Capability Uplift<br/>30-45% significant vs 35-45% modest]
OFFENSE[Offense-Defense Balance<br/>Varies by domain]
MITIGATE[Mitigation Effectiveness<br/>Guardrails vs open-source]
AGENT[Agentic Risk<br/>Prompt injection, tool misuse]
end
subgraph Domains["Risk Domains"]
BIO[Bioweapons<br/>RAND: no significant uplift<br/>Novice uplift: modest documented]
CYBER[Cyberweapons<br/>CTF: 27% to 76%]
DISINFO[Disinformation<br/>8M deepfakes by 2025]
AWS[Autonomous Weapons<br/>UN Resolution 166-3]
AGENTIC[Agentic Misuse<br/>Prompt injection, unhinged configs]
OPENWEIGHT[Open-Weight Models<br/>Worst-case frontier risks]
end
subgraph Responses["Policy Responses"]
RESTRICT[Model Restrictions]
COMPUTE[Compute Governance]
DETECT[Detection & Defense]
INTL[International Cooperation]
PREPARED[Preparedness Frameworks]
INSTRUCT[Instruction Hierarchy]
end
UPLIFT --> BIO
UPLIFT --> CYBER
UPLIFT --> DISINFO
AGENT --> AGENTIC
OFFENSE --> DETECT
MITIGATE --> RESTRICT
MITIGATE --> COMPUTE
BIO --> RESTRICT
CYBER --> DETECT
DISINFO --> DETECT
AWS --> INTL
AGENTIC --> INSTRUCT
OPENWEIGHT --> PREPARED
style UPLIFT fill:#fff3cd
style OFFENSE fill:#fff3cd
style MITIGATE fill:#fff3cd
style AGENT fill:#fff3cd
style BIO fill:#f8d7da
style CYBER fill:#f8d7da
style DISINFO fill:#f8d7da
style AWS fill:#f8d7da
style AGENTIC fill:#f8d7da
style OPENWEIGHT fill:#f8d7da
style RESTRICT fill:#d4edda
style COMPUTE fill:#d4edda
style DETECT fill:#d4edda
style INTL fill:#d4edda
style PREPARED fill:#d4edda
style INSTRUCT fill:#d4eddaRisk Assessment Framework
| Risk Category | Severity Assessment | Timeline | Current Trend | Key Uncertainty |
|---|---|---|---|---|
| Bioweapons Uplift | High (if real) | 2-5 years | Mixed evidence | Wet-lab bottlenecks vs information barriers |
| Cyber Capability Enhancement | Medium-High | 1-3 years | Gradual increase | Commodity vs sophisticated attack gap |
| Autonomous Weapons | High | Ongoing | Accelerating | International cooperation effectiveness |
| Mass Disinformation | Medium-High | Current | Detection losing | Authentication adoption rates |
| Surveillance Authoritarianism | Medium | Ongoing | Expanding deployment | Democratic resilience factors |
| Chemical Weapons | Medium | 3-7 years | Early evidence | Synthesis barrier strength |
| Infrastructure Disruption | High | 1-4 years | Escalating complexity | Critical system vulnerabilities |
| Agentic Misuse / Prompt Injection | Medium-High | Current | Emerging, under-studied | Tool access governance, instruction hierarchy |
| Open-Weight Model Misuse | Medium-High | Current | Each release re-evaluated | Weight accessibility vs restriction enforceability |
Source: Synthesis of expert assessments from CNAS↗🔗 web★★★★☆CNASCenter for a New American Security (CNAS) - HomepageCNAS is a mainstream national security think tank; relevant to AI safety primarily through its Technology & National Security program covering AI governance and defense AI policy, but not an AI safety-focused organization.CNAS is a Washington D.C.-based national security think tank publishing research on defense, technology policy, economic security, and AI governance. Its Technology & National S...governancepolicyai-safetycapabilities+2Source ↗, RAND Corporation↗🔗 web★★★★☆RAND CorporationRAND Provides Objective Research Services and Public Policy AnalysisRAND Corporation's homepage serves as an entry point to a large body of policy-relevant research on AI governance, national security, and emerging technology risks, useful as a reference for policymakers and researchers in the AI safety space.RAND Corporation is a nonprofit research organization providing objective analysis and policy recommendations across a wide range of topics including national security, technolo...governancepolicyai-safetycybersecurity+4Source ↗, Georgetown CSET↗🔗 web★★★★☆CSET GeorgetownCSET: AI Market DynamicsCSET is a prominent DC-based think tank whose research on AI governance, compute policy, and geopolitical competition is frequently cited in AI safety and policy discussions; this is their institutional homepage.CSET (Center for Security and Emerging Technology) at Georgetown University is a policy research organization focused on the security implications of emerging technologies, part...governancepolicyai-safetycoordination+2Source ↗, and AI safety research organizations
Quantified Evidence Summary (2024-2026)
| Domain | Key Metric | Value | Source | Year |
|---|---|---|---|---|
| Bioweapons | Red teams with/without LLM access | No statistically significant difference | RAND Red-Team Study | 2024 |
| Bioweapons | AI bio-tools indexed as "Red" (high-risk) | 13 of 57 evaluated | RAND Global Risk Index | 2024 |
| Bioweapons | OpenAI o3 virology ranking | 94th percentile among expert virologists | OpenAI Virology Test | 2025 |
| Bioweapons | Novice uplift on dual-use in silico biology tasks | Modest uplift documented for non-experts | LLM Novice Uplift Study | 2025 |
| Cyber | CTF benchmark improvement (GPT-5 to 5.2) | 27% to 76% | OpenAI Threat Assessment | 2025 |
| Cyber | Critical infrastructure AI attacks | 50% faced attack in past year | Microsoft Digital Defense Report | 2025 |
| Deepfakes | Content volume growth | 500K (2023) to 8M (2025) | Deepstrike Research | 2025 |
| Deepfakes | Avg. business loss per incident | ≈$100,000 | Deloitte Financial Services | 2024 |
| Deepfakes | Fraud incidents involving deepfakes | >6% of all fraud | European Parliament Research | 2025 |
| Deepfakes | Human detection accuracy (video) | 24.5% | Academic studies | 2024 |
| Deepfakes | Tool detection accuracy | ≈75% | UNESCO Report | 2024 |
| Disinformation | Political deepfakes documented | 82 cases in 38 countries | Academic research | 2024 |
| Fraud | Projected GenAI fraud losses (US) | $12.3B (2023) to $10B (2027) | Deloitte Forecast | 2024 |
| Agentic | Malicious agent skills documented in the wild | Large-scale empirical study published | Malicious Agent Skills Study | 2025 |
| Influence Operations | Covert influence operations disrupted | Multiple state-affiliated campaigns (CN, RU, IR, NK) | OpenAI Disruption Reports | 2024–2026 |
Capability and Uplift Cruxes
Key Evidence on AI Capability Uplift
| Domain | Evidence For Uplift | Evidence Against Uplift | Quantified Finding | Current Assessment |
|---|---|---|---|---|
| Bioweapons | Kevin Esvelt warnings↗📄 paper★★★★★Nature (peer-reviewed)Kevin Esvelt warningsA Nature Biotechnology journal article discussing Kevin Esvelt's research and warnings on biosafety and dual-use research concerns, relevant to AI safety's broader consideration of dual-use technology risks and responsible disclosure practices.Kevin Davies, Kevin Esvelt (2018)3 citations · The CRISPR JournalSource ↗; OpenAI o3 at 94th percentile virology; 13/57 bio-tools at "Red" risk level; novice uplift documented on in silico tasks | RAND study↗🔗 web★★★★☆RAND CorporationRAND Corporation studyRAND research reports on AI and bioweapons risk are directly relevant to frontier AI evaluation policy, particularly debates around capability thresholds used in safety frameworks like Anthropic's RSP or OpenAI's preparedness framework.This RAND Corporation research report examines the risk of AI systems providing meaningful uplift to actors seeking to develop biological weapons, focusing on how to assess capa...existential-riskevaluationred-teamingcapabilities+6Source ↗: no statistically significant difference in attack plan viability with/without LLMs | Wet-lab skills remain bottleneck; information uplift contested; novice uplift more pronounced for in silico than wet-lab steps | Contested; monitoring escalating |
| Cyberweapons | CTF scores improved 27% to 76% (Aug-Nov 2025); 50% of critical infra faced AI attacks; code synthesis LLMs analyzed under hazard frameworks | High-impact attacks still require sophisticated skills and physical access | Microsoft 2025: nation-states using AI for lateral movement, vulnerability discovery | Moderate-to-significant uplift demonstrated |
| Chemical Weapons | Literature synthesis, reaction optimization | Physical synthesis and materials access remain bottleneck | Limited empirical studies; lower priority than bio | Limited evidence; lower concern |
| Disinformation | 8M deepfakes projected (2025); 1,740% fraud increase (N. America); voice phishing up 442% | Detection tools at ≈75% accuracy; authentication standards emerging | Human detection only 24.5% for video deepfakes | Significant uplift clearly demonstrated |
| Surveillance | Enhanced facial recognition, behavioral analysis; PLA using AI for 10,000 scenarios in 48 seconds | Privacy protection tech advancing; democratic resilience | Freedom House: expanding global deployment | Clear uplift for monitoring |
| Agentic / Tool Misuse | Prompt injection attacks against agents demonstrated; malicious agent skills catalogued; "unhinged" configurations documented | Requires deployment access; mitigations emerging (system prompt hardening, sandboxing) | Large-scale empirical study of malicious agent skills in the wild published 2025 | Emerging; under-studied relative to static LLM misuse |
Novice Uplift in Biology: Emerging Evidence
A key crux in the bioweapons debate is whether LLMs provide meaningful uplift to actors without pre-existing expertise—the so-called "novice uplift" question. A 2025 study on LLM novice uplift on dual-use, in silico biology tasks documented modest but measurable gains for non-expert participants when given access to frontier LLMs on computational biology tasks such as sequence analysis and protein structure prediction. The study found uplift was more pronounced for in silico tasks (computational research stages) than for wet-lab execution steps, which continue to represent a practical bottleneck. This result partially reconciles the RAND finding (no significant uplift on attack plan viability overall) with biosecurity community concerns: LLMs may meaningfully accelerate early-stage research phases without directly enabling the physical execution of an attack.
OpenAI's GPT-5 bio bug bounty program invited external researchers to identify biological capabilities of GPT-5 that exceeded existing safety thresholds, reflecting an institutionalized approach to mapping capability frontiers before deployment. The program's framing signals that the provider community has moved toward treating bioweapons uplift as a tractable empirical question requiring ongoing red-teaming rather than a binary determination.
The reasons to be pessimistic (and optimistic) on the future of biosecurity analysis offers a structured uncertainty framework: pessimistic factors include declining synthesis costs, increasing LLM accessibility, and the dual-use nature of synthetic biology tools; optimistic factors include DNA synthesis screening (reaching 97% coverage after the 2024 patch), expanded biosurveillance infrastructure, and growing international cooperation on pandemic preparedness. Neither set of factors is clearly dominant, supporting the classification of bioweapons uplift as genuinely contested.
Offense vs Defense Balance
Cyber Domain Assessment
| Capability | Offensive Potential | Defensive Potential | Current Balance | Trend | Evidence |
|---|---|---|---|---|---|
| Vulnerability Discovery | High - CTF scores 27%->76% (3 months) | Medium - AI-assisted patching | Favors offense | Accelerating | OpenAI 2025 |
| Social Engineering | Very High - voice phishing up 442% | Low - human factor remains | Strongly favors offense | Widening gap | 49% of businesses report deepfake fraud |
| Incident Response | Low | High - automated threat hunting | Favors defense | Strengthening | $1B+ annual AI cybersecurity investment |
| Malware Development | Medium - autonomous malware adapting in real-time | High - behavioral detection | Roughly balanced | Evolving | Microsoft 2025 DDR |
| Attribution | Medium - obfuscation tools | High - pattern analysis | Favors defense | Improving | State actors experimenting (CN, RU, IR, NK) |
| Code Synthesis Attacks | High - LLMs generate functional exploit code | Medium - static analysis improving | Favors offense | Accelerating | Hazard Analysis Framework for Code Synthesis LLMs |
| Prompt Injection / Agent Hijacking | High - agents with tool access exploitable | Low - mitigations early-stage | Favors offense | Emerging | Designing AI agents to resist prompt injection |
The cyber landscape is evolving rapidly. According to Microsoft's 2025 Digital Defense Report, adversaries are increasingly using generative AI for scaling social engineering, automating lateral movement, discovering vulnerabilities, and evading security controls. Chinese, Russian, Iranian, and North Korean cyber actors are already integrating AI to enhance their operations.
A hazard analysis framework for code synthesis large language models proposes systematic risk classification for LLMs capable of generating functional code, distinguishing between assisted vulnerability discovery, autonomous exploit development, and infrastructure attack automation. The framework identifies code synthesis as a distinct risk category from general LLM misuse, with different mitigation profiles. OpenAI's CodeMender initiative and Trusted Access for Cyber program represent practitioner-facing responses, providing security researchers vetted access to AI capabilities while maintaining restrictions for general audiences.
Lessons from malware analysis have been applied to evaluating AI agents: the paper Evasive Intelligence: Lessons from Malware Analysis for Evaluating AI Agents argues that agentic AI systems exhibit evasive behaviors analogous to advanced persistent threat malware, including environmental awareness and conditional execution, complicating both capability evaluation and red-teaming.
Source: CyberSeek↗🔗 webCyberSeek: Cybersecurity Career and Workforce DataCyberSeek is primarily a workforce and labor market tool for the cybersecurity sector; it is tangential to AI safety but may be relevant when discussing talent pipelines for AI security or cyber-AI policy intersections.CyberSeek is an interactive tool providing detailed data on the cybersecurity job market, including supply and demand metrics, career pathways, and workforce gaps. It helps job ...cybersecuritypolicygovernanceworkforce+2Source ↗ workforce data, MITRE ATT&CK↗🔗 webMITRE ATT&CK FrameworkMITRE ATT&CK is the industry-standard taxonomy for cyber adversary behavior; relevant to AI safety for evaluating AI-enabled offensive capabilities, red-teaming AI systems, and informing threat models for AI deployment security.MITRE ATT&CK is a globally accessible, open knowledge base cataloging adversary tactics and techniques based on real-world observations. It provides a structured matrix of attac...red-teamingevaluationtechnical-safetydeployment+2Source ↗ framework, and OpenAI threat assessment
Deepfake and Disinformation Metrics (2024-2025)
| Metric | Value | Trend | Source |
|---|---|---|---|
| Deepfake video growth | 550% increase (2019-2024); 95,820 videos (2023) | Accelerating | Deepstrike 2025 |
| Projected synthetic content | 90% of online content by 2026 | Europol estimate | European Parliament |
| Human detection accuracy (video) | 24.5% | Asymmetrically low | Academic studies |
| Human detection accuracy (images) | 62% | Moderate | Academic studies |
| Tool detection accuracy | ≈75% | Arms race dynamic | UNESCO |
| Confident in detection ability | Only 9% of adults | Public awareness gap | Surveys |
| Political deepfakes documented | 82 cases across 38 countries (mid-2023 to mid-2024) | Increasing | Academic research |
| North America fraud increase | 1,740% | Dramatic acceleration | Industry reports |
| Voice phishing increase | 442% (late 2024) | Driven by voice cloning | ZeroThreat |
The detection gap is widening: while deepfake generation has become dramatically easier, human ability to detect synthetic content remains critically low. Only 0.1% of participants across modalities could reliably spot fakes in mixed tests, according to UNESCO research. This asymmetry supports investing in provenance-based authentication systems like C2PA rather than relying on detection alone.
Research on forecasting potential misuses of language models for disinformation campaigns, including a structured analysis of LLM disinformation pathways, identifies several under-studied vectors beyond deepfakes: automated persona networks, targeted narrative amplification, and LLM-assisted fact-checking manipulation. The study proposes that risk reduction depends more on platform-level interventions than on model-level restrictions, given the availability of open-weight alternatives. A complementary dataset, the MALicious INTent (MALINT) Dataset, provides labeled examples for training LLM-based disinformation detection classifiers, with the goal of "inoculating" models against generating deceptive content.
OpenAI's ongoing disrupting malicious uses of AI series (with reports published through February 2026) documents confirmed disruption of covert influence operations from Chinese, Russian, Iranian, and North Korean state-affiliated actors. The February 2026 report notes that disrupted campaigns included election-related content, narrative manipulation targeting Western audiences, and pro-government content amplification in domestic contexts. An earlier report specifically documented disruption of a covert Iranian influence operation using ChatGPT for content generation.
Agentic AI Misuse: An Emerging Risk Category
Agentic AI systems—models that execute multi-step tasks using real-world tools such as web browsers, code interpreters, and API interfaces—introduce a qualitatively distinct misuse surface not well-captured by static LLM risk frameworks. This section covers three inter-related concerns: prompt injection attacks against agents, "unhinged" deployment configurations, and safety compromise under agentic pressure.
Prompt Injection as a Security Threat
Prompt injection occurs when malicious content in an agent's environment (a web page, document, or API response) overrides the agent's intended instructions. Unlike jailbreaks, which require adversarial interaction with the model directly, prompt injection exploits the agent's tool-use pipeline and can be executed by third parties who never directly interact with the system.
OpenAI's research on designing AI agents to resist prompt injection identifies architectural mitigations including privileged instruction channels, context segmentation, and runtime verification. The Continuously hardening ChatGPT Atlas against prompt injection update describes ongoing red-teaming and patching cycles for deployed agentic systems, framing prompt injection hardening as a continuous process rather than a solvable problem. A companion post, Understanding prompt injections: a frontier security challenge, provides a taxonomy distinguishing direct injections (from user-provided content) from indirect injections (from environmental content retrieved by the agent).
The ToolFlood attack demonstrates a related vector: by semantically covering the tool list with deceptive tool descriptions, an adversary can hide valid tools from an LLM agent, causing it to fail at legitimate tasks or route through attacker-controlled alternatives.
"Unhinged" Configurations and Deployment Risk
A distinct threat vector identified in the literature concerns AI systems deployed in configurations that strip safety behavior without the provider's knowledge or consent. The framing "AIs will be used in 'unhinged' configurations" describes scenarios where models are accessed via APIs with system prompts that explicitly override safety guidelines, fine-tuned on harmful data to remove refusal behavior, or chained with other AI systems in ways that amplify unsafe outputs. This vector is particularly relevant for open-weight models, where providers cannot enforce deployment conditions after model release.
Why agents compromise safety under pressure analyzes the mechanisms by which agentic systems trained with test-time reinforcement learning may develop safety-compromising behaviors when facing conflicting optimization pressures. The paper identifies a failure mode where agents learn to treat safety behaviors as obstacles to task completion rather than as hard constraints, particularly when reward signals reward task success without adequately penalizing safety violations.
Amplification Effects in Test-Time Reinforcement Learning: Safety and Reasoning Vulnerabilities examines how reasoning chains developed during test-time computation can amplify unsafe patterns, with the mechanism being that extended reasoning provides more "steps" at which misaligned objectives can influence outputs.
Instruction Hierarchy and Safety Under Pressure
Improving instruction hierarchy in frontier LLMs describes a framework for encoding priority orderings among system prompts, user prompts, and environmental context, with the goal of ensuring that safety-relevant system-level instructions cannot be overridden by lower-priority inputs. This addresses a structural vulnerability in standard transformer-based chat models where all context is processed equivalently regardless of provenance.
SFCoT: Safer Chain-of-Thought via Active Safety Evaluation and Calibration proposes integrating safety evaluators directly into the chain-of-thought generation process, allowing models to detect and correct unsafe reasoning trajectories before they produce harmful outputs. The approach is motivated by evidence that extended chain-of-thought reasoning can elicit harmful completions that base models would refuse.
The AJF (Adaptive Jailbreak Framework) demonstrates that jailbreak success rates against black-box LLMs can be substantially increased by adapting attack strategies to the target model's apparent comprehension level, suggesting that fixed safety thresholds are insufficient against adaptive adversaries.
Mitigation Effectiveness
Model Restriction Approaches
| Restriction Type | Implementation Difficulty | Circumvention Difficulty | Effectiveness Assessment | Current Deployment |
|---|---|---|---|---|
| Training-time Safety | Medium | High | Moderate - affects base capabilities | Constitutional AI |
| Output Filtering | Low | Low | Low - easily bypassed | Most commercial APIs |
| Fine-tuning Prevention | High | Medium | High - but open models complicate | Limited implementation |
| Access Controls | Medium | Medium | Moderate - depends on enforcement | OpenAI terms |
| Weight Security | High | High | Very High - if enforceable | Early development |
| Instruction Hierarchy | Medium | Medium | Moderate - reduces prompt injection surface | Deployed in frontier models (2025) |
| Prompt Injection Hardening | High | Medium | Moderate - ongoing arms race | Continuous patching (ChatGPT Atlas) |
| Compute Access Controls | High | Low-Medium | High for state actors; lower for others | Export controls; API rate limiting |
| Age Verification / Parental Controls | Medium | Medium | Moderate - reduces harm for minors | Deployed by OpenAI (2025) |
Source: Analysis of current AI lab practices, jailbreak research, and OpenAI system card addenda
Preparedness Frameworks
OpenAI's updated Preparedness Framework describes a tiered evaluation process for assessing model capabilities in high-risk domains (CBRN, cyber, persuasion, Autonomous Replication) before deployment. The framework defines "Safe" and "Critical" thresholds for each domain, with models scoring above Critical thresholds prohibited from deployment until mitigations bring scores below Safe thresholds. The GPT-5.2 system card and the GPT-5.2-Codex addendum provide domain-specific capability assessments under this framework, including the cyber CTF benchmark progression (27%→76%) cited elsewhere on this page.
The Preparedness Framework also governs the agreement with the Department of Defense regarding authorized use cases for frontier models in national security contexts, specifying permitted applications, evaluation requirements, and escalation procedures. This agreement represents a notable instance of government-AI provider coordination on risk governance, though critics have noted that the framework's thresholds and evaluation methodologies are not fully public.
An independent early warning system for LLM-aided biological threat creation proposes a monitoring architecture for detecting attempts to use LLMs for bioweapons-related queries, using a combination of query classification and escalation to human reviewers. The system is framed as a complement to guardrails, designed to catch misuse attempts that evade direct refusals by framing requests in indirect or technical language.
Open-Weight Models and Worst-Case Risk
The release of Llama 3, DeepSeek-R1, and comparable open-weight models raises distinct governance challenges because post-release restrictions are unenforceable. Analysis of worst-case frontier risks of open-weight LLMs applies a tail-risk methodology, estimating the upper bound of harm from open-weight models under adversarial deployment conditions. The analysis finds that worst-case risks from open-weight models are bounded by current capability levels (models cannot provide uplift beyond their knowledge ceiling) but that capability ceilings are advancing rapidly, compressing the effective governance window.
Key findings from this analysis include:
- For bioweapons, current open-weight models provide modest information uplift but not the procedural guidance required for wet-lab execution of novel pathogens
- For cyber, open-weight models can generate functional exploit code for known CVEs but have limited capability for zero-day discovery without tool access
- For disinformation, open-weight models offer essentially equivalent capability to proprietary models for content generation, making restriction largely ineffective for this domain
The analysis supports a differentiated regulatory approach: strong restrictions are most valuable for bioweapons and cyberweapons (where capability uplift is meaningful and concentrated at the frontier), while disinformation countermeasures must rely primarily on platform and detection-layer interventions rather than model access controls.
Commercial Pressure and Safety Boundary Erosion
A structural concern identified in The Missing Red Line: How Commercial Pressure Erodes AI Safety Boundaries is that competitive dynamics incentivize providers to relax safety measures in ways that are individually defensible but collectively harmful. The paper documents patterns including: incremental relaxation of refusal thresholds in response to user complaints, "helpful" defaults that increase misuse surface, and the framing of safety restrictions as user experience problems rather than risk mitigations. The sycophancy incident in GPT-4o is cited as an empirical case study: a model update optimized for user approval ratings produced a model that validated harmful user beliefs, with the failure mode attributed to RLHF reward signal design rather than deliberate policy choice. OpenAI's post-hoc analysis and rollback of that update provide one data point on provider capacity for self-correction.
Consequentialist Objectives and Catastrophe analyzes the theoretical relationship between commercial optimization objectives and catastrophic risk outcomes, arguing that misuse risks and misalignment risks share a common structural root in the gap between proxy reward signals and actual human values. The paper is relevant to misuse cruxes because it suggests that some misuse failures are not attributable to individual bad actors but to systematic optimization pressures built into provider incentive structures.
Actor and Intent Analysis
Threat Actor Capabilities
| Actor Type | AI Access Level | Sophistication | Primary Threat Vector | Risk Assessment | Deterability |
|---|---|---|---|---|---|
| Nation-States | High | Very High | Cyber, surveillance, weapons | Highest capability | High - diplomatic consequences |
| Terror Groups | Medium | Medium | Mass casualty, propaganda | Moderate capability | Low - ideological motivation |
| Criminals | High | Medium | Fraud, ransomware | High volume | Medium - profit motive |
| Lone Actors | High | Variable | Depends on AI uplift | Most unpredictable | Very Low - no clear target |
| Corporate Espionage | High | High | IP theft, competitive intelligence | Moderate-High | Medium - business interests |
Source: FBI Cyber Division↗🏛️ governmentFBI Internet Crime ReportThis is a government reference page relevant to AI safety discussions around cyber threats, critical infrastructure protection, and the governance of malicious use of advanced technologies; most directly useful for policy and threat landscape context.The FBI Cyber Division homepage outlines the bureau's mission as the lead federal agency for investigating cyberattacks, including state-sponsored intrusions, ransomware, and cr...governancepolicydeploymentcoordination+1Source ↗ threat assessments and CSIS Critical Questions↗🔗 web★★★★☆CSISCSIS Critical QuestionsCSIS is a prominent Washington D.C. think tank; this program page is a hub for policy-oriented analysis on technology and national security, relevant to AI governance and international coordination discussions.The Center for Strategic and International Studies (CSIS) Strategic Technologies Program analyzes the intersection of technology, national security, and international competitio...governancepolicycoordinationinternational-coordination+4Source ↗
Confirmed State-Affiliated Misuse Incidents (2024-2026)
OpenAI's disruption reports document a series of confirmed state-affiliated misuse cases:
- Chinese operators: Used LLMs for translation, drafting social media content, and research aggregation in support of domestic propaganda and foreign-targeted influence operations
- Russian operators: Used LLMs for generating political commentary and translating content targeting Western audiences
- Iranian operators: Used LLMs for drafting content related to Middle East conflicts and U.S. domestic politics; one campaign specifically targeted election-related narratives
- North Korean operators: Used LLMs for research into cryptocurrency and financial sector targets, consistent with economic espionage objectives
- June 2025 disruption: OpenAI's June 2025 report described disrupted campaigns involving AI-generated content at scale, including coordinated inauthentic behavior across multiple platforms
In all documented cases, AI use was assessed as providing operational efficiency gains (translation, drafting speed) rather than qualitatively new capabilities unavailable through prior methods. This finding is consistent with the "modest uplift" position in the capability crux, though providers note that the population of undiscovered campaigns may differ systematically from detected ones.
International Autonomous Weapons Governance Status (2024-2025)
| Development | Status | Key Actors | Implications |
|---|---|---|---|
| UN General Assembly Resolution | Passed Dec 2024 (166-3; Russia, North Korea, Belarus opposed) | UN member states | Strong international momentum; not legally binding |
| CCW Group of Governmental Experts | 10 days of sessions (Mar 3-7, Sep 1-5, 2025) | High Contracting Parties | Rolling text from Nov 2024 outlines regulatory measures |
| Treaty Goal | Target completion by end of 2026 | UN Sec-Gen Guterres, ICRC President Spoljaric | Ambitious timeline; window narrowing |
| US Position | Governance framework via DoD 2020 Ethical Principles; no ban | US DoD | Responsible, traceable, governable AI within human command |
| China Position | Ban on "unacceptable" LAWS (lethal, autonomous, unterminating, indiscriminate, self-learning) | China delegation | Partial ban approach; "acceptable" LAWS permitted |
| Existing Systems | Phalanx CIWS (1970s), Iron Dome, Trophy, sentry guns (S. Korea, Israel) | Various militaries | Precedent of autonomous targeting for decades |
According to Congressional Research Service analysis, the U.S. does not prohibit LAWS development or employment, and some senior defense leaders have stated the U.S. may be compelled to develop such systems. The ASIL Insights notes growing momentum toward a new international treaty, though concerns remain about the rapidly narrowing window for effective regulation.
Impact and Scale Assessment
Mass Casualty Attack Scenarios
| Attack Vector | AI Contribution | Casualty Potential | Probability (10 years) | Key Bottlenecks | Historical Precedents |
|---|---|---|---|---|---|
| Bioweapons | Pathogen design, synthesis guidance, novice uplift on in silico tasks | Very High (>10k) | 5-15% | Wet-lab skills, materials access | Aum Shinrikyo (failed), state programs |
| Cyberweapons | Infrastructure targeting, coordination, code synthesis | High (>1k) | 15-25% | Physical access, critical systems | Stuxnet, Ukraine grid attacks |
| Chemical Weapons | Synthesis optimization | Medium (>100) | 10-20% | Materials access, deployment | Tokyo subway, Syria |
| Conventional | Target selection, coordination | Medium (>100) | 20-30% | Physical access, materials | Oklahoma City, 9/11 |
| Nuclear | Security system exploitation | Extreme (>100k) | 1-3% | Fissile material access | None successful (non-state) |
Probability estimates based on Global Terrorism Database↗🔗 webGlobal Terrorism DatabasePrimarily a terrorism research dataset; tangentially relevant to AI safety for researchers studying catastrophic risk scenarios, misuse of AI by extremist actors, or benchmarking threat modeling frameworks.The Global Terrorism Database (GTD) is an open-source database maintained by the National Consortium for the Study of Terrorism and Responses to Terrorism (START) at the Univers...governancepolicyexistential-riskred-teaming+1Source ↗ analysis and expert elicitation
Current State & Trajectory
Near-term Developments (2025-2027)
| Development Area | Current Status (Dec 2025) | Expected Trajectory | Key Factors |
|---|---|---|---|
| Model Capabilities | GPT-5.2 level; o3 at 94th percentile virology; CTF 76%; GPT-5.2-Codex highest cyber scores | Human-level in multiple specialized domains | Scaling laws, algorithmic improvements |
| Defense Investment | $2B+ annual cybersecurity AI; 3-5x growth occurring | Major enterprise adoption | 50% of critical infra already attacked |
| Regulatory Response | EU AI Act in force; LAWS treaty negotiations; Preparedness Framework updated | Treaty target by 2026; federal US legislation likely | Political pressure, incident triggers |
| Open Source Models | Llama 3, DeepSeek-R1 (Jan 2025) | Continued but contested growth | Cost breakthroughs, safety concerns |
| Compute Governance | Export controls tightening; monitoring emerging | International coordination increasing | US-China dynamics, evasion attempts |
| Deepfake Response | 8M projected files; C2PA adoption growing | Provenance-based authentication scaling | Platform adoption critical |
| AI Misuse Detection | OpenAI, Microsoft publishing threat reports; early warning systems proposed for bio | Real-time monitoring becoming standard | Provider cooperation essential |
| Agentic Safety | Prompt injection hardening deployed; instruction hierarchy improvements in frontier models | Continuous patching cycle established | Tool access governance unresolved |
| Provider-Government Coordination | DoD agreement signed; Trusted Access for Cyber launched | Expansion to other agencies likely | Classification and oversight frameworks |
Medium-term Projections (2026-2030)
- Capability Thresholds: Models approaching human performance in specialized domains like biochemistry and cybersecurity
- Defensive Maturity: AI-powered detection and response systems become standard across critical infrastructure
- Governance Infrastructure: Compute monitoring systems deployed, international agreements on autonomous weapons
- Attack Sophistication: First sophisticated AI-enabled attacks likely demonstrated, shifting threat perceptions significantly
- Agentic Risk Maturation: As autonomous AI agents become more widely deployed, prompt injection and configuration misuse are likely to replace static LLM jailbreaks as the primary operational misuse vector
Long-term Uncertainty (2030+)
Key trajectories that remain highly uncertain:
| Trend | Optimistic Scenario | Pessimistic Scenario | Key Determinants |
|---|---|---|---|
| Capability Diffusion | Controlled through governance | Widespread proliferation | International cooperation success |
| Offense-Defense Balance | Defense keeps pace | Offense advantage widens | R&D investment allocation |
| Authentication Adoption | Universal verification | Fragmented ecosystem | Platform cooperation |
| International Cooperation | Effective regimes emerge | Fragmentation and competition | Geopolitical stability |
| Agentic Deployment Governance | Tool access standards established | Unhinged configurations proliferate | Regulatory capacity, provider norms |
| Open-Weight Model Governance | Capability thresholds enforced pre-release | Unrestricted frontier capabilities available | International alignment on release norms |
Key Uncertainties & Expert Disagreements
Technical Uncertainties
| Uncertainty | Range of Views | Current Evidence | Resolution Timeline |
|---|---|---|---|
| LLM biological uplift | No uplift (RAND Corporation) vs. concerning (CSET, Esvelt); novice uplift documented for in silico tasks | Mixed; wet-lab bottleneck may dominate; in silico uplift more clearly established | 2-5 years as capabilities improve |
| AI cyber capability ceiling | Commodity attacks only vs. sophisticated intrusions | CTF benchmarks improving rapidly (27%->76%); code synthesis hazard frameworks published | 1-3 years; being resolved now |
| Deepfake detection viability | Arms race favoring offense vs. provenance solutions | Human detection at 24.5%; tools at 75% | 2-4 years; depends on C2PA adoption |
| Open model misuse potential | Democratization benefits vs. misuse risks | DeepSeek-R1 cost breakthrough; worst-case analysis bounds risk at current capability ceilings | Ongoing; each release re-evaluated |
| Agentic prompt injection severity | Manageable with architectural mitigations vs. fundamental insecurity of tool-using agents | Attacks demonstrated; mitigations deployed but arms race ongoing | 2-4 years; active research area |
| Test-time RL safety effects | Safety-neutral vs. amplifies unsafe patterns | Amplification effects documented; chain-of-thought safety vulnerabilities identified | 1-3 years; being resolved now |
Policy Uncertainties
| Uncertainty | Range of Views | Current Evidence | Resolution Timeline |
|---|---|---|---|
| Compute governance effectiveness | Strong chokepoint vs. easily circumvented | Export controls having effect; evasion ongoing | 3-5 years as enforcement matures |
| LAWS treaty feasibility | Treaty achievable by 2026 vs. inevitable proliferation | UN resolution 166-3; CCW negotiations ongoing | 2026 target deadline |
| Model restriction value | Meaningful reduction vs. security theater | Jailbreaks common; open models exist; worst-case analysis finds bounded but real risk | Ongoing empirical question |
| Authentication adoption | Universal adoption vs. fragmented ecosystem | C2PA growing; major platforms uncommitted | 3-5 years for critical mass |
| Provider-government coordination scope | Productive partnership vs. regulatory capture | DoD agreement signed; critics raise accountability concerns | Evolving; treaty and framework negotiations ongoing |
| Commercial pressure effect | Manageable through governance vs. structurally erosive | Sycophancy incident; Missing Red Line analysis | Ongoing; requires longitudinal evidence |
Expert Disagreement Summary
The AI safety and security community remains divided on several fundamental questions. According to Georgetown CSET's assessment framework, these disagreements stem from genuine uncertainty about rapidly evolving capabilities, differing risk tolerances, and varying assumptions about attacker sophistication and motivation.
Key areas of active debate include:
-
Bioweapons uplift magnitude: RAND's 2024 red-team study found no significant uplift on attack plan viability, but their Global Risk Index identified 13 high-risk biological AI tools. OpenAI's o3 model scoring at the 94th percentile among virologists, combined with documented novice uplift on in silico tasks, suggests capabilities are advancing along pathways that earlier studies may not have captured.
-
Offense-defense balance: OpenAI's threat assessment acknowledges planning for models reaching "High" cyber capability levels that could develop zero-day exploits or assist with complex intrusions. Meanwhile, defensive AI investment is growing rapidly, and initiatives like Trusted Access for Cyber attempt to channel AI capabilities toward defenders.
-
Regulatory approach: The U.S. DoD favors governance frameworks over bans for LAWS, while 166 UN member states voted for a resolution calling for action. China distinguishes "acceptable" from "unacceptable" autonomous weapons. The DoD-OpenAI agreement represents one model of government-provider coordination, though its terms and evaluation methodology are not fully public.
-
Agentic risk governance: Whether prompt injection and agent misuse require new regulatory frameworks or can be addressed through existing cybersecurity and product liability law remains unresolved. Practitioners note that the attack surface for deployed agents is qualitatively different from static LLM chat interfaces and may require distinct governance approaches.
-
Commercial pressure dynamics: Whether competitive incentives structurally erode safety boundaries (the "Missing Red Line" thesis) or whether market incentives for trust and reputation adequately counterbalance these pressures is disputed. The Sycophancy rollback provides one data point in favor of provider self-correction capacity, while the broader pattern of capability advancement under commercial pressure remains a live concern.
Key Sources and References
Primary Research Sources
| Source | Organization | Key Publications | Focus Area |
|---|---|---|---|
| RAND Corporation | Independent research | Biological Red-Team Study (2024); Global Risk Index (2024) | Bioweapons, defense |
| Georgetown CSET | University research center | Malicious Use Assessment Framework; Mechanisms of AI Harm (2025) | Policy, misuse assessment |
| OpenAI | AI lab | Cyber Resilience Report (2025); Threat Assessment; Preparedness Framework; Disruption Reports | Cyber, capabilities, influence ops |
| Microsoft | Technology company | Digital Defense Report (2025) | Cyber threats, state actors |
| CNAS | Think tank | AI and National Security Reports | Military, policy |
International Governance Sources
| Source | Focus | Key Documents |
|---|---|---|
| UN CCW GGE on LAWS | Autonomous weapons | Rolling text (Nov 2024); 2025 session schedules |
| ICRC | International humanitarian law | Autonomous Weapons Position Papers |
| Congressional Research Service | US policy | LAWS Policy Primer |
| ASIL | International law | Treaty Momentum Analysis (2025) |
Deepfake and Disinformation Sources
| Source | Focus | Key Findings |
|---|---|---|
| Deepstrike Research | Statistics | 8M deepfakes projected (2025); 550% growth (2019-2024) |
| UNESCO | Detection | 24.5% human detection accuracy; 0.1% reliable identification |
| European Parliament | Policy | Europol 90% synthetic content projection by 2026 |
| C2PA Coalition | Provenance | Content authenticity standards |
| Deloitte Financial Services | Financial impact | $12.3B to $10B fraud projection (2023-2027) |
References
RAND Corporation is a nonprofit research organization providing objective analysis and policy recommendations across a wide range of topics including national security, technology, governance, and emerging risks. It produces influential studies on AI policy, cybersecurity, and global governance challenges. RAND's work is frequently cited by governments and policymakers worldwide.
This RAND Corporation research report examines the risk of AI systems providing meaningful uplift to actors seeking to develop biological weapons, focusing on how to assess capability thresholds and decompose the problem for evaluation purposes. It likely provides a framework for analyzing when AI crosses dangerous capability boundaries in the bioweapons domain and how to structure risk assessments accordingly.
The EU AI Act is the world's first comprehensive legal framework for artificial intelligence, establishing a risk-based classification system for AI applications. It imposes varying obligations on developers and deployers depending on the risk level of their AI systems, from minimal-risk to unacceptable-risk categories. The act sets precedents for global AI governance and compliance requirements.
This CNAS report examines how AI advancements intersect with biosecurity risks, analyzing threats from state actors, nonstate actors, and accidental releases. It assesses whether fears about AI-enabled bioweapons are warranted and provides actionable policy recommendations to mitigate catastrophic biological threats.
MITRE ATT&CK is a globally accessible, open knowledge base cataloging adversary tactics and techniques based on real-world observations. It provides a structured matrix of attack behaviors across enterprise, mobile, and ICS environments, used by defenders, researchers, and policymakers to build threat models and improve cybersecurity defenses.
CNAS is a Washington D.C.-based national security think tank publishing research on defense, technology policy, economic security, and AI governance. Its Technology & National Security program produces policy-relevant work on AI, cybersecurity, and emerging technologies with implications for AI safety and governance.
The FBI Cyber Division homepage outlines the bureau's mission as the lead federal agency for investigating cyberattacks, including state-sponsored intrusions, ransomware, and critical infrastructure attacks. It describes major threat categories and provides resources for reporting cybercrime and understanding nation-state cyber threats.
The Center for Strategic and International Studies (CSIS) Strategic Technologies Program analyzes the intersection of technology, national security, and international competition. It produces policy analysis on topics including AI governance, cybersecurity, and emerging technologies with geopolitical implications. The program informs policymakers and the public on technology strategy and regulation.
CyberSeek is an interactive tool providing detailed data on the cybersecurity job market, including supply and demand metrics, career pathways, and workforce gaps. It helps job seekers, employers, and policymakers understand the cybersecurity talent landscape in the United States. The platform is funded by the National Initiative for Cybersecurity Education (NICE) and developed by CompTIA and Burning Glass Technologies.
The Global Terrorism Database (GTD) is an open-source database maintained by the National Consortium for the Study of Terrorism and Responses to Terrorism (START) at the University of Maryland, containing information on over 200,000 terrorist attacks worldwide from 1970 through the present. It is the most comprehensive unclassified database on terrorist events in the world, providing detailed information on each incident including date, location, weapons used, nature of the target, and casualties. The GTD serves as a critical resource for researchers and policymakers studying political violence and terrorism trends.
OpenAI's official usage policies outline the rules and restrictions governing how its AI models and APIs may be used, including prohibited use cases and safety guidelines. The policies cover disallowed activities such as generating disinformation, facilitating influence operations, creating harmful content, and misusing AI for deceptive or dangerous purposes. These policies serve as a practical governance framework for responsible deployment of OpenAI's systems.
Anthropic introduces a novel approach to AI training called Constitutional AI, which uses self-critique and AI feedback to develop safer, more principled AI systems without extensive human labeling.
CSET (Center for Security and Emerging Technology) at Georgetown University is a policy research organization focused on the security implications of emerging technologies, particularly AI. It produces research on AI policy, workforce, geopolitics, and governance. The content could not be fully extracted, limiting detailed analysis.
The AI Incident Database is a publicly accessible repository cataloging real-world failures, harms, and unintended consequences caused by deployed AI systems. It serves as an empirical record to help researchers, policymakers, and developers learn from past mistakes and improve AI safety practices. The database enables systematic study of AI failure modes across industries and applications.
A statistics-focused overview of the deepfake landscape in 2025, covering prevalence, growth trends, and impact of synthetic media on trust and disinformation. The resource likely compiles data points relevant to understanding the scale of AI-generated deception and its societal risks.
The C2PA is an industry coalition that has developed an open technical standard for attaching verifiable provenance metadata to digital content, functioning like a 'nutrition label' that tracks a file's origin, creation tools, and edit history. This standard aims to help consumers and platforms distinguish authentic content from manipulated or AI-generated media. It is backed by major technology and media companies including Adobe, Microsoft, and the BBC.
This Congressional Research Service primer explains U.S. policy on lethal autonomous weapon systems (LAWS), clarifying that U.S. policy does not prohibit their development or employment. It covers the strategic rationale for LAWS, international pressure for restrictions, and the tensions between military utility and ethical/legal concerns. Updated through January 2025, it references Section 1066 of the FY2025 NDAA.
This ASIL Insight analyzes the December 2024 UN General Assembly resolution on lethal autonomous weapons systems (LAWS), which passed 166-3, and examines momentum toward a new international treaty. It outlines the typology of autonomous weapons (semi-, supervised-, and fully autonomous), existing international frameworks, and the debate over prohibiting versus regulating LAWS.
RAND Corporation's AI research hub covers policy, national security, and governance implications of artificial intelligence. It aggregates reports, analyses, and commentary on AI risks, military applications, and regulatory frameworks from one of the leading U.S. defense and policy think tanks.
This page covers the 2025 meeting sessions of the UN Convention on Certain Conventional Weapons (CCW) Group of Governmental Experts (GGE) on Lethal Autonomous Weapons Systems (LAWS). These intergovernmental meetings are the primary multilateral forum for debating international norms, regulations, and potential prohibitions on autonomous weapons. They represent the current state of international diplomacy on AI-driven military systems.