AI Proliferation Risk Model
Quantitative model of AI capability diffusion across 5 actor tiers, documenting compression from 24-36 months (2020) to 12-18 months (2024) with projections of 6-12 months by 2025-2026. Identifies compute governance (70-85% effectiveness) and pre-deployment gates (60-80%) as the highest-leverage interventions before irreversible open-source proliferation, with actor-level risk calculations showing 5,000 expected misuse events annually at Tier 4-5 proliferation.
Overview
This model analyzes the diffusion of AI capabilities from frontier laboratories to progressively broader populations of actors. It examines proliferation mechanisms, control points, and the relationship between diffusion speed and risk accumulation. The central question: How fast do dangerous AI capabilities spread from frontier labs to millions of users, and which intervention points offer meaningful leverage?
Key findings show proliferation follows predictable tier-based patterns, but time constants are compressing dramatically. Capabilities that took 24-36 months to diffuse from Tier 1 (frontier labs) to Tier 4 (open source) in 2020 now spread in 12-18 months. Projections suggest 6-12 month cycles by 2025-2026, fundamentally changing governance calculus.
The model identifies an "irreversibility threshold" where proliferation cannot be reversed once capabilities reach open source. This threshold is crossed earlier than commonly appreciated—often before policymakers recognize capabilities as dangerous. High-leverage interventions must occur pre-proliferation; post-proliferation controls offer diminishing returns as diffusion accelerates.
Risk Assessment Framework
| Risk Dimension | Current Assessment | 2025-2026 Projection | Evidence | Trend |
|---|---|---|---|---|
| Diffusion Speed | High | Very High | 50% reduction in proliferation timelines since 2020 | Accelerating |
| Control Window | Medium | Low | 12-18 month average control periods | Shrinking |
| Actor Proliferation | High | Very High | Tier 4 access growing exponentially | Expanding |
| Irreversibility Risk | High | Extreme | Multiple capabilities already irreversibly proliferated | Increasing |
Proliferation Tier Analysis
Actor Tier Classification
The proliferation cascade operates through five distinct actor tiers, each with different access mechanisms, resource requirements, and risk profiles.
| Tier | Actor Type | Count | Access Mechanism | Diffusion Time | Control Feasibility |
|---|---|---|---|---|---|
| 1 | Frontier Labs | 5-10 | Original development | - | High (concentrated) |
| 2 | Major Tech | 50-100 | API/Partnerships | 6-18 months | Medium-High |
| 3 | Well-Resourced Orgs | 1K-10K | Fine-tuning/Replication | 12-24 months | Medium |
| 4 | Open Source | Millions | Public weights | 18-36 months | Very Low |
| 5 | Individuals | Billions | Consumer apps | 24-48 months | None |
```mermaid
flowchart TD
    T1[Tier 1: Frontier Labs<br/>OpenAI, Anthropic, Google, etc.<br/>~10 actors] --> T2[Tier 2: Major Tech<br/>Microsoft, Amazon, Meta<br/>~100 actors]
    T2 --> T3[Tier 3: Well-Resourced Orgs<br/>Large corps, governments<br/>~10,000 actors]
    T3 --> T4[Tier 4: Open Source<br/>Public model weights<br/>Millions of actors]
    T4 --> T5[Tier 5: Consumer Access<br/>Apps and services<br/>Billions of users]
    style T1 fill:#ff9999
    style T2 fill:#ffcc99
    style T3 fill:#fff4cc
    style T4 fill:#99ff99
    style T5 fill:#99ccff
```
Historical Diffusion Data
Analysis of actual proliferation timelines reveals accelerating diffusion across multiple capability domains:
| Capability | Tier 1 Date | Tier 4 Date | Total Time | Key Events |
|---|---|---|---|---|
| GPT-3 level | May 2020 | Jul 2022 | 26 months | OpenAI → HuggingFace release |
| DALL-E level | Jan 2021 | Aug 2022 | 19 months | OpenAI → Stable Diffusion |
| GPT-4 level | Mar 2023 | Jan 2025 | 22 months | OpenAI → DeepSeek-R1 |
| Code generation | Aug 2021 | Dec 2022 | 16 months | Codex → StarCoder |
| Protein folding | Nov 2020 | Jul 2021 | 8 months | AlphaFold → ColabFold |
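As a sanity check on these timelines, the minimal sketch below regresses ln(diffusion time) on release year via ordinary least squares. With only five heterogeneous capabilities the fit is noisy and cross-domain differences (e.g., protein folding's unusually fast replication) dominate, so this illustrates the method rather than establishing the trend; the compression claim rests on comparing like-for-like capability cycles.

```python
# Quick OLS fit of ln(diffusion time) against Tier 1 release year.
# Dates are approximated as fractional years from the table above.
import math

cases = [
    # (capability, approximate Tier 1 release year, months to Tier 4)
    ("GPT-3 level",     2020.4, 26),
    ("DALL-E level",    2021.0, 19),
    ("GPT-4 level",     2023.2, 22),
    ("Code generation", 2021.6, 16),
    ("Protein folding", 2020.9, 8),
]

xs = [year for _, year, _ in cases]
ys = [math.log(months) for _, _, months in cases]
n = len(cases)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) \
      / sum((x - x_bar) ** 2 for x in xs)
print(f"ln(T) slope: {slope:+.3f} per year")
print(f"implied change in diffusion time: {math.exp(slope) - 1:+.1%} per year")
```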
Mathematical Model
Core Risk Equation
Total proliferation risk combines actor count, capability level, and misuse probability:

$$R_{\text{total}}(t) = \sum_{i} N_i(t) \cdot C_i(t) \cdot P_{\text{misuse},i}$$

Where:
- $N_i(t)$ = number of actors in tier $i$ with access at time $t$
- $C_i(t)$ = capability level accessible to tier $i$ at time $t$
- $P_{\text{misuse},i}$ = per-actor misuse probability for tier $i$
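A minimal sketch of this equation in code. The numeric values for $N_i$, $C_i$, and $P_{\text{misuse},i}$ are illustrative placeholders, not calibrated model outputs:

```python
from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    n_actors: float      # N_i(t): actors in tier i with access at time t
    capability: float    # C_i(t): capability level, normalized to [0, 1]
    p_misuse: float      # P_misuse,i: per-actor annual misuse probability

def total_risk(tiers: list[Tier]) -> float:
    """R_total(t) = sum over tiers of N_i(t) * C_i(t) * P_misuse,i."""
    return sum(t.n_actors * t.capability * t.p_misuse for t in tiers)

tiers = [
    Tier("frontier labs", 10, 1.00, 0.001),        # placeholder values
    Tier("major tech", 100, 0.90, 0.002),
    Tier("well-resourced orgs", 5_000, 0.75, 0.01),
    Tier("open source", 1_000_000, 0.60, 0.0005),
]
print(f"R_total = {total_risk(tiers):,.1f} capability-weighted expected misuse events")
```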
Diffusion Dynamics
Each tier transition follows modified logistic growth with an accelerating rate:

$$\frac{dN_i}{dt} = r_i(t) \cdot N_i \left(1 - \frac{N_i}{K_i}\right)$$

where $K_i$ is the saturation level (maximum actor population) of tier $i$. The acceleration factor $\alpha$ captures increasing diffusion speed:

$$r_i(t) = r_i(0) \cdot e^{\alpha t}$$

With $\alpha \approx 0.14$ per year ($\ln 2 / 0.14 \approx 5$), diffusion rates double every ~5 years. This matches observed compression from 24-36 month cycles (2020) to 12-18 months (2024).
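A minimal numerical sketch of these dynamics, Euler-integrating the accelerating logistic with $K$ normalized to 1. The base rate `R0` is an assumed free parameter, tuned here so the outputs roughly echo the 24-36 month (2020) versus 12-18 month (2024) cycles cited above:

```python
import math

ALPHA = math.log(2) / 5   # ~0.14/yr: diffusion rates double every ~5 years
R0 = 2.5                  # assumed base logistic rate (1/yr) at t = 0 (year 2020)

def months_to_saturation(start_year: int, n0: float = 0.01, target: float = 0.90) -> float:
    """Euler-integrate dN/dt = r(t) * N * (1 - N), with r(t) = R0 * exp(ALPHA * t),
    from 1% to 90% adoption, starting the clock at start_year."""
    t0 = float(start_year - 2020)
    n, t, dt = n0, t0, 0.001
    while n < target:
        r = R0 * math.exp(ALPHA * t)
        n += dt * r * n * (1 - n)
        t += dt
    return (t - t0) * 12

for year in (2020, 2024):
    print(f"{year}: 1% -> 90% adoption in ~{months_to_saturation(year):.0f} months")
```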
Control Point Effectiveness
High-Leverage Interventions
| Control Point | Effectiveness | Durability | Implementation Difficulty | Current Status |
|---|---|---|---|---|
| Compute governance | 70-85% | 5-15 years | High | Partial (US export controls) |
| Pre-deployment gates | 60-80% | Unknown | Very High | Voluntary only |
| Weight security | 50-70% | Fragile | Medium | Industry standard emerging |
| International coordination | 40-70% | Medium | Very High | Early stages |
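One way to reason about stacking these controls is to assume each blocks proliferation attempts independently, so residual leakage is the product of the individual failure rates. That independence assumption is made here for illustration only and is not part of the estimates above; a hedged sketch using the table's midpoints:

```python
# Midpoint effectiveness values from the table above; independence across
# layers is an illustrative assumption, not a model claim.
controls = {
    "compute governance": 0.775,        # midpoint of 70-85%
    "pre-deployment gates": 0.70,       # midpoint of 60-80%
    "weight security": 0.60,            # midpoint of 50-70%
    "international coordination": 0.55, # midpoint of 40-70%
}

residual = 1.0
for effectiveness in controls.values():
    residual *= 1 - effectiveness  # fraction of attempts slipping past this layer

print(f"residual leakage with all four layers: {residual:.1%}")
print(f"implied combined effectiveness: {1 - residual:.1%}")
```

In practice these controls are correlated (e.g., compute governance and weight security both depend on a small set of chokepoint actors), so the true combined effectiveness is likely well below this independent-layers bound.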
Medium-Leverage Interventions
| Control Point | Current Effectiveness | Key Limitation | Example Implementation |
|---|---|---|---|
| API controls | 40-60% | Continuous bypass development | OpenAI usage policies |
| Capability evaluation | 50-70% | May miss emergent capabilities | METR (formerly ARC Evals) |
| Publication norms | 30-50% | Competitive pressure to publish | FHI publication guidelines |
| Talent restrictions | 20-40% | Limited in free societies | CFIUS review process |
Proliferation Scenarios
2025-2030 Trajectory Analysis
| Scenario | Probability | Tier 1-4 Time | Key Drivers | Risk Level |
|---|---|---|---|---|
| Accelerating openness | 35% | 3-6 months | Open-source ideology, regulation failure | Very High |
| Current trajectory | 40% | 6-12 months | Mixed open/closed, partial regulation | High |
| Managed deceleration | 15% | 12-24 months | International coordination, major incident | Medium |
| Effective control | 10% | 24+ months | Strong compute governance, industry agreement | Low-Medium |
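For a single summary figure, the scenario table can be collapsed into a probability-weighted expected diffusion time. The midpoints below, including treating "24+" as 30 months, are simplifying assumptions:

```python
# Scenario probabilities and Tier 1-4 time ranges from the table above.
scenarios = [
    ("Accelerating openness", 0.35, (3 + 6) / 2),
    ("Current trajectory",    0.40, (6 + 12) / 2),
    ("Managed deceleration",  0.15, (12 + 24) / 2),
    ("Effective control",     0.10, 30.0),  # "24+ months", approximated
]
expected = sum(p * months for _, p, months in scenarios)
print(f"probability-weighted Tier 1-4 diffusion time: ~{expected:.1f} months")
```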
Threshold Analysis
Critical proliferation thresholds mark qualitative shifts in control feasibility:
| Threshold | Description | Control Status | Response Window |
|---|---|---|---|
| Contained | Tier 1-2 only | Control possible | Months |
| Organizational | Tier 3 access | State/criminal access likely | Weeks |
| Individual | Tier 4/5 access | Monitoring overwhelmed | Days |
| Irreversible | Open source + common knowledge | Control impossible | N/A |
```mermaid
graph LR
    A[Contained<br/>Tier 1-2] --> B[Organizational<br/>Tier 3]
    B --> C[Individual<br/>Tier 4-5]
    C --> D[Irreversible<br/>Open Source]
    A --> A1[Control possible<br/>Months to act]
    B --> B1[State actor access<br/>Weeks to act]
    C --> C1[Mass access<br/>Days to act]
    D --> D1[No control<br/>Focus on defense]
    style A fill:#ccffcc
    style B fill:#fff4cc
    style C fill:#ffcc99
    style D fill:#ff9999
```
Risk by Actor Type
Misuse Probability Assessment
Different actor types present distinct risk profiles based on capability access and motivation:
| Actor Type | Estimated Count | Capability Access | P(Access) | P(Misuse\|Access) | Risk Weight |
|---|---|---|---|---|---|
| Hostile state programs | 5-15 | Frontier | 0.95 | 0.15-0.40 | Very High |
| Major criminal orgs | 50-200 | Near-frontier | 0.70-0.85 | 0.30-0.60 | High |
| Terrorist groups | 100-500 | Moderate | 0.40-0.70 | 0.50-0.80 | High |
| Ideological groups | 1K-10K | Moderate | 0.50-0.80 | 0.10-0.30 | Medium |
| Malicious individuals | 10K-100K | Basic-Moderate | 0.60-0.90 | 0.01-0.10 | Medium (scale) |
Expected Misuse Events
Even low individual misuse probabilities become concerning at scale:

$$E[\text{misuse events}] = N_{\text{capable}} \times P_{\text{misuse}}$$

For Tier 4-5 proliferation with 100,000 capable actors and a 5% annual misuse probability, expected annual misuse events: 100,000 × 0.05 = 5,000.
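The same arithmetic extends to the actor table above. The sketch below uses range midpoints throughout, which is a simplifying assumption rather than part of the model's published estimates:

```python
# Range midpoints from the actor table above (counts and probabilities).
actors = [
    # (type, count, P(access), P(misuse|access))
    ("hostile state programs", 10,     0.95,  0.275),
    ("major criminal orgs",    125,    0.775, 0.45),
    ("terrorist groups",       300,    0.55,  0.65),
    ("ideological groups",     5_500,  0.65,  0.20),
    ("malicious individuals",  55_000, 0.75,  0.055),
]
for name, count, p_access, p_misuse in actors:
    expected = count * p_access * p_misuse
    print(f"{name:24s} expected misusing actors/yr: {expected:8.1f}")

# Headline figure from the text: 100,000 capable actors x 5% annual misuse
print(f"\nTier 4-5 headline: {100_000 * 0.05:,.0f} expected annual misuse events")
```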
Current State & Trajectory
Recent Developments
The proliferation landscape has shifted dramatically since 2023:
2023 Developments:
- LLaMA leak demonstrated fragility of controlled releases
- LLaMA 2 open release established new norm for frontier model sharing
- U.S. export controls on advanced semiconductors implemented
2024-2025 Developments:
- DeepSeek-R1 release achieved GPT-4 level performance with open weights
- Qwen 2.5 and Mistral continued aggressive open-source strategies
- Chinese labs increasingly releasing frontier capabilities openly
2025-2030 Projections
Accelerating Factors:
- Algorithmic efficiency reducing compute requirements ~2x annually
- China developing domestic chip capabilities to circumvent controls
- Open-source ideology gaining ground in AI community
- Economic incentives for ecosystem building through open models
Decelerating Factors:
- Growing awareness of proliferation risks among frontier labs
- Potential regulatory intervention following AI incidents
- Voluntary industry agreements on responsible disclosure
- Technical barriers to replicating frontier training runs
Critical Unknown Parameters
| Uncertainty | Impact on Model | Current State | Resolution Timeline |
|---|---|---|---|
| Chinese chip development | Very High | 2-3 generations behind | 3-7 years |
| Algorithmic efficiency gains | High | ≈2x annual improvement | Ongoing |
| Open vs closed norms | Very High | Trending toward open | 1-3 years |
| Regulatory intervention | High | Minimal but increasing | 2-5 years |
| Major AI incident | Very High | None yet | Unpredictable |
Model Sensitivity Analysis
The model is most sensitive to three parameters:
Diffusion Rate Acceleration (α): A 10% change in α yields a 25-40% change in risk estimates over a 5-year horizon. This parameter depends heavily on continued algorithmic progress and open-source community growth.
Tier 4/5 Misuse Probability: Estimates ranging from 1% to 15% create order-of-magnitude differences in expected incidents. Better empirical data on malicious actor populations is critical.
Compute Control Durability: Estimates ranging from 3-15 years until circumvention dramatically affect intervention value. China's semiconductor progress is the key uncertainty.
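A sketch of the α sensitivity using a deliberately simplified risk proxy: the pre-saturation solution of the diffusion dynamics above, $N(t) = \exp\!\big(\tfrac{r_0}{\alpha}(e^{\alpha t} - 1)\big)$, with an assumed $r_0 = 1$/year. Perturbing α by ±10% moves the 5-year projection by roughly -24%/+33%, consistent with the 25-40% range stated above:

```python
import math

ALPHA = math.log(2) / 5  # baseline acceleration: rates double every ~5 years
R0 = 1.0                 # assumed base diffusion rate, 1/yr

def actors_reached(alpha: float, horizon: float = 5.0) -> float:
    """Pre-saturation growth N(t) = exp((R0/alpha) * (exp(alpha*t) - 1)), N(0) = 1."""
    return math.exp((R0 / alpha) * (math.exp(alpha * horizon) - 1))

base = actors_reached(ALPHA)
for scale in (0.9, 1.1):
    change = actors_reached(ALPHA * scale) / base - 1
    print(f"alpha x {scale:.1f}: 5-year actors-reached changes {change:+.0%}")
```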
Policy Implications
Immediate Actions (0-18 months)
Strengthen Compute Governance:
- Expand semiconductor export controls to cover training and inference chips
- Implement cloud provider monitoring for large training runs
- Establish international coordination on chip supply chain security
Establish Evaluation Frameworks:
- Define dangerous capability thresholds with measurable criteria
- Create mandatory pre-deployment evaluation requirements
- Build verification infrastructure for model capabilities
Medium-Term Priorities (18 months-5 years)
International Coordination:
- Negotiate binding agreements on proliferation control
- Establish verification mechanisms for training run detection
- Create sanctions framework for violating proliferation norms
Industry Standards:
- Implement weight security requirements for frontier models
- Establish differential access policies based on actor verification
- Create liability frameworks for irresponsible proliferation
Long-Term Structural Changes (5+ years)
Governance Architecture:
- Build adaptive regulatory systems that evolve with technology
- Establish international AI safety organization with enforcement powers
- Create sustainable funding for proliferation monitoring infrastructure
Research Priorities:
- Develop better offensive-defensive balance understanding
- Create empirical measurement systems for proliferation tracking
- Build tools for post-proliferation risk mitigation
Research Gaps
Several critical uncertainties limit model precision and policy effectiveness:
Empirical Proliferation Tracking: Systematic measurement of capability diffusion timelines across domains remains limited. Most analysis relies on high-profile case studies rather than comprehensive data collection.
Reverse Engineering Difficulty: Time and resources required to replicate capabilities from limited information varies dramatically across capability types. Better understanding could inform targeted protection strategies.
Actor Intent Modeling: Current misuse probability estimates rely on theoretical analysis rather than empirical study of malicious actor populations and motivations.
Control Mechanism Effectiveness: Rigorous testing of governance interventions is lacking. Most effectiveness estimates derive from analogies to other domains rather than AI-specific validation.
Defensive Capability Development: The model focuses on capability proliferation while ignoring parallel development of defensive tools that could partially offset risks.
Sources & Resources
Academic Research
| Source | Focus | Key Findings | Link |
|---|---|---|---|
| Heim et al. (2023) | Compute governance | Export controls 60-80% effective short-term | CSET Georgetown |
| Anderljung et al. (2023) | Model security | Weight protection reduces proliferation 50-70% | arXiv |
| Shavit et al. (2023) | Capability evaluation | Current evals miss 30-50% of dangerous capabilities | arXiv |
Policy Documents
| Document | Organization | Key Recommendations | Year |
|---|---|---|---|
| AI Executive Order | White House | Mandatory reporting, evaluation requirements | 2023 |
| UK AI Safety Summit | UK Government | International coordination framework | 2023 |
| EU AI Act | European Union | Risk-based regulatory approach | 2024 |
Technical Resources
| Resource | Type | Description | Access |
|---|---|---|---|
| Model weight leaderboards | Data | Open-source capability tracking | HuggingFace |
| Compute trend analysis | Analysis | Training cost trends over time | Epoch AI |
| Export control guidance | Policy | Current semiconductor restrictions | BIS Commerce |
Related Models
| Model | Focus | Relationship |
|---|---|---|
| Racing Dynamics | Competitive pressures | Explains drivers of open release |
| Multipolar Trap | Coordination failures | Models governance challenges |
| Winner-Take-All | Market structure | Alternative to proliferation scenario |
References
METR (formerly ARC Evals) conducts research and evaluations to assess the capabilities and risks of frontier AI systems, focusing on autonomous capabilities, AI R&D acceleration potential, and evaluation integrity. They are notable for developing the 'time horizon' metric measuring how long AI agents can complete tasks, and for conducting pre-deployment evaluations for major AI labs.
The EU AI Act is the world's first comprehensive legal framework for artificial intelligence, establishing a risk-based classification system for AI applications. It imposes varying obligations on developers and deployers depending on the risk level of their AI systems, from minimal-risk to unacceptable-risk categories. The act sets precedents for global AI governance and compliance requirements.
The Open LLM Leaderboard is a HuggingFace-hosted benchmarking platform that compares open-source large language models across standardized evaluations in a transparent and reproducible manner. It allows researchers and practitioners to filter, search, and rank models by performance metrics, providing a community reference for tracking AI capabilities progress. The leaderboard has since been archived, reflecting the rapid pace of LLM development.
Executive Order 14110, signed by President Biden on October 30, 2023, established comprehensive federal directives for AI safety, security, and governance in the United States. It required safety testing and reporting for frontier AI models, directed agencies to address AI risks across sectors including national security and civil rights, and aimed to position the US as a global leader in responsible AI development. A landmark AI governance document, the order was rescinded in January 2025.
This is the official UK government Chair's statement from the inaugural AI Safety Summit held at Bletchley Park on 1-2 November 2023. The summit brought together international stakeholders to identify next steps for the safe development of frontier AI. It represents a landmark moment in international AI governance coordination, resulting in the Bletchley Declaration.
Meta's Llama is a family of open-source large language models including Llama 3 and Llama 4 variants, offering multimodal capabilities, extended context windows, and various model sizes for deployment across diverse use cases. The latest Llama 4 models feature native multimodality with early fusion architecture, supporting up to 10M token context windows. Models are freely downloadable and fine-tunable, positioning Llama as a major open-source alternative to proprietary AI systems.
Qwen 2.5 is Alibaba's latest series of large language models, representing significant capability advances across language understanding, coding, mathematics, and multimodal tasks. The series includes models of various sizes designed for both research and commercial deployment. It represents a major frontier model release from a leading Chinese AI lab.
This paper proposes four complementary international institutional models for governing advanced AI: a Commission on Frontier AI for expert consensus, an Advanced AI Governance Organization for safety standards, a Frontier AI Collaborative for equitable access, and an AI Safety Project for coordinated research. The framework draws on precedents from existing international organizations and aims to balance AI's benefits against global risks from powerful systems.
This resource from Oxford's Future of Humanity Institute (FHI) and Centre for the Governance of AI outlines recommended publication norms for machine learning researchers, addressing how and when to publish potentially dangerous AI capabilities research. It proposes frameworks for assessing dual-use risks and considering staged or restricted disclosure. The guidelines aim to balance scientific openness with responsible stewardship of potentially harmful information.
Mistral AI is a European AI company developing frontier large language models, assistants, and AI services. They offer both open-weight models and commercial API products, positioning themselves as a competitive alternative to US-based AI labs. Their work is relevant to AI safety discussions around model diffusion, open-source risks, and governance.
DeepSeek-R1 is an open-source large language model from DeepSeek-AI that achieves strong reasoning capabilities through reinforcement learning, reportedly matching or approaching OpenAI's o1 performance on reasoning benchmarks. The release includes model weights, technical details, and distilled smaller variants, representing a significant open-source milestone in frontier reasoning AI. Its release demonstrated that high-capability reasoning models can be developed at lower cost and made openly available.
This U.S. Bureau of Industry and Security (BIS) page provides regulatory guidance on export controls relevant to semiconductors and advanced technologies, administered in the interest of national security. It serves as a reference point for understanding how U.S. policy restricts the diffusion of critical technologies, including AI-relevant compute hardware, to adversarial or controlled entities.
Meta's LLaMA large language model, initially released only to approved researchers, was leaked publicly on 4chan and spread across the internet. The incident raised significant concerns about the ability to control access to powerful AI models once released, even in restricted form, and highlighted tensions between open research access and preventing misuse.
The White House announced voluntary commitments from major AI companies (including Amazon, Anthropic, Google, Inflection, Meta, Microsoft, and OpenAI) to manage AI risks, covering safety testing, information sharing, and transparency measures. These non-binding pledges represent the Biden administration's early governance approach before formal regulation, focusing on watermarking AI-generated content, red-teaming, and vulnerability reporting. Critics and analysts noted the limited enforceability of voluntary frameworks.
This Epoch AI analysis tracks historical trends in the monetary cost of training machine learning systems, examining how dollar costs have evolved alongside compute scaling. It provides empirical data on training cost trajectories to inform forecasts about future AI development economics and accessibility.
CFIUS is a U.S. interagency committee that reviews foreign investment transactions and real estate purchases for national security implications under the Defense Production Act of 1950. The Treasury Department is developing a Known Investor Program to streamline review for vetted investors from allied nations. A public Request for Information is open through March 2026 to gather feedback on process improvements.
OpenAI's official usage policies outline the rules and restrictions governing how its AI models and APIs may be used, including prohibited use cases and safety guidelines. The policies cover disallowed activities such as generating disinformation, facilitating influence operations, creating harmful content, and misusing AI for deceptive or dangerous purposes. These policies serve as a practical governance framework for responsible deployment of OpenAI's systems.
This CSET report examines the geopolitical dimensions of AI chip supply chains, analyzing how control over semiconductor manufacturing and export creates strategic leverage in AI competition between nations. It explores how chip restrictions and export controls shape the global diffusion of AI capabilities and influence international AI development trajectories.