AI Capability Threshold Model

Comprehensive framework mapping AI capabilities across five dimensions to specific risk thresholds. It estimates authentication-collapse and mass-persuasion risks at 70-85% likelihood by 2027 and bioweapons-development risk at 40% by 2029, with critical thresholds expected when models reach 50% on complex reasoning benchmarks and cross expert-level domain knowledge. The model provides concrete capability requirements, timeline projections, and early warning indicators across seven major risk categories, with extensive benchmark tracking.

Model Type: Threshold Analysis
Scope: Capability-risk mapping
Key Insight: Many risks have threshold dynamics rather than gradual activation
Related Analyses: AI Risk Activation Timeline Model · AI Risk Warning Signs Model · Scheming Likelihood Assessment

Overview

Different AI risks require different capability levels to become dangerous. A system that can write convincing phishing emails poses different risks than one that can autonomously discover zero-day vulnerabilities. This model maps specific capability requirements to specific risks, helping predict when risks activate as capabilities improve.

The capability threshold model provides a structured framework for understanding how AI systems transition from relatively benign to potentially dangerous across multiple risk domains. Rather than treating AI capability as a single dimension or risks as uniformly dependent on general intelligence, this model recognizes that specific risks emerge when systems cross particular capability thresholds in relevant dimensions. According to the International AI Safety Report (October 2025), governance choices in 2025-2026 must internalize that capability scaling has decoupled from parameter count, meaning risk thresholds can be crossed between annual cycles.

Key findings: benchmark performance of 15-25% signals early risk emergence, 50% marks a qualitative shift to complex autonomous execution, and most critical thresholds are estimated to be crossed between 2025 and 2029 across misuse, control, and structural risk categories. The Future of Life Institute's 2025 AI Safety Index reveals an industry struggling to keep pace with its own rapid capability advances: companies claim AGI achievement within the decade, yet none scores above D in existential safety planning.

Risk Impact Assessment

| Risk Category | Severity | Likelihood (2025-2027) | Threshold Crossing Timeline | Trend |
|---|---|---|---|---|
| Authentication Collapse | Critical | 85% | 2025-2027 | ↗ Accelerating |
| Mass Persuasion | High | 70% | 2025-2026 | ↗ Accelerating |
| Cyberweapon Development | High | 65% | 2025-2027 | ↗ Steady |
| Bioweapons Development | Critical | 40% | 2026-2029 | → Uncertain |
| Situational Awareness | Critical | 60% | 2025-2027 | ↗ Accelerating |
| Economic Displacement | High | 80% | 2026-2030 | ↗ Steady |
| Strategic Deception | Extreme | 15% | 2027-2035+ | → Uncertain |
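
As a reading aid, the table's qualitative severities and likelihoods can be folded into a single rough priority score. The sketch below does this with hypothetical ordinal weights of my own; it is not a scoring rule from the model itself.

```python
# Hypothetical ordinal weights for the qualitative severity ratings above.
SEVERITY = {"High": 1, "Critical": 2, "Extreme": 3}

risks = [  # (name, severity, likelihood 2025-2027) from the table above
    ("Authentication Collapse", "Critical", 0.85),
    ("Mass Persuasion", "High", 0.70),
    ("Cyberweapon Development", "High", 0.65),
    ("Bioweapons Development", "Critical", 0.40),
    ("Strategic Deception", "Extreme", 0.15),
]

# Rank by severity-weighted likelihood (a crude expected-impact proxy).
for name, sev, p in sorted(risks, key=lambda r: SEVERITY[r[1]] * r[2], reverse=True):
    print(f"{name}: {SEVERITY[sev] * p:.2f}")
# Authentication Collapse tops the ranking (1.70) even though Strategic
# Deception is rated more severe, because its likelihood window is nearer.
```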

Capability Dimensions Framework

AI capabilities decompose into five distinct dimensions that progress at different rates. Understanding these separately is crucial because different risks require different combinations. According to Epoch AI's tracking, the training compute of frontier AI models has grown by 4-5x per year since 2020, and the Epoch Capabilities Index shows frontier model improvement nearly doubled in 2024, from ~8 points/year to ~15 points/year.

```mermaid
flowchart TD
  subgraph DIMS["Capability Dimensions"]
      DK[Domain Knowledge] --> RISK
      RD[Reasoning Depth] --> RISK
      PH[Planning Horizon] --> RISK
      SM[Strategic Modeling] --> RISK
      AE[Autonomous Execution] --> RISK
  end

  subgraph RISK["Risk Activation Thresholds"]
      AUTH[Authentication Collapse<br/>Threshold: 2025-2027]
      BIO[Bioweapons Uplift<br/>Threshold: 2026-2029]
      CYBER[Cyberweapons<br/>Threshold: 2025-2027]
      SCHEME[Strategic Deception<br/>Threshold: 2027-2035+]
  end

  style AUTH fill:#ffcccc
  style BIO fill:#ffcccc
  style CYBER fill:#ffddcc
  style SCHEME fill:#ffe6cc
```
| Dimension | Level 1 | Level 2 | Level 3 | Level 4 | Current Frontier | Gap to Level 3 |
|---|---|---|---|---|---|---|
| Domain Knowledge | Undergraduate | Graduate | Expert | Superhuman | Expert- (some domains) | 0.5 levels |
| Reasoning Depth | Simple (2-3 steps) | Moderate (5-10) | Complex (20+) | Superhuman | Moderate+ | 0.5-1 level |
| Planning Horizon | Immediate | Short-term (hrs) | Medium (wks) | Long-term (months) | Short-term+ | 1 level |
| Strategic Modeling | None | Basic | Sophisticated | Superhuman | Basic+ | 1-1.5 levels |
| Autonomous Execution | None | Simple tasks | Complex tasks | Full autonomy | Simple-Complex | 0.5-1 level |
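
The model's core logic can be made concrete in a few lines: encode each dimension on the 0-4 ordinal scale from the table above, then treat a risk as activated only when every required dimension has crossed its level. The sketch below is illustrative only; the numeric encodings of the "Current Frontier" column (e.g., Moderate+ ≈ 2.25) are my own rough assumptions, not data from the source.

```python
from dataclasses import dataclass

@dataclass
class RiskProfile:
    name: str
    required: dict[str, float]  # dimension -> minimum ordinal level (0-4)

def is_activated(frontier: dict[str, float], risk: RiskProfile) -> bool:
    """Threshold dynamics: a risk activates only when ALL required
    dimensions have crossed their levels, not gradually with any one."""
    return all(frontier.get(d, 0.0) >= lvl for d, lvl in risk.required.items())

def gap(frontier: dict[str, float], risk: RiskProfile) -> float:
    """Largest remaining shortfall in levels (0 means activated)."""
    return max((lvl - frontier.get(d, 0.0) for d, lvl in risk.required.items()),
               default=0.0)

# Rough ordinal encoding of the 'Current Frontier' column (assumption).
frontier = {"domain_knowledge": 2.5, "reasoning_depth": 2.25,
            "planning_horizon": 2.0, "strategic_modeling": 2.25,
            "autonomous_execution": 2.5}

# Requirements loosely mirroring the bioweapons table later on this page.
bio = RiskProfile("Bioweapons Development", {
    "domain_knowledge": 3.0, "reasoning_depth": 3.0,
    "planning_horizon": 3.0, "autonomous_execution": 3.0})

print(is_activated(frontier, bio))  # False
print(gap(frontier, bio))           # 1.0 level remaining (planning horizon)
```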

Domain Knowledge Benchmarks

Current measurement approaches show significant gaps in assessing practical domain expertise:

| Domain | Best Benchmark | Current Frontier Score | Expert Human Level | Assessment Quality |
|---|---|---|---|---|
| Biology | MMLU-Biology | 85-90% | ≈95% | Medium |
| Chemistry | ChemBench | 70-80% | ≈90% | Low |
| Computer Security | SecBench | 65-75% | ≈85% | Low |
| Psychology | MMLU-Psychology | 80-85% | ≈90% | Very Low |
| Medicine | MedQA | 85-90% | ≈95% | Medium |

Assessment quality reflects how well benchmarks capture practical expertise versus academic knowledge.

Reasoning Depth Progression

The ARC Prize 2024-2025 results demonstrate the critical threshold zone for complex reasoning. On ARC-AGI-1, OpenAI's o3-preview achieved 75.7% accuracy against a human level of roughly 98%, while on the harder ARC-AGI-2 benchmark advanced models initially scored only single-digit percentages, even though every task has been solved by human test-takers.

| Reasoning Level | Benchmark Examples | Current Performance | Risk Relevance |
|---|---|---|---|
| Simple (2-3 steps) | Basic math word problems | 95%+ | Low-risk applications |
| Moderate (5-10 steps) | GSM8K, multi-hop QA | 85-95% | Most current capabilities |
| Complex (20+ steps) | ARC-AGI, extended proofs | 30-75% (ARC-AGI-1), 5-55% (ARC-AGI-2) | Critical threshold zone |
| Superhuman | Novel mathematical proofs | <10% | Advanced risks |

Recent breakthrough (December 2025): Poetiq with GPT-5.2 X-High achieved 75% on ARC-AGI-2, surpassing the average human test-taker score of 60% for the first time, demonstrating rapid progress on complex reasoning tasks.

Risk-Capability Mapping

Near-Term Risks (2025-2027)

Authentication Collapse

The volume of deepfakes has grown explosively: Deloitte's 2024 analysis estimates growth from roughly 500,000 online deepfakes in 2023 to about 8 million in 2025, with annual growth nearing 900%. Voice cloning has crossed what experts call the "indistinguishable threshold": a few seconds of audio now suffice to generate a convincing clone.

| Capability | Required Level | Current Level | Gap | Evidence |
|---|---|---|---|---|
| Domain Knowledge (Media) | Expert | Expert- | 0.5 level | Sora quality approaching photorealism |
| Reasoning Depth | Moderate | Moderate | 0 levels | Current models handle multi-step generation |
| Strategic Modeling | Basic+ | Basic | 0.5 level | Limited theory of mind in current systems |
| Autonomous Execution | Simple | Simple | 0 levels | Already achieved for content generation |

Key Threshold Capabilities:

  • Generate synthetic content indistinguishable from authentic across all modalities
  • Real-time interactive video generation (NVIDIA Omniverse)
  • Defeat detection systems designed to identify AI content
  • Mimic individual styles from minimal samples

Detection Challenges: OpenAI's deepfake detection tool identifies DALL-E 3 images with 98.8% accuracy but only flags 5-10% of images from other AI tools. Multi-modal attacks combining deepfaked video, synthetic voices, and fabricated documents are increasing.

Current Status: OpenAI's Sora and Meta's Make-A-Video demonstrate near-threshold video generation. ElevenLabs achieves voice cloning from <30 seconds of audio.

Mass Persuasion Capabilities

| Capability | Required Level | Current Level | Gap | Evidence |
|---|---|---|---|---|
| Domain Knowledge (Psychology) | Graduate+ | Graduate | 0.5 level | Strong performance on psychology benchmarks |
| Strategic Modeling | Sophisticated | Basic+ | 1 level | Limited multi-agent reasoning |
| Planning Horizon | Medium-term | Short-term | 1 level | Cannot maintain campaigns over weeks |
| Autonomous Execution | Simple | Simple | 0 levels | Can generate content at scale |

Research Evidence:

  • Anthropic (2024) shows Claude 3 achieves 84% on psychology benchmarks
  • Stanford HAI study finds AI-generated content achieves 82% higher believability ratings
  • MIT persuasion study demonstrates automated A/B testing improves persuasion by 35%

Medium-Term Risks (2026-2029)

Bioweapons Development

| Capability | Required Level | Current Level | Gap | Assessment Source |
|---|---|---|---|---|
| Domain Knowledge (Biology) | Expert | Graduate+ | 1 level | RAND biosecurity assessment |
| Domain Knowledge (Chemistry) | Expert | Graduate | 1-2 levels | Limited synthesis knowledge |
| Reasoning Depth | Complex | Moderate+ | 1 level | Cannot handle 20+ step procedures |
| Planning Horizon | Medium-term | Short-term | 1 level | No multi-week experimental planning |
| Autonomous Execution | Complex | Simple+ | 1 level | Cannot troubleshoot failed experiments |

Critical Bottlenecks:

  • Specialized synthesis knowledge for dangerous compounds
  • Autonomous troubleshooting of complex laboratory procedures
  • Multi-week experimental planning and adaptation
  • Integration of theoretical knowledge with practical constraints

Expert Assessment: RAND Corporation (2024) estimates a 60% probability that this threshold will be crossed by 2028.

Economic Displacement Thresholds

McKinsey's research indicates that current technologies could, in theory, automate about 57% of U.S. work hours. By 2030, approximately 27% of current work hours in Europe and 30% in the United States could be automated. Workers in lower-wage jobs are up to 14 times more likely to need to change occupations than those in the highest-wage positions.

| Job Category | Automation Threshold | Current AI Capability | Estimated Timeline | Source |
|---|---|---|---|---|
| Content Writing | 70% task automation | 85% | Crossed 2024 | McKinsey AI Index |
| Code Generation | 60% task automation | 60-70% (SWE-bench Verified) | Crossed 2025 | SWE-bench leaderboard |
| Data Analysis | 75% task automation | 55% | 2026-2027 | Industry surveys |
| Customer Service | 80% task automation | 70% | 2025-2026 | Salesforce AI reports |
| Legal Research | 65% task automation | 40% | 2027-2028 | Legal industry analysis |

Coding Benchmark Update: The International AI Safety Report (October 2025) notes that coding capabilities have advanced particularly quickly. Top models now solve over 60% of problems in SWE-bench Verified, up from 40% in late 2024 and almost 0% at the beginning of 2024. However, Scale AI's SWE-Bench Pro shows a significant performance drop: even the best models (GPT-5, Claude Opus 4.1) score only 23% on harder, more realistic tasks.

Long-Term Control Risks (2027-2035+)

Strategic Deception (Scheming)

| Capability | Required Level | Current Level | Gap | Uncertainty |
|---|---|---|---|---|
| Strategic Modeling | Superhuman | Basic+ | 2+ levels | Very High |
| Reasoning Depth | Complex | Moderate+ | 1 level | High |
| Planning Horizon | Long-term | Short-term | 2 levels | Very High |
| Situational Awareness | Expert | Basic | 2 levels | High |

Key Uncertainties:

  • Whether sophisticated strategic modeling can emerge from current training approaches
  • Detectability of strategic deception capabilities during evaluation
  • Minimum capability level required for effective scheming

Research Evidence:

  • Anthropic Constitutional AI shows limited success in detecting deceptive behavior
  • Redwood Research adversarial training reveals capabilities often hidden during evaluation

Current State & Trajectory

Capability Progress Rates

According to Epoch AI's analysis, training compute for frontier models grows 4-5x yearly. Their Epoch Capabilities Index shows frontier model improvement nearly doubled in 2024. METR's research shows AI performance on task length has been consistently exponentially increasing over the past 6 years, with a doubling time of around 7 months.

| Dimension | 2023-2024 Progress | Projected 2024-2025 | Key Drivers |
|---|---|---|---|
| Domain Knowledge | +0.5 levels | +0.3-0.7 levels | Larger training datasets, specialized fine-tuning |
| Reasoning Depth | +0.3 levels | +0.2-0.5 levels | Chain-of-thought improvements, tree search |
| Planning Horizon | +0.2 levels | +0.2-0.4 levels | Tool integration, memory systems |
| Strategic Modeling | +0.1 levels | +0.1-0.3 levels | Multi-agent training, RL improvements |
| Autonomous Execution | +0.4 levels | +0.3-0.6 levels | Tool use, real-world deployment |

Data Sources: Epoch AI capability tracking, industry benchmark results, expert elicitation.
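
METR's ~7-month doubling time lends itself to a simple back-of-the-envelope projection of autonomous task horizons. The sketch below is my own extrapolation under that single assumption, not a METR model; the starting horizon is illustrative.

```python
import math

def task_horizon_minutes(h0: float, months: float, doubling: float = 7.0) -> float:
    """Task length an agent can complete after `months`, given an initial
    horizon of h0 minutes and METR's ~7-month doubling time."""
    return h0 * 2 ** (months / doubling)

def months_to_reach(target: float, h0: float, doubling: float = 7.0) -> float:
    """Months until the horizon first reaches `target` minutes."""
    return doubling * math.log2(target / h0)

# If agents reliably handle ~1-hour tasks today (illustrative starting point),
# a 40-hour working week of autonomous work arrives in roughly three years:
print(f"{months_to_reach(40 * 60, 60):.0f} months")  # ≈ 37 months
```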

Compute Scaling Projections

| Metric | Current (2025) | Projected 2027 | Projected 2030 | Source |
|---|---|---|---|---|
| Models above 10^26 FLOP | ≈5-10 | ≈30 | ≈200+ | Epoch AI model counts |
| Largest training run power | 1-2 GW | 2-4 GW | 4-16 GW | Epoch AI power analysis |
| Frontier model training cost | $100M-500M | $100M-1B+ | $1-5B | Epoch AI cost projections |
| Open-weight capability lag | 6-12 months | 6-12 months | 6-12 months | Epoch AI consumer GPU analysis |
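
A quick sanity check on these projections (my arithmetic, using the 4-5x growth figure cited above): at that rate the largest training run gains an order of magnitude every 1.4-1.7 years, which is why a fixed 10^26 FLOP line is crossed by rapidly growing numbers of models.

```python
import math

# Years for the frontier to gain one order of magnitude (10x) in compute,
# given g-fold yearly growth: 1 / log10(g).
for g in (4, 5):
    print(f"{g}x/yr -> {1 / math.log10(g):.2f} years per 10x of training compute")
# 4x/yr -> 1.66 years; 5x/yr -> 1.43 years
```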

Leading Organizations

| Organization | Strongest Capabilities | Estimated Timeline to Next Threshold | Focus Area |
|---|---|---|---|
| OpenAI | Domain knowledge, autonomous execution | 12-18 months | General capabilities |
| Anthropic | Reasoning depth, strategic modeling | 18-24 months | Safety-focused development |
| DeepMind | Strategic modeling, planning | 18-30 months | Scientific applications |
| Meta | Multimodal generation | 6-12 months | Social/media applications |

Key Uncertainties & Research Cruxes

Measurement Validity

The Berkeley CLTC Working Paper on Intolerable Risk Thresholds notes that models effectively more capable than the latest tested model (4x or more in Effective Compute, or 6 months' worth of fine-tuning) require comprehensive assessment, including threat model mapping, empirical capability tests, elicitation testing without safety mechanisms, and likelihood forecasting.

An interdisciplinary review of AI evaluation highlights the "benchmark lottery" problem: researchers at Google's Brain Team found that many factors other than fundamental algorithmic superiority may lead to a method being perceived as superior. Ironically, a majority of influential benchmarks have been released without rigorous peer review.

| Uncertainty | Impact if True | Impact if False | Current Evidence |
|---|---|---|---|
| Current benchmarks accurately measure risk-relevant capabilities | Can trust threshold predictions | Need fundamentally new evaluations | Mixed - good for some domains, poor for others |
| Practical capabilities match benchmark performance | Smooth transition from lab to deployment | Significant capability overhangs | Substantial gaps observed in real-world deployment |
| Capability improvements follow predictable scaling laws | Reliable timeline forecasting possible | Threshold crossings may surprise | Scaling laws hold for some capabilities, not others |

Threshold Sharpness

Sharp Threshold Evidence (contrasted with gradual scaling in the sketch after these lists):

  • Authentication systems: Detection accuracy drops from 95% to 15% once generation quality crosses threshold
  • Economic viability: McKinsey automation analysis shows 10-20% capability improvements create 50-80% cost advantage in many tasks
  • Security vulnerabilities: Most exploits require complete capability to work at all

Gradual Scaling Evidence:

  • Job displacement: Different tasks within roles automate at different rates
  • Persuasion effectiveness: Incremental improvements in messaging quality yield incremental persuasion gains
  • Domain expertise: Knowledge accumulation appears continuous rather than threshold-based
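
The two patterns can be summarized in one curve: model risk activation as a logistic function of capability, where the steepness parameter separates sharp thresholds from gradual scaling. The parameters below are purely illustrative, not fitted to any data in this page.

```python
import math

def activation(c: float, midpoint: float = 0.5, k: float = 10.0) -> float:
    """Logistic risk activation as capability c crosses `midpoint`;
    large k approximates a sharp threshold, small k a gradual ramp."""
    return 1 / (1 + math.exp(-k * (c - midpoint)))

for c in (0.4, 0.5, 0.6):
    print(f"c={c}: sharp={activation(c, k=40):.2f}  gradual={activation(c, k=4):.2f}")
# sharp:   0.02 -> 0.50 -> 0.98 (authentication-style cliff)
# gradual: 0.40 -> 0.50 -> 0.60 (displacement-style ramp)
```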

Strategic Deception Detection

Critical unsolved problems in capability assessment:

| Challenge | Current Approach | Limitation | Research Need |
|---|---|---|---|
| Detecting hidden capabilities | Red-teaming, adversarial prompts | Systems may hide abilities during evaluation | Novel evaluation paradigms |
| Measuring situational awareness | Direct questioning, inference tests | Pattern matching vs. genuine understanding | Causal intervention studies |
| Long-term planning assessment | Multi-step benchmarks | Maximum tested horizon ~hours | Extended evaluation infrastructure |
| Strategic modeling depth | Theory of mind tasks | Unclear if results generalize | Ecological validity research |

Timeline Projections & Warning Indicators

2025 Critical Thresholds

High Probability (70%+):

  • Authentication Collapse: Real-time deepfakes become commercially viable
  • Content Generation Saturation: Human-level quality across all text/image modalities

Medium Probability (40-70%):

  • Code Automation: 50%+ of software engineering tasks automated
  • Basic Situational Awareness: Systems understand evaluation vs. deployment contexts

2026-2027 Medium Probability Events

| Risk | Probability | Key Indicators to Monitor |
|---|---|---|
| Advanced Cyberweapons | 65% | METR evaluations crossing 40% threshold |
| Economic Disruption | 80% | GitHub Copilot achieving >60% task completion |
| Bioweapons Uplift | 40% | Expert biology benchmarks >90%, synthesis knowledge tests |
| Oversight Evasion | 60% | Detection of training/deployment context awareness |

Early Warning System

Red Flag Indicators:

  • Sudden benchmark improvements >20 percentage points (a monitoring sketch follows this list)
  • Systems developing capabilities not explicitly trained for
  • Gap between capability and safety evaluation results widening
  • Evidence of strategic behavior during evaluation
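
The first of these indicators is straightforward to operationalize; below is a minimal sketch (with hypothetical benchmark names and scores) of a monitor that flags any benchmark whose score jumps more than 20 percentage points between evaluation rounds.

```python
def red_flags(prev: dict[str, float], curr: dict[str, float],
              jump_pp: float = 20.0) -> list[str]:
    """Benchmarks whose scores rose by more than `jump_pp` points."""
    return [b for b in curr if curr[b] - prev.get(b, 0.0) > jump_pp]

# Hypothetical scores across two evaluation rounds (percentage points):
prev = {"ARC-AGI-2": 8.0, "SWE-bench Verified": 55.0}
curr = {"ARC-AGI-2": 54.0, "SWE-bench Verified": 63.0}
print(red_flags(prev, curr))  # ['ARC-AGI-2'] - a 46-point jump
```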

Monitoring Infrastructure:

  • METR dangerous capability evaluations
  • MIRI alignment evaluation protocols
  • Industry responsible scaling policies (OpenAI Preparedness, Anthropic RSP)
  • Academic capability forecasting (Epoch AI)

The METR Common Elements Report (December 2025) describes how each major AI developer's policy uses capability thresholds for biological weapons development, cyberattacks, autonomous replication, and automated AI R&D, with commitments to conduct model evaluations assessing whether models are approaching thresholds that could enable severe harm.

Expert Survey Findings

An OECD-affiliated survey on AI thresholds found that experts agreed that, if training compute thresholds are exceeded, AI companies should:

  • Conduct additional risk assessments (e.g., via model evaluations)
  • Notify an independent public body (e.g., EU AI Office, FTC, or AI Safety Institute)
  • Notify the government

Participants noted that risk assessment frameworks from safety-critical industries (nuclear, maritime, aviation, healthcare, finance, space) provide valuable precedent for AI governance.

Sources & Resources

Primary Research

| Source | Type | Key Findings | Relevance |
|---|---|---|---|
| Anthropic Responsible Scaling Policy | Industry Policy | Defines capability thresholds for safety measures | Framework implementation |
| OpenAI Preparedness Framework | Industry Policy | Risk assessment methodology | Threshold identification |
| METR Dangerous Capability Evaluations | Research | Systematic capability testing | Current capability baselines |
| Epoch AI Capability Forecasts | Research | Timeline predictions for AI milestones | Forecasting methodology |

Government & Policy

| Resource | Source Type | Focus |
|---|---|---|
| NIST AI Risk Management Framework | US Government | Risk assessment standards |
| UK AISI Research | UK Government | Model evaluation protocols |
| EU AI Office | EU Government | Regulatory frameworks |
| RAND Corporation AI Studies | Think Tank | National security implications |

Technical Benchmarks & Evaluation

| Benchmark | Domain | Current Frontier Score (Dec 2025) | Threshold Relevance |
|---|---|---|---|
| MMLU | General Knowledge | 85-90% | Domain expertise baseline |
| ARC-AGI-1 | Abstract Reasoning | 75-87% (o3-preview) | Complex reasoning threshold |
| ARC-AGI-2 | Abstract Reasoning | 54-75% (GPT-5.2) | Next-gen reasoning threshold |
| SWE-bench Verified | Software Engineering | 60-70% | Autonomous code execution |
| SWE-bench Pro | Real-world Coding | 17-23% | Generalization to novel code |
| MATH | Mathematical Reasoning | 60-80% | Multi-step reasoning |

Risk Assessment Research

| Research Area | Key Papers | Organizations |
|---|---|---|
| Bioweapons Risk | RAND Biosecurity Assessment | RAND, Johns Hopkins CNAS |
| Economic Displacement | McKinsey AI Impact | McKinsey, Brookings Institution |
| Authentication Collapse | Deepfake Detection Challenges | UC Berkeley, MIT |
| Strategic Deception | Constitutional AI Research | Anthropic, Redwood Research |

Additional Sources

| Source | Type | Key Finding |
|---|---|---|
| International AI Safety Report (Oct 2025) | Government | Risk thresholds can be crossed between annual cycles due to post-training/inference advances |
| Future of Life Institute AI Safety Index 2025 | NGO | Industry fundamentally unprepared; Anthropic leads (C+) but none score above D in existential safety |
| Berkeley CLTC Intolerable Risk Thresholds | Academic | Models 4x+ more capable require comprehensive risk assessment |
| METR Common Elements Report (Dec 2025) | Research | All major labs use capability thresholds for bio, cyber, replication, AI R&D |
| ARC Prize 2025 Results | Academic | First AI system (Poetiq/GPT-5.2) exceeds human average on ARC-AGI-2 reasoning |
| Epoch AI Compute Trends | Research | Training compute grows 4-5x yearly; capability improvement doubled in 2024 |

References

OpenAI is a leading AI research and deployment company focused on building advanced AI systems, including GPT and o-series models, with a stated mission of ensuring artificial general intelligence (AGI) benefits all of humanity. The homepage serves as a gateway to their research, products, and policy work spanning capabilities and safety.

★★★★☆
2. [2009.03300] Measuring Massive Multitask Language Understanding · arXiv · Dan Hendrycks et al. · 2020 · Paper

Introduces the MMLU benchmark, a comprehensive evaluation suite covering 57 subjects across STEM, humanities, social sciences, and more, designed to measure breadth and depth of language model knowledge. The benchmark tests models from elementary to professional level and reveals significant gaps between human expert performance and state-of-the-art models at the time of publication. It became a standard benchmark for tracking LLM capability progress.

★★★☆☆
3. [2310.09049] SAI: Solving AI Tasks with Systematic Artificial Intelligence in Communication Network · arXiv · Lei Yao, Yong Zhang, Zilong Yan & Jialu Tian · 2023 · Paper

This paper proposes SAI, a systematic AI framework for solving diverse AI tasks in communication networks by integrating large language models with structured reasoning approaches. It addresses how LLMs can be applied to network management and optimization problems through systematic decomposition of complex communication tasks. The work explores capability thresholds and risk assessment for AI deployment in critical network infrastructure.

★★★☆☆

Epoch AI analyzes how many AI models would fall above various compute thresholds (measured in FLOPs), providing empirical projections relevant to governance frameworks that use compute as a regulatory trigger. The analysis helps policymakers and researchers understand the practical scope and selectivity of compute-based oversight mechanisms.

★★★★☆

Google DeepMind is a leading AI research laboratory combining the former DeepMind and Google Brain teams, focused on developing advanced AI systems and conducting research across capabilities, safety, and applications. The organization is one of the most influential labs in AI development, working on frontier models including Gemini and publishing widely-cited safety and capabilities research.

★★★★☆
6. RAND Corporation study · RAND Corporation · 2024

This RAND Corporation research report examines the risk of AI systems providing meaningful uplift to actors seeking to develop biological weapons, focusing on how to assess capability thresholds and decompose the problem for evaluation purposes. It likely provides a framework for analyzing when AI crosses dangerous capability boundaries in the bioweapons domain and how to structure risk assessments accordingly.

★★★★☆

This page outlines the European Commission's comprehensive policy framework for AI, centered on promoting trustworthy, human-centric AI through the AI Act, AI Continent Action Plan, and Apply AI Strategy. It aims to balance Europe's global AI competitiveness with safety, fundamental rights, and democratic values. Key initiatives include AI Factories, the InvestAI Facility, GenAI4EU, and the Apply AI Alliance.

★★★★☆

Epoch AI is a research organization focused on investigating and forecasting trends in artificial intelligence, particularly around compute, training data, and algorithmic progress. They produce empirical analyses and datasets to inform understanding of AI development trajectories and support better decision-making in AI governance and safety.

★★★★☆

This OECD-affiliated survey examines how thresholds and capability benchmarks should be defined and applied to advanced AI systems for governance and risk management purposes. It likely synthesizes expert views on identifying dangerous capability levels and triggering regulatory or safety interventions. The work is relevant to policymakers and AI developers designing evaluation frameworks for frontier models.

Deloitte's 2024 analysis frames deepfakes as a cybersecurity-scale threat to online trust, projecting the deepfake detection market will grow 42% annually from $5.5B in 2023 to $15.7B by 2026. The report draws parallels to cybersecurity spending trajectories and highlights that costs of maintaining content authenticity will likely be distributed across consumers, creators, and advertisers. Consumer surveys reveal widespread skepticism and demand for standardized AI content labeling.

Meta's official AI homepage showcases their broad research and product portfolio including Llama 4 (large language models), Segment Anything Model 3 (computer vision), V-JEPA 2 (world models), and AI glasses hardware. The company organizes its AI work around four research pillars: Communication & Language, Embodiment & Actions, Alignment, and Core Learning & Reasoning. Meta emphasizes open-source development and practical deployment at scale.

★★★★☆
12. ARC-AGI-2 Benchmark · arcprize.org

ARC-AGI-2 is a 2025 benchmark designed to stress-test AI reasoning systems, where pure LLMs score 0% and frontier reasoning systems achieve only single-digit percentages despite humans solving all tasks. It targets three core capability gaps—symbolic interpretation, compositional reasoning, and contextual rule application—demonstrating that scaling alone is insufficient and new architectures or test-time adaptation methods are required.

Sora is OpenAI's text-to-video generation model and app that converts text prompts or images into videos with high realism, including automatic sound. It supports features like character casting, remixing, and multiple visual styles including cinematic and photorealistic.

★★★★☆

Anthropic introduces its Responsible Scaling Policy (RSP), a framework of technical and organizational protocols for managing catastrophic risks as AI systems become more capable. The policy defines AI Safety Levels (ASL-1 through ASL-5+), modeled after biosafety level standards, requiring increasingly strict safety, security, and operational measures tied to a model's potential for catastrophic risk. Current Claude models are classified ASL-2, with ASL-3 and beyond triggering stricter deployment and security requirements.

★★★★☆

A McKinsey Global Institute report examining how AI agents and robotics are reshaping labor markets and workforce skills. The report reportedly finds that 57% of workers may need to develop new skill partnerships with AI systems, analyzing how human-AI collaboration will transform job roles and economic productivity.

★★★☆☆
16. Redwood Research: AI Control · redwoodresearch.org

Redwood Research is a nonprofit AI safety organization that pioneered the 'AI control' research agenda, focusing on preventing intentional subversion by misaligned AI systems. Their key contributions include the ICML paper on AI Control protocols, the Alignment Faking demonstration (with Anthropic), and consulting work with governments and AI labs on misalignment risk mitigation.

SWE-bench is a benchmark and leaderboard platform for evaluating AI models on real-world software engineering tasks, particularly resolving GitHub issues in open-source Python repositories. It offers multiple dataset variants (Lite, Verified, Multimodal) and standardized metrics to compare coding agents. It has become a widely-used standard for assessing the practical software engineering capabilities of LLM-based agents.

METR is an organization conducting research and evaluations to assess the capabilities and risks of frontier AI systems, focusing on autonomous task completion, AI self-improvement risks, and evaluation integrity. They have developed the 'Time Horizon' metric measuring how long AI agents can autonomously complete software tasks, showing exponential growth over recent years. They work with major AI labs including OpenAI, Anthropic, and Amazon to evaluate catastrophic risk potential.

★★★★☆

The NIST AI RMF is a voluntary, consensus-driven framework released in January 2023 to help organizations identify, assess, and manage risks associated with AI systems while promoting trustworthiness across design, development, deployment, and evaluation. It provides structured guidance organized around core functions and is accompanied by a Playbook, Roadmap, and a Generative AI Profile (2024) addressing risks specific to generative AI systems.

★★★★★

GitHub Copilot is an AI-powered coding assistant that integrates into IDEs, terminals, and GitHub workflows to assist developers with code completion, autonomous agent-based coding tasks, and project management. It supports multiple LLMs and allows assignment of coding tasks to AI agents that can autonomously write code and create pull requests.

★★★☆☆

Epoch AI analyzes how consumer GPUs like the RTX 5090 can run open-weight models that match frontier LLM performance from 6-12 months prior. The analysis tracks this gap across multiple benchmarks (GPQA Diamond, MMLU-Pro, LMArena) and finds the democratization trend is driven by open-weight scaling, model distillation, and GPU progress.

★★★★☆

ElevenLabs is a leading AI voice technology platform offering text-to-speech, voice cloning, speech-to-text, and AI agent capabilities across 70+ languages. It serves enterprises, creators, and developers with tools for synthetic voice generation and audio content creation. The platform represents a prominent example of advanced synthetic media technology with significant implications for deepfakes, identity fraud, and information integrity.

23. Authentication systems · arXiv · Huseyin Fuat Alsan & Taner Arsan · 2023 · Paper

This paper proposes a curriculum learning approach for post-disaster analytics using multimodal deep learning models that jointly process images and text. The authors introduce Dynamic Task and Weight Prioritization (DATWEP), a novel gradient-based curriculum learning method that automatically determines task difficulty during training without manual specification. The approach combines U-Net for semantic segmentation, image encoding, and a custom text classifier for visual question answering, evaluated on the FloodNet dataset for flood damage assessment.

★★★☆☆
24. McKinsey State of AI in 2024 · McKinsey & Company

McKinsey's annual survey-based report tracking enterprise AI adoption, investment trends, and emerging risks across industries. The report provides quantitative benchmarks on how organizations are deploying AI, including generative AI, and what governance and risk management practices they are implementing.

★★★☆☆
25. International AI Safety Report (October 2025) · internationalaisafetyreport.org

A focused interim update to the International AI Safety Report, chaired by Yoshua Bengio, covering significant developments in AI capabilities and their risk implications between full annual editions. The report is produced by an international panel of experts from over 30 countries and aims to keep policymakers and researchers current on fast-moving AI developments. It serves as an authoritative, consensus-oriented reference for AI safety governance.

This resource surveys leading AI-powered deepfake detection tools available in 2025, including OpenAI's detection tool, evaluating their capabilities for identifying synthetically generated media. It serves as a practical reference for organizations and researchers assessing defenses against AI-generated disinformation and identity fraud. The piece highlights the growing ecosystem of countermeasures to synthetic media threats.

Epoch AI analyzes historical trends in the training compute used for frontier AI models, finding that compute has grown approximately 4-5x per year. This rapid scaling has significant implications for AI capabilities trajectories, resource requirements, and safety planning horizons.

★★★★☆
28. AI Safety Institute - GOV.UK · UK Government · Government

The UK AI Safety Institute (recently rebranded as the AI Security Institute) is a government body under the Department for Science, Innovation and Technology focused on minimizing risks from rapid and unexpected AI advances. It conducts and publishes safety research, international coordination reports, and policy guidance, while managing grants for systemic AI safety research.

★★★★☆

Anthropic's official model card for the Claude 3 family (Haiku, Sonnet, Opus), documenting capability evaluations, safety assessments, and alignment properties. It covers frontier model benchmarks, red-teaming results, and responsible scaling policy (RSP) threshold evaluations for biological, chemical, and other catastrophic risks. The document represents Anthropic's public transparency effort around deploying a state-of-the-art AI system.

★★★★☆

MIRI is a nonprofit research organization focused on ensuring that advanced AI systems are safe and beneficial. It conducts technical research on the mathematical foundations of AI alignment, aiming to solve core theoretical problems before transformative AI is developed. MIRI is one of the pioneering organizations in the AI safety field.

★★★☆☆

METR presents updated capability evaluations for Claude Sonnet and OpenAI o1 models, assessing whether these frontier AI systems approach autonomy thresholds relevant to safety and deployment decisions. The evaluations focus on task autonomy and the potential for models to pose novel risks as their capabilities scale.

★★★★☆

Meta's Make-A-Video is an AI system that generates short video clips from text descriptions, images, or existing videos. It extends text-to-image generation techniques into the temporal domain, enabling creation of realistic and imaginative video content from natural language prompts. The system represents a significant capability milestone in generative AI for multimedia content.

OpenAI's Preparedness initiative outlines a framework for tracking, evaluating, and mitigating catastrophic risks from frontier AI models. It establishes risk thresholds across categories like cybersecurity, CBRN threats, and persuasion, and defines safety standards that must be met before model deployment.

★★★★☆

Epoch AI analyzes the rapidly growing electricity demands of training frontier AI models, examining trends in power consumption, infrastructure constraints, and implications for AI development trajectories. The analysis quantifies how compute scaling translates into energy requirements and identifies key bottlenecks in power availability that may shape the pace of AI progress.

★★★★☆

This paper introduces MATH, a benchmark of 12,500 competition mathematics problems with step-by-step solutions, revealing that large Transformer models achieve surprisingly low accuracy and that scaling alone is insufficient for mathematical reasoning. The authors also release an auxiliary pretraining dataset to aid mathematical learning. The work highlights a fundamental gap between current scaling trends and genuine mathematical reasoning ability.

★★★☆☆
36. MIT persuasion study · Science (peer-reviewed) · G. Spitale, N. Biller-Andorno & Federico Germani · 2023 · Paper

This study examined whether humans can distinguish between accurate and false information in tweets, and whether they can identify AI-generated content from GPT-3 versus human-written tweets. With 697 participants, researchers found that GPT-3 presents a dual challenge: it can produce accurate, easily understandable information but also generates more compelling disinformation. Critically, humans cannot reliably distinguish between GPT-3-generated and human-written tweets, raising significant concerns about AI's potential to spread disinformation during an infodemic.

★★★★★

A Stanford HAI study examines how people respond to messages they believe are generated by AI versus humans, finding that individuals tend to place higher credibility or trust in AI-generated content. This has significant implications for misinformation, persuasion, and the societal risks of AI-generated communication at scale.

★★★★☆

Scale AI introduces SWE-Bench Pro, an enhanced version of the SWE-Bench coding benchmark designed to address limitations in existing evaluations of AI software engineering capabilities. The benchmark aims to provide more reliable and contamination-resistant assessments of AI systems' ability to solve real-world software engineering tasks. This work is relevant to tracking AI capability thresholds in code generation and autonomous software development.

Anthropic is an AI safety company focused on building reliable, interpretable, and steerable AI systems. The company conducts frontier AI research and develops Claude, its family of AI assistants, with a stated mission of responsible development and maintenance of advanced AI for long-term human benefit.

★★★★☆

Epoch AI's trends page provides data-driven tracking of key metrics in AI development, including compute scaling, model capabilities, and training trends. It serves as a quantitative reference for understanding the trajectory of AI progress across multiple dimensions. The resource aggregates empirical data to help researchers and policymakers assess the pace and direction of AI advancement.

★★★★☆
41. Salesforce AI reports · salesforce.com

Salesforce reports on AI adoption trends in customer service, highlighting how businesses are deploying AI tools to automate interactions, improve efficiency, and manage customer relationships. The report provides industry data on AI usage patterns and emerging capabilities in enterprise customer service contexts.

42. Interdisciplinary review of AI evaluation · arXiv · Maria Eriksson et al. · 2025 · Paper

This interdisciplinary meta-review of ~100 studies examines critical shortcomings in quantitative AI benchmarking practices over the past decade. The paper identifies fine-grained technical issues (dataset biases, data contamination, inadequate documentation) alongside broader sociotechnical problems (overemphasis on text-based single-test evaluation, failure to account for multimodal and interactive AI systems). The authors highlight systemic flaws including misaligned incentives, construct validity issues, and gaming of results, arguing that benchmarking practices are shaped by commercial and competitive dynamics that often prioritize performance metrics over societal concerns. The review challenges the disproportionate trust placed in benchmarks and advocates for improved accountability and real-world relevance in AI evaluation.

★★★☆☆

METR analyzes the safety policies of 12 frontier AI companies to identify common elements, commitments, and gaps in how organizations approach responsible deployment of advanced AI systems. The analysis synthesizes patterns across responsible scaling policies, model cards, and safety frameworks to provide a comparative overview of industry norms. It serves as a reference for understanding where consensus exists and where significant variation or absence of commitments remains.

★★★★☆

RAND Corporation's AI research hub covers policy, national security, and governance implications of artificial intelligence. It aggregates reports, analyses, and commentary on AI risks, military applications, and regulatory frameworks from one of the leading U.S. defense and policy think tanks.

★★★★☆

This Berkeley Center for Long-Term Cybersecurity working paper examines how to define and operationalize 'intolerable risk' thresholds for AI systems, providing a framework for identifying which AI capabilities or behaviors should be categorically prohibited or constrained. It contributes to the growing policy and technical discourse around AI red lines and safety limits.

MedQA is the first free-form multiple-choice open domain question answering dataset for medical problems, sourced from professional medical board exams across three languages: English, simplified Chinese, and traditional Chinese, containing 12,723, 34,251, and 14,123 questions respectively. The authors implement both rule-based and neural methods combining document retrieval and machine comprehension, finding that current best approaches achieve only 36.7%, 42.0%, and 70.1% test accuracy on English, traditional Chinese, and simplified Chinese questions respectively, demonstrating significant challenges for existing OpenQA systems.

★★★☆☆
47. FLI AI Safety Index Summer 2025 · Future of Life Institute

The Future of Life Institute's AI Safety Index Summer 2025 systematically evaluates leading AI companies on safety practices, finding widespread deficiencies across risk management, transparency, and existential safety planning. Anthropic receives the highest grade of C+, indicating that even the best-performing company falls significantly short of adequate safety standards. The report serves as a comparative benchmark for industry accountability.

★★★☆☆
48. On the Measure of Intelligence · arXiv · François Chollet · 2019 · Paper

This paper argues that current AI benchmarking practices, which measure skill at specific tasks like games, fail to capture true intelligence because skill can be artificially inflated through prior knowledge and training data. The authors propose a formal definition of intelligence based on Algorithmic Information Theory, conceptualizing it as skill-acquisition efficiency across diverse tasks. They introduce the Abstraction and Reasoning Corpus (ARC), a benchmark designed with human-like priors to enable fair comparisons of general fluid intelligence between AI systems and humans, addressing the need for appropriate feedback signals in developing more intelligent and human-like artificial systems.

★★★☆☆

GSM8K is a benchmark dataset of 8.5K high-quality grade school math word problems designed to evaluate multi-step mathematical reasoning in language models. The paper demonstrates that state-of-the-art transformer models struggle with this conceptually simple task. To address this limitation, the authors propose a verification approach where multiple candidate solutions are generated and ranked by a trained verifier, showing that verification significantly improves performance and scales more effectively than finetuning baselines.

★★★☆☆

Comprehensive analysis of the ARC Prize competition results for 2024-2025, evaluating AI systems' performance on the Abstraction and Reasoning Corpus (ARC) benchmark designed to test general fluid intelligence. The results provide insight into the current state of AI reasoning capabilities and how close frontier models are to human-level performance on novel problem-solving tasks.

SecBench appears to be a GitHub organization focused on security benchmarking, likely providing standardized evaluation frameworks for assessing AI or software security capabilities. It aims to establish measurable thresholds and risk assessment criteria for security-related tasks. The project likely offers tools or datasets for evaluating security-relevant AI capabilities.

★★★☆☆

NVIDIA Omniverse is a platform for building and operating metaverse applications, enabling real-time 3D simulation, collaboration, and digital twin creation. It provides tools for connecting and simulating physically accurate virtual worlds used in robotics, autonomous vehicles, and industrial applications. The platform is increasingly relevant to AI development as a simulation environment for training and testing AI systems.

Related Wiki Pages

Top Related Pages

Risks

Authentication Collapse

Approaches

Capability Elicitation · Dangerous Capability Evaluations · AI Evaluation

Analysis

AI Safety Intervention Effectiveness Matrix · AI Safety Defense in Depth Model · Epistemic Collapse Threshold Model

Concepts

Situational Awareness · AGI Development · AI Timelines · AI Scaling Laws · Large Language Models · Autonomous Coding

Organizations

Epoch AI · Future of Life Institute

Other

AI Evaluations · Philip Tetlock · Red Teaming · Eli Lifland