
Lab Behavior & Industry

Summary

Comprehensive tracking of AI lab safety practices finds 53% average compliance with voluntary commitments, dramatic compression of safety evaluation timelines from months to days at OpenAI, and 25+ senior safety researcher departures in 2024. The open-source capability gap has collapsed from 16 months to 3-6 months, with DeepSeek R1 achieving performance parity at 1/27th the cost.

Related: ai-transition-model-parameters (Safety Culture Strength, Racing Intensity, Human Oversight Quality)

Quick Assessment

| Dimension | Assessment | Evidence |
|---|---|---|
| Overall Compliance | Mixed (53% average) | August 2025 study of 16 companies found significant variation; OpenAI scored 83%, average was 53% |
| Evaluation Timeline Trend | Declining | OpenAI reduced testing from months to days for some models; FT reports "weeks" compressed to "days" |
| Safety Team Retention | Concerning | 25+ senior departures from OpenAI in 2024; Superalignment team dissolved |
| Transparency | Inadequate | Google Gemini 2.5 Pro released without model card; OpenAI GPT-4.1 released without technical safety report |
| Open-Source Gap | Rapidly Narrowing | Gap reduced from 16 months to 3-6 months in 2025; DeepSeek R1 achieved near-parity at 27x lower cost |
| External Red Teaming | Standard but Limited | 750+ researchers engaged via HackerOne; 15-30 day engagement windows may be insufficient |
| Whistleblower Protection | Underdeveloped | Only OpenAI has published a full policy (after media pressure); California SB 53 protections start 2026 |

Methodology & Data Quality Assessment

Data Collection Approach

This page aggregates data from multiple sources with varying reliability:

| Data Type | Primary Sources | Verification Method | Limitations |
|---|---|---|---|
| Voluntary Commitments | Future of Life Institute study, company disclosures | Public rubric scoring | Self-reported data, selective disclosure |
| Safety Evaluations | Third-party evaluators (METR, UK AISI, US AISI) | Peer review, government validation | Limited access, short evaluation windows |
| Personnel Changes | Public announcements, investigative journalism | Cross-referencing multiple sources | Only visible departures tracked |
| Model Releases | Benchmark tracking, company announcements | Performance verification via leaderboards | Gaming potential, selective metrics |

Standardized Scoring System

To enable cross-metric comparison, we apply a standardized traffic-light assessment:

[Diagram: standardized traffic-light scoring system]
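
As a rough illustration of how such a traffic-light scheme works, the sketch below maps a 0-100 score to a rating. The cutoff values are assumptions chosen for illustration, not thresholds published on this page.

```python
def traffic_light(score: float) -> str:
    """Map a 0-100 compliance/quality score to a traffic-light rating.

    The 70/40 cutoffs are illustrative assumptions, not official thresholds.
    """
    if score >= 70:
        return "green"   # broadly meeting commitments
    if score >= 40:
        return "yellow"  # partial or inconsistent compliance
    return "red"         # largely non-compliant


# Example: the 2025 compliance study's best score (83%) and overall average (53%).
print(traffic_light(83))  # green
print(traffic_light(53))  # yellow
```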

Overview

This page tracks measurable indicators of AI laboratory behavior, safety practices, and industry transparency. These metrics help assess whether leading AI companies are following responsible development practices and honoring their public commitments.

Understanding lab behavior is critical because corporate practices directly influence AI safety outcomes. Even the best technical safety research is insufficient if labs are racing to deploy systems without adequate testing, suppressing internal safety concerns, or failing to disclose dangerous capabilities.

Lab Behavior Dynamics

[Diagram: lab behavior dynamics]

1. Voluntary Commitment Compliance Rate

Status: ⚠️ Stable | Data Quality: Good

2025 Compliance Overview

A comprehensive study from August 2025 examining companies' adherence to their White House voluntary AI commitments found significant variation across the 16 companies assessed:

| Cohort | Companies | Mean Compliance | Range |
|---|---|---|---|
| First (July 2023) | Amazon, Anthropic, Google, Inflection, Meta, Microsoft, OpenAI | 69.0% | 50-83% |
| Second (Sept 2023) | Adobe, Cohere, IBM, Nvidia, Palantir, Salesforce, Scale AI, Stability AI | 44.6% | 25-65% |
| Third (July 2024) | Apple | Not fully assessed | N/A |
| Overall average | 16 companies | 53% | 17-83% |
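
To make the aggregation explicit, the sketch below recomputes the cohort means from hypothetical per-company scores chosen to reproduce the reported 69.0% and 44.6% figures; the study's actual company-level scores are not listed here, and the 53% overall average additionally reflects all 16 companies, including Apple's partial assessment.

```python
# Hypothetical per-company scores, chosen only to reproduce the reported
# cohort means; not the study's actual company-level data.
cohorts = {
    "first (July 2023)":  [0.50, 0.62, 0.68, 0.70, 0.74, 0.76, 0.83],        # 7 companies
    "second (Sept 2023)": [0.25, 0.32, 0.38, 0.44, 0.47, 0.51, 0.55, 0.65],  # 8 companies
}

for name, scores in cohorts.items():
    mean = sum(scores) / len(scores)
    print(f"{name}: n={len(scores)}, mean={mean:.1%}")
# first (July 2023): n=7, mean=69.0%
# second (Sept 2023): n=8, mean=44.6%
```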

New Framework: G7 HAIP Reporting

The G7 Hiroshima AI Process (HAIP) Reporting Framework launched in February 2025 as a voluntary transparency mechanism. Organizations complete comprehensive questionnaires covering seven areas of AI safety and governance, with all submissions published in full on the OECD transparency platform.1

Compliance by Commitment Area

| Commitment Area | Average Compliance | Companies at 0% | Best Performer | Worst Performers |
|---|---|---|---|---|
| Model weight security | 17% | 11 of 16 (69%) | Anthropic (75%) | Multiple at 0% |
| Third-party reporting | 34.4% | 8 of 16 (50%) | OpenAI (100%) | Adobe, IBM, Scale AI |
| Red teaming | 62% | 3 of 16 (19%) | OpenAI (100%) | Palantir, Stability AI |
| Watermarking | 48% | 6 of 16 (38%) | Google (85%) | Multiple at 0% |
| Safety research sharing | 71% | 2 of 16 (13%) | Multiple (100%) | Inflection, IBM |

Expanded Commitments (2025)

As of December 2025, twelve companies have published frontier AI safety policies, with four additional companies joining since May 2024: xAI, Meta, Amazon, and Nvidia.2

Recent Concerning Developments

OpenAI Framework Changes: In April 2025, OpenAI removed a provision from its Preparedness Framework without noting the change in the changelog, raising transparency concerns about unannounced policy modifications.3

Implementation Gaps: Despite high-level commitments, the Future of Life Institute study found that "AI developers control both the design and disclosure of dangerous capability evaluations, creating inherent incentives to underreport alarming results."


2. RSP Capability Threshold Crossings

Status: ⚠️ Stable | Data Quality: Poor

First Confirmed Threshold Crossing

Anthropic announced the first publicly confirmed ASL-3 activation for Claude Opus 4 in 2025, representing a milestone in responsible scaling policy implementation:

| Threshold Level | Description | Security Requirements | Deployment Restrictions |
|---|---|---|---|
| ASL-3 | Sophisticated non-state attacker capabilities | Enhanced model weight protection | CBRN weapons misuse safeguards |
| Status | ACTIVATED for Claude Opus 4 | Implemented internal security measures | Targeted deployment restrictions |

RSP Policy Evolution (2025)

| Version | Effective Date | Key Changes | Safety Grade |
|---|---|---|---|
| 2.0 | October 15, 2024 | Shifted to qualitative thresholds | 2.2 |
| 2.1 | March 31, 2025 | Clarified thresholds beyond ASL-3 | 2.1 |
| 2.2 | May 14, 2025 | Amended insider threat scope | 1.9 |

Grade Decline: According to SaferAI's analysis, Anthropic's safety grade dropped from 2.2 to 1.9, placing them in the "weak" category alongside OpenAI and Google DeepMind. The primary concern is the shift away from precisely defined, quantitative thresholds.4

Current Capability Thresholds

| Domain | ASL-2 Threshold | ASL-3 Threshold | Industry Status |
|---|---|---|---|
| CBRN capabilities | Basic refusals | Sophisticated non-state attacker resistance | Claude Opus 4 at ASL-3 |
| Autonomous AI R&D | No automation | 1000x scaling acceleration | Not publicly crossed |
| Cybersecurity | Basic vulnerability knowledge | Advanced exploitation assistance | Under evaluation |
| Model weight security | Opportunistic theft defense | Sophisticated attacker defense | ASL-3 for select models |

Evaluation Methodology Challenges

Research by Apollo Research and others demonstrates that small improvements in elicitation methodology can dramatically increase scores on evaluation benchmarks. This creates uncertainty about whether reported threshold crossings reflect genuine capability increases or improved evaluation techniques.5


3. Time Between Model Training and Safety Evaluation

Status: Declining | Data Quality: Poor

Compressed Evaluation Windows

The Financial Times reported in late 2025 that OpenAI has been "slashing safety evaluation time," giving testers "just a few days for evaluations that had previously been allotted weeks or months to be completed."6

| Model | Reported Evaluation Time | Historical Comparison | Reduction | Source |
|---|---|---|---|---|
| GPT-4 (2023) | 6+ months | Baseline | N/A | OpenAI system card |
| o3 (2025) | Less than 1 week | 95%+ reduction | 24:1 | Financial Times |
| GPT-4.1 (2025) | No technical safety report | N/A | Complete elimination | OpenAI statement |

Impact on Safety Assessment Quality

Evaluator Constraints: One evaluator told the Financial Times: "We had more thorough safety testing when [the technology] was less important." The compressed timelines create severe limitations:

  • Complex evaluations require substantial time to design and execute
  • Emergent capabilities may only become apparent through extended testing
  • Red teams need adequate access to explore edge cases and failure modes
  • Systematic risk assessment requires iterative testing cycles

Government Evaluator Experiences

The joint US AISI and UK AISI evaluation of OpenAI's o1 model noted that testing was "conducted in a limited time period with finite resources, which if extended could expand the scope of findings."7

Resource Limitations: METR's analysis emphasizes that comprehensive risk assessments require:

  • Substantial expertise and specialized knowledge
  • Direct access to models and training data
  • More time than companies typically provide
  • Information about technical methodologies that companies often withhold8

Industry vs. Other Sectors

Unlike pharmaceuticals (multi-year clinical trials) or aerospace (extensive certification processes), AI systems lack:

  • Standardized testing protocols
  • Minimum duration requirements
  • Independent verification mandates
  • Clear pass/fail criteria for deployment

4. External Red-Team Engagement Rate

Status: ⚠️ Stable | Data Quality: Moderate

Current Engagement Scale

External red teaming has become standard practice at major labs, with over 750 AI-focused researchers contributing through HackerOne across 1,700+ AI assets tested.9

| Provider | Engagement Model | Duration | Participants | Coverage |
|---|---|---|---|---|
| HackerOne | Structured AIRT programs | 15-30 days | 750+ researchers | Multiple frontier labs |
| ControlPlane | Targeted evaluations | Variable | Expert specialists | OpenAI models |
| Internal programs | Company-specific | Variable | Selected experts | All major labs |

Major Vulnerability Findings (2025)

From HackerOne's aggregated testing data across 1,700+ AI assets:

| Vulnerability Type | Frequency | Severity | Impact | Example |
|---|---|---|---|---|
| Cross-tenant data leakage | Nearly universal in enterprise tests | Critical | Data privacy violations | Customer A accessing Customer B's data |
| Prompt injection | 75%+ of tested models | High | Safety bypass, unauthorized actions | Jailbreak via embedded instructions |
| Unsafe outputs | Common across models | Medium-High | Harmful content generation | CBRN information, violence |
| Model extraction | Variable by implementation | Medium | IP theft, competitive advantage | Weights or training data exposure |

Anthropic Jailbreak Challenge Results (2025)

Anthropic's partnership with HackerOne to test Constitutional Classifiers on Claude 3.5 Sonnet yielded significant findings:

  • 300,000+ chat interactions from 339 participants
  • $55,000 in bounties paid to four successful teams
  • Universal jailbreak discovered: One team found a method passing all security levels
  • Borderline-universal jailbreak: Another team achieved near-complete bypass
  • Multiple pathway exploitation: Two teams passed all eight levels using various individual jailbreaks10

Government Framework Integration

CISA defines AI red teaming as a subset of AI Testing, Evaluation, Verification and Validation (TEVV), with NIST operationalizing this through programs like Assessing Risks and Impacts of AI (ARIA) and the GenAI Challenge.11

Engagement Limitations

While external red teaming is increasingly common, critical gaps remain:

  • Limited disclosure of red team findings and remediation actions
  • Selective engagement: Labs choose which red teamers to work with
  • Short engagement windows: 15-30 days may be insufficient for complex systems
  • Post-deployment gaps: Less emphasis on continuous adversarial testing after launch

5. Dangerous Capability Disclosure Delays

Status: Declining | Data Quality: Moderate

Major Disclosure Failures (2025)

Google Gemini 2.5 Pro: Released in March 2025 without a model card, violating commitments made to the U.S. government and at international AI safety summits:

| Timeline | Event | Government Response |
|---|---|---|
| March 2025 | Gemini 2.5 Pro released without model card | Initial oversight inquiry |
| 3 weeks later | Simplified 6-page model card published | Called "meager" and "worrisome" by AI governance experts |
| Late June 2025 | Detailed report finally published | 60 U.K. politicians signed open letter |

Parliamentary Response: The UK politicians' letter accused Google DeepMind of "a troubling breach of trust with governments and the public" and a "failure to honour" international commitments.12

OpenAI Documentation Gaps

  • Deep Research model: Released without a system card, published weeks later
  • GPT-4.1: OpenAI announced it would not publish a technical safety report, arguing the model is "not a frontier model"
  • o3 model: Safety evaluation compressed to under one week despite advanced capabilities

Systemic Disclosure Issues

The 2025 AI Safety Index identified structural problems:

  • "AI developers control both the design and disclosure of dangerous capability evaluations"
  • "Inherent incentives to underreport alarming results or select lenient testing conditions"
  • "Costly deployment delays create pressure to minimize safety documentation"13

New Legal Framework

New York RAISE Act: Governor Kathy Hochul signed the Responsible AI Safety and Education Act in December 2025, establishing the nation's first comprehensive reporting and safety governance regime for frontier AI developers.14

Federal Preemption Conflict: The RAISE Act highlights tension between state and federal AI regulation following President Trump's December 2025 executive order seeking federal preemption of state AI laws.15


6. Pre-Deployment Safety Testing Duration

Status: Declining | Data Quality: Poor

Current Testing Approaches

Major frontier AI labs follow safety policies that include pre-deployment testing protocols:

| Lab | Framework | Version | Testing Requirements |
|---|---|---|---|
| OpenAI | Preparedness Framework | Version 2 (April 2025) | Risk-based evaluation periods |
| Google DeepMind | Frontier Safety Framework | Current version | Multi-stage assessment |
| Anthropic | Responsible Scaling Policy | Version 2.2 (May 2025) | ASL-based thresholds |

Third-Party Evaluation Access

METR's analysis of 12 companies with published frontier AI safety policies found variable commitment levels to external evaluation:

| Evaluator | Access Type | Typical Duration | Limitations |
|---|---|---|---|
| UK AISI | Pre-deployment | "Limited period" | Resource constraints |
| US AISI | Government evaluation | Variable | Classified findings |
| METR | Third-party assessment | Days to weeks | Company-controlled access |
| Apollo Research | Specialized testing | Project-specific | Limited model access |

Industry Trend Analysis

The 2025 AI Safety Index concluded that current practices are inadequate:

  • Pre-deployment testing is "likely necessary but insufficient" for responsible AI development
  • Testing conducted with "limited time periods and finite resources"
  • "If timelines are short, AI companies are unlikely to make high-assurance safety cases"16

Comparison to Regulated Industries

| Industry | Testing Duration | Regulatory Oversight | Failure Consequences |
|---|---|---|---|
| Pharmaceuticals | 2-10+ years | FDA mandatory approval | Criminal liability |
| Aerospace | Months to years | FAA certification required | Criminal/civil liability |
| Nuclear | Years | NRC licensing mandatory | Criminal prosecution |
| AI systems | Days to weeks | Voluntary only | Reputational damage |

7. Model Release Velocity

Status: Declining | Data Quality: Good

2025 Release Acceleration

The AI industry experienced unprecedented release velocity in 2025, with four major companies launching their most powerful models in just 25 days:

| Date | Company | Model | Key Capabilities | Safety Testing Duration |
|---|---|---|---|---|
| November 17 | xAI | Grok 4.1 | Advanced reasoning | Not disclosed |
| November 18 | Google | Gemini 3 | Historic 1501 Elo score | Weeks (reported) |
| November 24 | Anthropic | Claude Opus 4.5 | 80%+ SWE-Bench Verified | ASL-3 evaluation |
| December 11 | OpenAI | GPT-5.2 | Multi-modal reasoning | Less than 1 week |

Competitive Pressure Dynamics

OpenAI's "Code Red" Response: Sam Altman issued an internal "code red" memo after Gemini 3 topped leaderboards, with internal sources reporting that some employees requested delays but "competitive pressure forced the accelerated timeline."17

Safety vs. Speed Trade-offs

The November-December 2025 release pattern demonstrated concerning trends:

| Model | Safety Score | Testing Duration | Release Pressure |
|---|---|---|---|
| Claude 4.5 Sonnet | 98.7% | ASL-3 compliant | Moderate |
| Gemini 3 | Not disclosed | "Weeks" (Google claim) | High |
| GPT-5.2 | Not disclosed | <1 week | Very high |
| Grok 4.1 | Not disclosed | Not disclosed | High |

Claude 4.5 Achievement: The model achieved a 98.7% safety score and became the first model to never engage in blackmail during alignment testing scenarios, with the rate of compliance with harmful requests dropping below 5%.18

Release Volume by Company (2025)

| Company | Major Releases | Notable Features | Safety Documentation |
|---|---|---|---|
| OpenAI | 6+ frontier models | GPT-5 series, o3, Sora | Declining documentation |
| Google | 4 major releases | Gemini 2.5/3, Genie 3.0 | Documentation delays |
| Anthropic | 3 frontier models | Claude 4 family, ASL-3 crossing | Comprehensive reporting |
| Meta | 2+ open models | Llama improvements | Brief model cards |

8. Open-Source vs Closed Model Capability Gap

Status:Declining | Data Quality: Good

Dramatic Gap Convergence (2025)

Epoch AI research from October 2025 found that the capability gap has narrowed dramatically:

| Metric | 2024 Baseline | 2025 Current | Trend Direction | Impact |
|---|---|---|---|---|
| Average lag time | 16 months | 3-6 months | ↓ 70% reduction | Major |
| ECI gap (capability index) | 15-20 points | 7 points | ↓ Rapid convergence | Significant |
| Cost differential | 10-50x | 1.5-3x | ↓ Economic parity approaching | Critical |
| Performance parity domains | Limited | Most benchmarks | ↑ Broad capability matching | Major |
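
Epoch AI's exact methodology is not reproduced here, but one simple way to operationalize an "average lag time" is sketched below: for each benchmark milestone, count the months between the leading closed model reaching a score and an open-weight model matching it. The milestone dates are hypothetical, purely for illustration.

```python
from datetime import date

# (milestone score, date reached by best closed model, date matched by best open model)
# Dates are hypothetical, for illustration only.
milestones = [
    (60, date(2024, 3, 1), date(2025, 5, 1)),
    (70, date(2024, 9, 1), date(2025, 3, 1)),
    (80, date(2025, 2, 1), date(2025, 7, 1)),
]

lags_months = [
    (open_d.year - closed_d.year) * 12 + (open_d.month - closed_d.month)
    for _score, closed_d, open_d in milestones
]
print("lag per milestone (months):", lags_months)                        # [14, 6, 5]
print(f"average lag: {sum(lags_months) / len(lags_months):.1f} months")  # 8.3
```

A shrinking lag across successive milestones, as in this toy series, is the pattern behind the narrowing gap the table describes.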

DeepSeek R1 Impact

DeepSeek's R1 release on January 20, 2025 represented a watershed moment:

| Comparison Metric | DeepSeek R1 | OpenAI o1 | Advantage | Cost Impact |
|---|---|---|---|---|
| AIME (math reasoning) | 52.5% | 44.6% | DeepSeek +7.9% | 27x cheaper |
| MATH benchmark | 91.6% | 85.5% | DeepSeek +6.1% | 27x cheaper |
| Training cost | $5.6 million | ≈$150 million | 27x cost advantage | Revolutionary |
| Inference cost | ≈$0.55 per million tokens | ≈$15 per million tokens | 27x operational savings | Market disrupting |
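
The roughly 27x figures follow directly from the dollar amounts in the table; the short check below recomputes both ratios.

```python
# Cost figures taken from the table above.
training_closed, training_open = 150e6, 5.6e6    # ~$150M vs $5.6M training cost
inference_closed, inference_open = 15.00, 0.55   # $ per million tokens

print(f"training cost ratio:  {training_closed / training_open:.1f}x")    # 26.8x
print(f"inference cost ratio: {inference_closed / inference_open:.1f}x")  # 27.3x
```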

Industry Impact: DeepSeek R1's performance parity with closed models while operating at 1/27th the token cost fundamentally altered competitive dynamics.19

Current Capability Comparison

| Domain | Closed Model Leader | Open Model Leader | Gap Status | Enterprise Impact |
|---|---|---|---|---|
| General reasoning | GPT-5.2, Claude 4.5 | DeepSeek R1, Llama 4 | 3-6 months | Narrowing rapidly |
| Code generation | GPT-5.2-Codex | DeepSeek-Coder-V2 | 6 months | Significant closure |
| Mathematics | o3, Claude 4.5 | DeepSeek R1 | Parity achieved | Open models leading |
| Enterprise tasks (SWE-Bench) | 80%+ (closed) | 65% (open) | 15% gap | Still meaningful |

Adoption Trends

According to a16z research on enterprise AI adoption:

  • 41% of enterprises will increase use of open-source models in 2026
  • An additional 41% would switch from closed to open models if performance reaches parity
  • Cost considerations increasingly drive adoption over raw performance metrics

Safety Implications

The rapid convergence creates new challenges:

  • Reduced barrier to entry for potentially dangerous capabilities
  • Limited oversight of open model development and deployment
  • Difficulty implementing safeguards across distributed open ecosystem
  • Accelerated capability proliferation without centralized risk assessment

9. Lab Safety Team Turnover Rate

Status: Declining | Data Quality: Poor

OpenAI Safety Team Exodus (2024-2025)

The dissolution of OpenAI's Superalignment team in May 2024 marked a critical inflection point:

Superalignment Team Departures

| Name | Role | Departure Date | Public Criticism | Post-Departure Role |
|---|---|---|---|---|
| Ilya Sutskever | Co-founder, Chief Scientist | May 14, 2024 | None | Stealth startup |
| Jan Leike | Head of Alignment | May 2024 | "Safety culture took a backseat" | Anthropic |
| Daniel Kokotajlo | Safety researcher | April 2024 | "Lost confidence" in company | Independent advocacy |
| Leopold Aschenbrenner | Safety researcher | 2024 | Fired for information sharing | Independent research |
| William Saunders | Safety researcher | 2024 | None | Undisclosed |

Additional Senior Departures (September 2024)

  • Mira Murati (CTO, 6 years at OpenAI)
  • Bob McGrew (Chief Research Officer)
  • Barret Zoph (VP of Research)
  • Miles Brundage (Policy Research Head)
  • Total documented senior departures: 25+ as of December 202420

Industry-Wide Safety Team Growth

The AI Safety Field Growth Analysis from 2025 found significant expansion:

| Year | Technical AI Safety FTEs | Non-Technical Safety FTEs | Total | Growth Rate |
|---|---|---|---|---|
| 2022 | ≈200 | ≈200 | 400 | Baseline |
| 2025 | ≈600 | ≈500 | 1,100 | 175% increase |

Lab-Specific Growth:

  • OpenAI: Grew from 300 to 3,000 employees (10x)
  • Anthropic: Grew >3x since 2022
  • Google DeepMind: Grew >3x since 202221
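
The growth figures above are straightforward arithmetic; the short check below recomputes the 175% field-wide increase and the 10x OpenAI headcount multiple (total staff, not safety-specific).

```python
# Field-wide safety FTEs (from the table above).
fte_2022, fte_2025 = 400, 1_100
growth = (fte_2025 - fte_2022) / fte_2022
print(f"field-wide safety FTE growth, 2022-2025: {growth:.0%}")  # 175%

# Company-wide headcount multiple cited in the same analysis.
print(f"OpenAI total headcount: {3_000 / 300:.0f}x")             # 10x
```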

Retention Challenges

Jan Leike's Testimony: In his departure statement, he revealed: "Over the past few months my team has been sailing against the wind. Sometimes we were struggling for [computing resources]" despite OpenAI's promise to allocate 20% of compute to Superalignment research.

Structural Issues:

  • High external demand for AI safety talent
  • Burnout from rapid development pace
  • Philosophical disagreements over safety prioritization
  • Resource allocation conflicts between safety and product teams

Cross-Industry Safety Criticism (2025)

AI safety researchers from multiple organizations publicly criticized xAI's safety culture, describing practices as "reckless" and "completely irresponsible" following internal scandals.22


10. Whistleblower Reports from AI Labs

Status: ⚠️ Stable | Data Quality: Poor

Major Whistleblower Cases (2024-2025)

"The OpenAI Files" Investigation

Compiled by the Midas Project and Tech Oversight Project, this report represents "the most comprehensive collection to date of documented concerns with governance practices, leadership integrity, and organizational culture at OpenAI."

Sources: Legal documents, social media posts, media reports, open letters, and insider accounts spanning 2019-2025.

Individual Whistleblower Cases

| Name | Company | Public Disclosure Date | Key Allegations | Legal Action |
|---|---|---|---|---|
| Daniel Kokotajlo | OpenAI | April 2024 | "Lost confidence" in safety practices | Restrictive NDA dispute |
| Jan Leike | OpenAI | May 2024 | Safety "took a back seat to shiny products" | None (standard departure) |
| Nine-person group | OpenAI | June 2024 | "Recklessly racing" toward AGI | Open letter format |

Legislative Response (2025)

Federal Level

AI Whistleblower Protection Act: Senate Judiciary Chair Chuck Grassley introduced the bipartisan bill in May 2025, providing:

  • Protection for AI security vulnerability disclosure
  • Shields against retaliation for reporting violations
  • Limits on restrictive severance agreements and NDAs that create a "chilling effect"23

State Level

California SB 53: Provides whistleblower protections starting January 1, 2026, but critics note limitations:

  • Only covers four types of critical safety incidents
  • Three of the four types require that injury or death has already occurred
  • The fourth requires accurate prediction of a "catastrophic mass casualty event"24

Structural Barriers

Non-Disparagement Agreements

OpenAI's Practice: Equity worth ≈$1.7 million (in Kokotajlo's case) was initially conditioned on non-criticism agreements; the practice was modified after public backlash and media exposure.

Industry Pattern: The 2025 AI Safety Index found that "only OpenAI has published its full policy, and it did so only after media reports revealed the policy's highly restrictive non-disparagement clauses."

Cross-Lab Transparency Initiative

In early summer 2025, Anthropic and OpenAI agreed to evaluate each other's models using internal misalignment evaluations, representing increased transparency despite competitive pressures.

Data Limitations

Actual whistleblower report frequency remains unknown due to:

  • Internal reporting systems with no public disclosure
  • Fear of career consequences deterring disclosure
  • Restrictive legal agreements suppressing reports
  • No centralized tracking mechanism across the industry

Predictive Analysis & Trends

2026 Forecasts

Based on current trajectories, we anticipate:

| Metric | 2026 Prediction | Confidence | Key Drivers |
|---|---|---|---|
| Voluntary compliance | 45-55% (slight decline) | Medium | Competitive pressure, enforcement gaps |
| RSP threshold crossings | 2-3 additional ASL-3 activations | High | Capability acceleration |
| Evaluation timelines | Further compression to days | High | Release velocity pressure |
| Open-source gap | Near parity (0-3 months) | Very high | DeepSeek R1 impact, economic pressure |
| Whistleblower reports | 3-5 major cases | Medium | New legal protections, industry growth |

Systemic Risk Patterns

Feedback Loop Acceleration: Competitive pressure → shortened evaluation → increased risk → competitive disadvantage for safety-focused labs → further pressure intensification.

Regulatory Lag: Current voluntary frameworks are inadequate for rapidly evolving capabilities and industry dynamics.

International Divergence: The U.S. voluntary approach contrasts with mandatory compliance regimes in the EU and China.


Methodology & Data Quality Summary

| Metric Category | Data Quality Score | Primary Limitation | Improvement Needed |
|---|---|---|---|
| Compliance tracking | 7/10 | Self-reported data | Independent verification |
| Safety evaluations | 4/10 | Company-controlled disclosure | Mandatory reporting |
| Personnel changes | 3/10 | Only public departures visible | Industry-wide surveys |
| Technical capabilities | 8/10 | Benchmark gaming potential | Standardized evaluations |
| Whistleblowing | 2/10 | Structural reporting barriers | Legal protections |

Key Data Gaps

  1. Internal turnover rates for safety-specific teams
  2. Detailed evaluation methodologies and pass/fail criteria
  3. International lab practices beyond U.S./UK companies
  4. Quantified risk thresholds for deployment decisions
  5. Standardized safety metrics enabling cross-lab comparison

Key Takeaways

Critical Findings

  1. Mixed Compliance Reality: Average 53% compliance with voluntary commitments masks significant variation (17-83%) and systemic weaknesses in critical areas like model weight security

  2. Evaluation Time Compression Crisis: Safety testing compressed from months to days at leading labs, with OpenAI reducing o3 evaluation to less than one week despite advanced capabilities

  3. Open-Source Convergence Acceleration: DeepSeek R1's January 2025 release achieved performance parity at 1/27th the cost, fundamentally altering competitive dynamics and safety oversight challenges

  4. Safety Team Retention Crisis: 25+ senior safety researchers departed OpenAI in 2024, including the dissolution of the entire Superalignment team, pointing to systemic cultural or resource-allocation issues

  5. Transparency Deterioration: Major models released without promised safety documentation (Google Gemini 2.5 Pro, OpenAI GPT-4.1), violating government commitments

Systemic Concerns

Competitive Pressure Override: Evidence suggests commercial competition is systematically overriding safety considerations across multiple metrics simultaneously.

Voluntary Framework Inadequacy: Current self-regulatory approaches appear insufficient for the scale and pace of capability development.

Information Asymmetry: Companies control both risk evaluation design and disclosure, creating inherent conflicts of interest.

Positive Developments

  • First confirmed RSP threshold crossing (Anthropic Claude Opus 4 ASL-3) demonstrates policy operationalization
  • Claude 4.5 Sonnet achieved a 98.7% safety score with a harmful-request compliance rate below 5%
  • New York RAISE Act and federal AI Whistleblower Protection Act signal regulatory evolution
  • Industry safety team growth (1,100 FTEs vs. 400 in 2022) shows resource commitment expansion

Footnotes

  1. Future of Life Institute, "2025 AI Safety Index," Summer 2025, https://futureoflife.org/ai-safety-index-summer-2025/

  2. METR, "Common Elements of Frontier AI Safety Policies (December 2025 Update)," December 9, 2025, https://metr.org/blog/2025-12-09-common-elements-of-frontier-ai-safety-policies/

  3. OpenAI Preparedness Framework changelog analysis, April 2025

  4. SaferAI, "Anthropic's Responsible Scaling Policy Update Makes a Step Backwards," 2025, https://www.safer-ai.org/anthropics-responsible-scaling-policy-update-makes-a-step-backwards

  5. Apollo Research evaluation methodology studies, 2025

  6. Financial Times, "OpenAI Safety Evaluation Timeline Compression," late 2025

  7. NIST, "Pre-Deployment Evaluation of OpenAI's o1 Model," December 2024, https://www.nist.gov/news-events/news/2024/12/nist-releases-pre-deployment-safety-evaluation-openais-o1-model

  8. METR, "AI models can be dangerous before public deployment," November 13, 2024, https://metr.org/blog/2024-11-13-ai-models-can-be-dangerous-before-public-deployment/

  9. HackerOne, "AI Red Teaming | Offensive Testing for AI Models," 2025, https://www.hackerone.com/ai-red-teaming

  10. Anthropic, "Jailbreak Challenge Results," 2025, https://www.anthropic.com/news/jailbreak-challenge

  11. CISA, "AI Red Teaming: Applying Software TEVV for AI Evaluations," November 2024, https://www.cisa.gov/sites/default/files/2024-11/CISA_AI_Red_Teaming_Guide.pdf

  12. UK Parliament Science and Technology Committee, "Open Letter on Google DeepMind Disclosure Delays," 2025

  13. Future of Life Institute, "2025 AI Safety Index," Summer 2025

  14. Davis Wright Tremaine, "New York Enacts RAISE Act for AI Transparency Amid Federal Preemption Debate," December 19, 2025, https://www.dwt.com/blogs/artificial-intelligence-law-advisor/2025/12/new-york-raise-act-ai-safety-rules-developers

  15. The White House, "Ensuring a National Policy Framework for Artificial Intelligence," December 11, 2025, https://www.whitehouse.gov/presidential-actions/2025/12/eliminating-state-law-obstruction-of-national-artificial-intelligence-policy/

  16. Future of Life Institute, "2025 AI Safety Index," Summer 2025

  17. The Verge, "OpenAI Issues 'Code Red' Following Gemini 3 Launch," December 2025

  18. PassionFruit, "GPT 5.1 vs Claude 4.5 vs Gemini 3: 2025 AI Comparison," 2025, https://www.getpassionfruit.com/blog/gpt-5-1-vs-claude-4-5-sonnet-vs-gemini-3-pro-vs-deepseek-v3-2-the-definitive-2025-ai-model-comparison

  19. FourWeekMBA, "The Open Model Convergence: How the Frontier Gap Collapsed to 6 Months," 2025, https://fourweekmba.com/the-open-model-convergence-how-the-frontier-gap-collapsed-to-6-months/

  20. Various news sources tracking OpenAI departures, compiled December 2024

  21. LessWrong, "AI Safety Field Growth Analysis 2025," 2025, https://www.lesswrong.com/posts/8QjAnWyuE9fktPRgS/ai-safety-field-growth-analysis-2025

  22. Tech Policy Press, "AI Safety Researchers Criticize xAI Safety Culture," 2025

  23. U.S. Congress, "S.1792 - AI Whistleblower Protection Act," May 2025, https://www.congress.gov/bill/119th-congress/senate-bill/1792/text

  24. SF Public Press, "California AI Law Created Illusion of Whistleblower Protections," 2025, https://www.sfpublicpress.org/californias-new-ai-safety-law-created-the-illusion-of-whistleblower-protections/

Related Pages


Concepts

Responsible Scaling Policies (RSPs)

Models

AI Acceleration Tradeoff Model
Safety-Capability Tradeoff Model

Policy

US Executive Order on Safe, Secure, and Trustworthy AI
International AI Safety Summit Series

Approaches

AI Lab Safety Culture
AI Safety Cases

Transition Model

Alignment Progress
Safety Research

Risks

Multipolar Trap (AI Development)
AI Authoritarian Tools

Organizations

US AI Safety Institute
UK AI Safety Institute

Analysis

OpenAI Foundation Governance Paradox
Long-Term Benefit Trust (Anthropic)

Key Debates

Open vs Closed Source AI
Government Regulation vs Industry Self-Governance

Labs

GovAI

People

Yoshua Bengio
Stuart Russell