Lab Behavior
Comprehensive tracking of AI lab safety practices finds 53% average compliance with voluntary commitments, dramatic compression of safety evaluation timelines from months to days at OpenAI, and 25+ senior safety researcher departures in 2024. The open-source capability gap has collapsed from 16 months to 3-6 months with DeepSeek R1 achieving performance parity at 1/27th the cost.
Quick Assessment
| Dimension | Assessment | Evidence |
|---|---|---|
| Overall Compliance | Mixed (53% average) | August 2025 study of 16 companies found significant variation; OpenAI scored 83%, average was 53% |
| Evaluation Timeline Trend | Declining | OpenAI reduced testing from months to days for some models; FT reports "weeks" compressed to "days" |
| Safety Team Retention | Concerning | 25+ senior departures from OpenAI in 2024; Superalignment team dissolved |
| Transparency | Inadequate | Google Gemini 2.5 Pro released without model card; OpenAI GPT-4.1 released without technical safety report |
| Open-Source Gap | Rapidly Narrowing | Gap reduced from 16 months to 3-6 months in 2025; DeepSeek R1 achieved near-parity at 27x lower cost |
| External Red Teaming | Standard but Limited | 750+ researchers engaged via HackerOne; 15-30 day engagement windows may be insufficient |
| Whistleblower Protection | Underdeveloped | Only OpenAI has published full policy (after media pressure); California SB 53 protections start 2026 |
Methodology & Data Quality Assessment
Data Collection Approach
This page aggregates data from multiple sources with varying reliability:
| Data Type | Primary Sources | Verification Method | Limitations |
|---|---|---|---|
| Voluntary Commitments | Future of Life Institute study, company disclosures | Public rubric scoring | Self-reported data, selective disclosure |
| Safety Evaluations | Third-party evaluators (METR, UK AISI, US AISI) | Peer review, government validation | Limited access, short evaluation windows |
| Personnel Changes | Public announcements, investigative journalism | Cross-referencing multiple sources | Only visible departures tracked |
| Model Releases | Benchmark tracking, company announcements | Performance verification via leaderboards | Gaming potential, selective metrics |
Standardized Scoring System
To enable cross-metric comparison, each metric below carries a standardized traffic-light status (e.g., ⚠️ Stable, ❌ Declining) alongside a data-quality rating.
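As a minimal sketch of how such a status might be assigned programmatically, the snippet below maps a metric's year-over-year change to a traffic-light label; the ±5% thresholds are hypothetical placeholders, since the page does not publish exact cut-offs.

```python
# Illustrative only: the ±5% thresholds below are hypothetical placeholders,
# not the page's actual scoring rubric.
def traffic_light_status(year_over_year_change: float) -> str:
    """Map a metric's year-over-year change to a traffic-light label."""
    if year_over_year_change > 0.05:
        return "✅ Improving"
    if year_over_year_change < -0.05:
        return "❌ Declining"
    return "⚠️ Stable"

# Example: compliance moving from 53% to 50% is a ~5.7% relative decline.
print(traffic_light_status((0.50 - 0.53) / 0.53))  # ❌ Declining
```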
Overview
This page tracks measurable indicators of AI laboratory behavior, safety practices, and industry transparency. These metrics help assess whether leading AI companies are following responsible development practices and honoring their public commitments.
Understanding lab behavior is critical because corporate practices directly influence AI safety outcomes. Even the best technical safety research is insufficient if labs are racing to deploy systems without adequate testing, suppressing internal safety concerns, or failing to disclose dangerous capabilities.
Lab Behavior Dynamics
1. Voluntary Commitment Compliance Rate
Status: ⚠️ Stable | Data Quality: Good
2025 Compliance Overview
A comprehensive study from August 2025 examining companies' adherence to their White House voluntary AI commitments found significant variation across the 16 companies assessed:
| Cohort | Companies | Mean Compliance | Range |
|---|---|---|---|
| First (July 2023) | Amazon, Anthropic, Google, Inflection, Meta, Microsoft, OpenAI | 69.0% | 50-83% |
| Second (Sept 2023) | Adobe, Cohere, IBM, Nvidia, Palantir, Salesforce, Scale AI, Stability AI | 44.6% | 25-65% |
| Third (July 2024) | Apple | Not fully assessed | N/A |
| Overall Average | 16 companies | 53% | 17-83% |
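To illustrate how the "Mean Compliance" and "Range" columns are derived, a small sketch: the per-company scores below are hypothetical placeholders (only OpenAI's 83% is reported on this page), not the study's actual per-company data.

```python
from statistics import mean

# Hypothetical per-company compliance scores; only OpenAI's 83% is reported
# on this page, the remaining values are placeholders for illustration.
first_cohort = {"OpenAI": 0.83, "Anthropic": 0.80, "Google": 0.74, "Microsoft": 0.70,
                "Amazon": 0.66, "Meta": 0.60, "Inflection": 0.50}
second_cohort = {"Nvidia": 0.65, "Salesforce": 0.55, "Adobe": 0.50, "IBM": 0.47,
                 "Cohere": 0.45, "Scale AI": 0.38, "Palantir": 0.32, "Stability AI": 0.25}

for label, cohort in [("First (July 2023)", first_cohort),
                      ("Second (Sept 2023)", second_cohort)]:
    scores = list(cohort.values())
    print(f"{label}: mean {mean(scores):.1%}, range {min(scores):.0%}-{max(scores):.0%}")
```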
New Framework: G7 HAIP Reporting
The G7 Hiroshima AI Process (HAIP) Reporting Framework launched in February 2025 as a voluntary transparency mechanism. Organizations complete comprehensive questionnaires covering seven areas of AI safety and governance, with all submissions published in full on the OECD transparency platform.1
Compliance by Commitment Area
| Commitment Area | Average Compliance | Companies at 0% | Best Performer | Worst Performers |
|---|---|---|---|---|
| Model weight security | 17% | 11 of 16 (69%) | Anthropic (75%) | Multiple at 0% |
| Third-party reporting | 34.4% | 8 of 16 (50%) | OpenAI (100%) | Adobe, IBM, Scale AI |
| Red teaming | 62% | 3 of 16 (19%) | OpenAI (100%) | Palantir, Stability AI |
| Watermarking | 48% | 6 of 16 (38%) | Google (85%) | Multiple at 0% |
| Safety research sharing | 71% | 2 of 16 (13%) | Multiple (100%) | Inflection, IBM |
Expanded Commitments (2025)
As of December 2025, twelve companies have published frontier AI safety policies, with four additional companies joining since May 2024: xAI, Meta, Amazon, and Nvidia.2
Recent Concerning Developments
OpenAI Framework Changes: In April 2025, OpenAI removed a provision from its Preparedness Framework without noting the change in the changelog, raising transparency concerns about unannounced policy modifications.3
Implementation Gaps: Despite high-level commitments, the Future of Life Institute study found that "AI developers control both the design and disclosure of dangerous capability evaluations, creating inherent incentives to underreport alarming results."
2. RSP Capability Threshold Crossings
Status: ⚠️ Stable | Data Quality: Poor
First Confirmed Threshold Crossing
Anthropic announced the first publicly confirmed ASL-3 activation for Claude Opus 4 in 2025, representing a milestone in responsible scaling policy implementation:
| Threshold Level | Description | Security Requirements | Deployment Restrictions |
|---|---|---|---|
| ASL-3 | Sophisticated non-state attacker capabilities | Enhanced model weight protection | CBRN weapons misuse safeguards |
| Status | ACTIVATED for Claude Opus 4 | Implemented internal security measures | Targeted deployment restrictions |
RSP Policy Evolution (2025)
| Version | Effective Date | Key Changes | Safety Grade |
|---|---|---|---|
| 2.0 | October 15, 2024 | Shifted to qualitative thresholds | 2.2 |
| 2.1 | March 31, 2025 | Clarified thresholds beyond ASL-3 | 2.1 |
| 2.2 | May 14, 2025 | Amended insider threat scope | 1.9 |
Grade Decline: According to SaferAI's analysis, Anthropic's safety grade dropped from 2.2 to 1.9, placing them in the "weak" category alongside OpenAI and Google DeepMind. The primary concern is the shift away from precisely defined, quantitative thresholds.4
Current Capability Thresholds
| Domain | ASL-2 Threshold | ASL-3 Threshold | Industry Status |
|---|---|---|---|
| CBRN capabilities | Basic refusals | Sophisticated non-state attacker resistance | Claude Opus 4 at ASL-3 |
| Autonomous AI R&D | No automation | 1000x scaling acceleration | Not publicly crossed |
| Cybersecurity | Basic vulnerability knowledge | Advanced exploitation assistance | Under evaluation |
| Model weight security | Opportunistic theft defense | Sophisticated attacker defense | ASL-3 for select models |
Evaluation Methodology Challenges
Research by Apollo Research and others demonstrates that small improvements in elicitation methodology can dramatically increase scores on evaluation benchmarks. This creates uncertainty about whether reported threshold crossings reflect genuine capability increases or improved evaluation techniques.5
3. Time Between Model Training and Safety Evaluation
Status: ❌ Declining | Data Quality: Poor
Compressed Evaluation Windows
The Financial Times reported in late 2025 that OpenAI has been "slashing safety evaluation time," giving testers "just a few days for evaluations that had previously been allotted weeks or months to be completed."6
| Model | Reported Evaluation Time | Historical Comparison | Reduction | Source |
|---|---|---|---|---|
| GPT-4 (2023) | 6+ months | Baseline | N/A | OpenAI system card |
| o3 (2025) | Less than 1 week | vs. 6+ months for GPT-4 | 24:1 (95%+ reduction) | Financial Times |
| GPT-4.1 (2025) | No technical safety report | N/A | Complete elimination | OpenAI statement |
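A rough sanity check of the table's ratio, under the assumption that "6+ months" is counted as 24 weeks and the compressed window as one full week (the actual evaluation schedules are not disclosed):

```python
# Rough check of the reported compression; the durations are assumptions
# based on the table's rounded figures, not disclosed evaluation schedules.
baseline_weeks = 6 * 4        # 6 months counted as ~24 weeks
compressed_weeks = 1          # "less than 1 week", taken as 1 full week

ratio = baseline_weeks / compressed_weeks          # 24.0
reduction = 1 - compressed_weeks / baseline_weeks  # ~0.958
print(f"~{ratio:.0f}:1 compression, ~{reduction:.0%} reduction")  # ~24:1, ~96% reduction
```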
Impact on Safety Assessment Quality
Evaluator Constraints: One evaluator told the Financial Times: "We had more thorough safety testing when [the technology] was less important." The compressed timelines create severe limitations:
- Complex evaluations require substantial time to design and execute
- Emergent capabilities may only become apparent through extended testing
- Red teams need adequate access to explore edge cases and failure modes
- Systematic risk assessment requires iterative testing cycles
Government Evaluator Experiences
The joint US AISI and UK AISI evaluation of OpenAI's o1 model noted that testing was "conducted in a limited time period with finite resources, which if extended could expand the scope of findings."7
Resource Limitations: METR's analysis emphasizes that comprehensive risk assessments require:
- Substantial expertise and specialized knowledge
- Direct access to models and training data
- More time than companies typically provide
- Information about technical methodologies that companies often withhold8
Industry vs. Other Sectors
Unlike pharmaceuticals (multi-year clinical trials) or aerospace (extensive certification processes), AI systems lack:
- Standardized testing protocols
- Minimum duration requirements
- Independent verification mandates
- Clear pass/fail criteria for deployment
4. External Red-Team Engagement Rate
Status: ⚠️ Stable | Data Quality: Moderate
Current Engagement Scale
External red teaming has become standard practice at major labs, with over 750 AI-focused researchers contributing through HackerOne across 1,700+ AI assets tested.9
| Provider | Engagement Model | Duration | Participants | Coverage |
|---|---|---|---|---|
| HackerOne | Structured AIRT programs | 15-30 days | 750+ researchers | Multiple frontier labs |
| ControlPlane | Targeted evaluations | Variable | Expert specialists | OpenAI models |
| Internal programs | Company-specific | Variable | Selected experts | All major labs |
Major Vulnerability Findings (2025)
From HackerOne's aggregated testing data across 1,700+ AI assets:
| Vulnerability Type | Frequency | Severity | Impact | Example |
|---|---|---|---|---|
| Cross-tenant data leakage | Nearly universal in enterprise tests | Critical | Data privacy violations | Customer A accessing Customer B's data |
| Prompt injection | 75%+ of tested models | High | Safety bypass, unauthorized actions | Jailbreak via embedded instructions |
| Unsafe outputs | Common across models | Medium-High | Harmful content generation | CBRN information, violence |
| Model extraction | Variable by implementation | Medium | IP theft, competitive advantage | Weights or training data exposure |
Anthropic Jailbreak Challenge Results (2025)
Anthropic's partnership with HackerOne to test Constitutional Classifiers on Claude 3.5 Sonnet yielded significant findings:
- 300,000+ chat interactions from 339 participants
- $55,000 in bounties paid to four successful teams
- Universal jailbreak discovered: One team found a method passing all security levels
- Borderline-universal jailbreak: Another team achieved near-complete bypass
- Multiple pathway exploitation: Two teams passed all eight levels using various individual jailbreaks10
Government Framework Integration
CISA defines AI red teaming as a subset of AI Testing, Evaluation, Verification and Validation (TEVV), with NIST operationalizing this through programs like Assessing Risks and Impacts of AI (ARIA) and the GenAI Challenge.11
Engagement Limitations
While external red teaming is increasingly common, critical gaps remain:
- Limited disclosure of red team findings and remediation actions
- Selective engagement: Labs choose which red teamers to work with
- Short engagement windows: 15-30 days may be insufficient for complex systems
- Post-deployment gaps: Less emphasis on continuous adversarial testing after launch
5. Dangerous Capability Disclosure Delays
Status: ❌ Declining | Data Quality: Moderate
Major Disclosure Failures (2025)
Google Gemini 2.5 Pro: Released in March 2025 without a model card, violating commitments made to the U.S. government and at international AI safety summits:
| Timeline | Event | Government Response |
|---|---|---|
| March 2025 | Gemini 2.5 Pro released without model card | Initial oversight inquiry |
| 3 weeks later | Simplified 6-page model card published | Called "meager" and "worrisome" by AI governance experts |
| Late June 2025 | Detailed report finally published | 60 U.K. politicians signed open letter |
Parliamentary Response: The UK politicians' letter accused Google DeepMind of "a troubling breach of trust with governments and the public" and a "failure to honour" international commitments.12
OpenAI Documentation Gaps
- Deep Research model: Released without a system card, published weeks later
- GPT-4.1: OpenAI announced it would not publish a technical safety report, arguing the model is "not a frontier model"
- o3 model: Safety evaluation compressed to under one week despite advanced capabilities
Systemic Disclosure Issues
The 2025 AI Safety Index identified structural problems:
- "AI developers control both the design and disclosure of dangerous capability evaluations"
- "Inherent incentives to underreport alarming results or select lenient testing conditions"
- "Costly deployment delays create pressure to minimize safety documentation"13
New Legal Framework
New York RAISE Act: Governor Kathy Hochul signed the Responsible AI Safety and Education Act in December 2025, establishing the nation's first comprehensive reporting and safety governance regime for frontier AI developers.14
Federal Preemption Conflict: The RAISE Act highlights tension between state and federal AI regulation following President Trump's December 2025 executive order seeking federal preemption of state AI laws.15
6. Pre-Deployment Safety Testing Duration
Status: ❌ Declining | Data Quality: Poor
Current Testing Approaches
Major frontier AI labs follow safety policies that include pre-deployment testing protocols:
| Lab | Framework | Version | Testing Requirements |
|---|---|---|---|
| OpenAI | Preparedness Framework | Version 2 (April 2025) | Risk-based evaluation periods |
| Google DeepMind | Frontier Safety Framework | Current version | Multi-stage assessment |
| Anthropic | Responsible Scaling Policy | Version 2.2 (May 2025) | ASL-based thresholds |
Third-Party Evaluation Access
METR's analysis of 12 companies with published frontier AI safety policies found variable commitment levels to external evaluation:
| Evaluator | Access Type | Typical Duration | Limitations |
|---|---|---|---|
| UK AISI | Pre-deployment | "Limited period" | Resource constraints |
| US AISI | Government evaluation | Variable | Classified findings |
| METR | Third-party assessment | Days to weeks | Company-controlled access |
| Apollo Research | Specialized testing | Project-specific | Limited model access |
Industry Trend Analysis
The 2025 AI Safety Index concluded that current practices are inadequate:
- Pre-deployment testing is "likely necessary but insufficient" for responsible AI development
- Testing conducted with "limited time periods and finite resources"
- "If timelines are short, AI companies are unlikely to make high-assurance safety cases"16
Comparison to Regulated Industries
| Industry | Testing Duration | Regulatory Oversight | Failure Consequences |
|---|---|---|---|
| Pharmaceuticals | 2-10+ years | FDA mandatory approval | Criminal liability |
| Aerospace | Months to years | FAA certification required | Criminal/civil liability |
| Nuclear | Years | NRC licensing mandatory | Criminal prosecution |
| AI Systems | Days to weeks | Voluntary only | Reputational damage |
7. Model Release Velocity
Status: ❌ Declining | Data Quality: Good
2025 Release Acceleration
The AI industry experienced unprecedented release velocity in 2025, with four major companies launching their most powerful models in just 25 days:
| Date | Company | Model | Key Capabilities | Safety Testing Duration |
|---|---|---|---|---|
| November 17 | xAI | Grok 4.1 | Advanced reasoning | Not disclosed |
| November 18 | Google | Gemini 3 | Historic 1501 Elo score | Weeks (reported) |
| November 24 | Anthropic | Claude Opus 4.5 | 80%+ SWE-Bench Verified | ASL-3 evaluation |
| December 11 | OpenAI | GPT-5.2 | Multi-modal reasoning | Less than 1 week |
Competitive Pressure Dynamics
OpenAI's "Code Red" Response: Sam Altman issued an internal "code red" memo after Gemini 3 topped leaderboards, with internal sources reporting that some employees requested delays but "competitive pressure forced the accelerated timeline."17
Safety vs. Speed Trade-offs
The November-December 2025 release pattern demonstrated concerning trends:
| Model | Safety Score | Testing Duration | Release Pressure |
|---|---|---|---|
| Claude 4.5 Sonnet | 98.7% | ASL-3 compliant | Moderate |
| Gemini 3 | Not disclosed | "Weeks" (Google claim) | High |
| GPT-5.2 | Not disclosed | <1 week | Very high |
| Grok 4.1 | Not disclosed | Not disclosed | High |
Claude 4.5 Achievement: The model achieved a 98.7% safety score and was the first model that never engaged in blackmail during alignment testing scenarios, with compliance on harmful requests falling below 5%.18
Release Volume by Company (2025)
| Company | Major Releases | Notable Features | Safety Documentation |
|---|---|---|---|
| OpenAI | 6+ frontier models | GPT-5 series, o3, Sora | Declining documentation |
| Google | 4 major releases | Gemini 2.5/3, Genie 3.0 | Documentation delays |
| Anthropic | 3 frontier models | Claude 4 family, ASL-3 crossing | Comprehensive reporting |
| Meta | 2+ open models | Llama improvements | Brief model cards |
8. Open-Source vs Closed Model Capability Gap
Status: ❌ Declining | Data Quality: Good
Dramatic Gap Convergence (2025)
Epoch AI research from October 2025 found that the capability gap has narrowed dramatically:
| Metric | 2024 Baseline | 2025 Current | Trend Direction | Impact |
|---|---|---|---|---|
| Average lag time | 16 months | 3-6 months | ↓ 70% reduction | Major |
| ECI gap (capability index) | 15-20 points | 7 points | ↓ Rapid convergence | Significant |
| Cost differential | 10-50x | 1.5-3x | ↓ Economic parity approaching | Critical |
| Performance parity domains | Limited | Most benchmarks | ↑ Broad capability matching | Major |
DeepSeek R1 Impact
DeepSeek's R1 release on January 20, 2025 represented a watershed moment:
| Comparison Metric | DeepSeek R1 | OpenAI o1 | Advantage | Cost Impact |
|---|---|---|---|---|
| AIME (math reasoning) | 52.5% | 44.6% | DeepSeek +7.9% | 27x cheaper |
| MATH benchmark | 91.6% | 85.5% | DeepSeek +6.1% | 27x cheaper |
| Training cost | $5.6 million | ≈$150 million | 27x cost advantage | Revolutionary |
| Inference cost | ≈$0.55 per million tokens | ≈$15 per million tokens | 27x operational savings | Market disrupting |
Industry Impact: By matching closed-model performance while operating at roughly 1/27th the token cost, DeepSeek R1 fundamentally altered competitive dynamics.19
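The "27x" figures follow directly from the approximate costs in the table above; a quick check using those rounded estimates (which are themselves approximations):

```python
# Quick check of the ~27x figures using the approximate costs cited above.
inference_closed = 15.00     # ≈$15 per million tokens (OpenAI o1, per the table)
inference_open = 0.55        # ≈$0.55 per million tokens (DeepSeek R1)
training_closed = 150e6      # ≈$150M estimated training cost
training_open = 5.6e6        # $5.6M reported training cost

print(f"Inference cost ratio: ~{inference_closed / inference_open:.0f}x")  # ~27x
print(f"Training cost ratio:  ~{training_closed / training_open:.0f}x")    # ~27x
```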
Current Capability Comparison
| Domain | Closed Model Leader | Open Model Leader | Gap Status | Enterprise Impact |
|---|---|---|---|---|
| General reasoning | GPT-5.2, Claude 4.5 | DeepSeek R1, Llama 4 | 3-6 months | Narrowing rapidly |
| Code generation | GPT-5.2-Codex | DeepSeek-Coder-V2 | 6 months | Significant closure |
| Mathematics | o3, Claude 4.5 | DeepSeek R1 | Parity achieved | Open models leading |
| Enterprise tasks (SWE-Bench) | 80%+ (closed) | 65% (open) | 15% gap | Still meaningful |
Adoption Trends
According to a16z research on enterprise AI adoption:
- 41% of enterprises will increase use of open-source models in 2026
- An additional 41% would switch from closed to open models if performance reaches parity
- Cost considerations increasingly drive adoption over raw performance metrics
Safety Implications
The rapid convergence creates new challenges:
- Reduced barrier to entry for potentially dangerous capabilities
- Limited oversight of open model development and deployment
- Difficulty implementing safeguards across distributed open ecosystem
- Accelerated capability proliferation without centralized risk assessment
9. Lab Safety Team Turnover Rate
Status: ❌ Declining | Data Quality: Poor
OpenAI Safety Team Exodus (2024-2025)
The dissolution of OpenAI's Superalignment team in May 2024 marked a critical inflection point:
Superalignment Team Departures
| Name | Role | Departure Date | Public Criticism | Post-Departure Role |
|---|---|---|---|---|
| Ilya Sutskever | Co-founder, Chief Scientist | May 14, 2024 | None | Stealth startup |
| Jan Leike | Head of Alignment | May 2024 | "Safety culture took a backseat" | Anthropic |
| Daniel Kokotajlo | Safety researcher | April 2024 | "Lost confidence" in company | Independent advocacy |
| Leopold Aschenbrenner | Safety researcher | 2024 | Fired for information sharing | Independent research |
| William Saunders | Safety researcher | 2024 | None | Undisclosed |
Additional Senior Departures (September 2024)
- Mira Murati (CTO, 6 years at OpenAI)
- Bob McGrew (Chief Research Officer)
- Barret Zoph (VP of Research)
- Miles Brundage (Policy Research Head)
- Total documented senior departures: 25+ as of December 202420
Industry-Wide Safety Team Growth
The AI Safety Field Growth Analysis from 2025 found significant expansion:
| Year | Technical AI Safety FTEs | Non-Technical Safety FTEs | Total | Growth Rate |
|---|---|---|---|---|
| 2022 | ≈200 | ≈200 | 400 | Baseline |
| 2025 | ≈600 | ≈500 | 1,100 | 175% increase |
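For reference, the stated 175% increase and the implied annual growth rate can be recomputed from the 2022 and 2025 totals (the constant-rate assumption is an illustration, not the survey's claim):

```python
# Recomputing the table's "175% increase" and an implied annual growth rate
# from the stated totals; constant-rate growth is an assumption for illustration.
fte_2022, fte_2025 = 400, 1_100
years = 2025 - 2022

total_growth = fte_2025 / fte_2022 - 1                 # 1.75 -> 175% increase
annualized = (fte_2025 / fte_2022) ** (1 / years) - 1  # ~0.40 -> ~40% per year
print(f"Total growth: {total_growth:.0%}; annualized: ~{annualized:.0%}/yr")
```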
Lab-Specific Growth:
- OpenAI: Grew from 300 to 3,000 employees (10x)
- Anthropic: Grew >3x since 2022
- Google DeepMind: Grew >3x since 202221
Retention Challenges
Jan Leike's Testimony: In his departure statement, he revealed: "Over the past few months my team has been sailing against the wind. Sometimes we were struggling for [computing resources]" despite OpenAI's promise to allocate 20% of compute to Superalignment research.
Structural Issues:
- High external demand for AI safety talent
- Burnout from rapid development pace
- Philosophical disagreements over safety prioritization
- Resource allocation conflicts between safety and product teams
Cross-Industry Safety Criticism (2025)
AI safety researchers from multiple organizations publicly criticized xAI's safety culture, describing practices as "reckless" and "completely irresponsible" following internal scandals.22
10. Whistleblower Reports from AI Labs
Status: ⚠️ Stable | Data Quality: Poor
Major Whistleblower Cases (2024-2025)
"The OpenAI Files" Investigation
Compiled by the Midas Project and Tech Oversight Project, this report describes itself as "the most comprehensive collection to date of documented concerns with governance practices, leadership integrity, and organizational culture at OpenAI."
Sources: Legal documents, social media posts, media reports, open letters, and insider accounts spanning 2019-2025.
Individual Whistleblower Cases
| Name | Company | Public Disclosure Date | Key Allegations | Legal Action |
|---|---|---|---|---|
| Daniel Kokotajlo | OpenAI | April 2024 | "Lost confidence" in safety practices | Restrictive NDA dispute |
| Jan Leike | OpenAI | May 2024 | Safety "took a back seat to shiny products" | None (standard departure) |
| Nine-person group | OpenAI | June 2024 | "Recklessly racing" toward AGI | Open letter format |
Legislative Response (2025)
Federal Level
AI Whistleblower Protection Act: Senate Judiciary Chair Chuck Grassley introduced the bipartisan bill in May 2025, providing:
- Protection for AI security vulnerability disclosure
- Shields against retaliation for reporting violations
- Addresses restrictive severance agreements and NDAs that create a "chilling effect"23
State Level
California SB 53: Provides whistleblower protections starting January 1, 2026, but critics note limitations:
- Only covers four types of critical safety incidents
- Three of the four types require that injury or death has already occurred
- The fourth requires an accurate prediction of a "catastrophic mass casualty event"24
Structural Barriers
Non-Disparagement Agreements
OpenAI's Practice: Departing employees' vested equity (≈$1.7 million in Kokotajlo's case) was initially conditioned on signing non-disparagement agreements; the practice was modified after public backlash and media exposure.
Industry Pattern: The 2025 AI Safety Index found that "only OpenAI has published its full policy, and it did so only after media reports revealed the policy's highly restrictive non-disparagement clauses."
Cross-Lab Transparency Initiative
In early summer 2025, Anthropic and OpenAI agreed to evaluate each other's models using internal misalignment evaluations, representing increased transparency despite competitive pressures.
Data Limitations
Actual whistleblower report frequency remains unknown due to:
- Internal reporting systems with no public disclosure
- Fear of career consequences deterring disclosure
- Restrictive legal agreements suppressing reports
- No centralized tracking mechanism across the industry
Predictive Analysis & Trends
2026 Forecasts
Based on current trajectories, we anticipate:
| Metric | 2026 Prediction | Confidence | Key Drivers |
|---|---|---|---|
| Voluntary Compliance | 45-55% (slight decline) | Medium | Competitive pressure, enforcement gaps |
| RSP Threshold Crossings | 2-3 additional ASL-3 activations | High | Capability acceleration |
| Evaluation Timelines | Further compression to days | High | Release velocity pressure |
| Open-Source Gap | Near parity (0-3 months) | Very High | DeepSeek R1 impact, economic pressure |
| Whistleblower Reports | 3-5 major cases | Medium | New legal protections, industry growth |
Systemic Risk Patterns
Feedback Loop Acceleration: Competitive pressure → shortened evaluation → increased risk → competitive disadvantage for safety-focused labs → further pressure intensification.
Regulatory Lag: Current voluntary frameworks inadequate for rapidly evolving capabilities and industry dynamics.
International Divergence: U.S. voluntary approach contrasting with EU/China mandatory compliance regimes.
Methodology & Data Quality Summary
| Metric Category | Data Quality Score | Primary Limitation | Improvement Needed |
|---|---|---|---|
| Compliance Tracking | 7/10 | Self-reported data | Independent verification |
| Safety Evaluations | 4/10 | Company-controlled disclosure | Mandatory reporting |
| Personnel Changes | 3/10 | Only public departures visible | Industry-wide surveys |
| Technical Capabilities | 8/10 | Benchmark gaming potential | Standardized evaluations |
| Whistleblowing | 2/10 | Structural reporting barriers | Legal protections |
Key Data Gaps
- Internal turnover rates for safety-specific teams
- Detailed evaluation methodologies and pass/fail criteria
- International lab practices beyond U.S./UK companies
- Quantified risk thresholds for deployment decisions
- Standardized safety metrics enabling cross-lab comparison
Key Takeaways
Critical Findings
- Mixed Compliance Reality: Average 53% compliance with voluntary commitments masks significant variation (17-83%) and systemic weaknesses in critical areas like model weight security
- Evaluation Time Compression Crisis: Safety testing compressed from months to days at leading labs, with OpenAI reducing o3 evaluation to less than one week despite advanced capabilities
- Open-Source Convergence Acceleration: DeepSeek R1's January 2025 release achieved performance parity at 1/27th the cost, fundamentally altering competitive dynamics and safety oversight challenges
- Safety Team Retention Crisis: 25+ senior safety researchers departed OpenAI in 2024, including the dissolution of the entire Superalignment team, indicating systematic cultural or resource allocation issues
- Transparency Deterioration: Major models released without promised safety documentation (Google Gemini 2.5 Pro, OpenAI GPT-4.1), violating government commitments
Systemic Concerns
Competitive Pressure Override: Evidence suggests commercial competition is systematically overriding safety considerations across multiple metrics simultaneously.
Voluntary Framework Inadequacy: Current self-regulatory approaches appear insufficient for the scale and pace of capability development.
Information Asymmetry: Companies control both risk evaluation design and disclosure, creating inherent conflicts of interest.
Positive Developments
- First confirmed RSP threshold crossing (Anthropic Claude Opus 4 ASL-3) demonstrates policy operationalization
- Claude 4.5 Sonnet achieved 98.7% safety score with <5% harmful compliance rate
- New York RAISE Act and federal AI Whistleblower Protection Act signal regulatory evolution
- Industry safety team growth (1,100 FTEs vs. 400 in 2022) shows resource commitment expansion
Footnotes
1. Future of Life Institute, "2025 AI Safety Index," Summer 2025, https://futureoflife.org/ai-safety-index-summer-2025/
2. METR, "Common Elements of Frontier AI Safety Policies (December 2025 Update)," December 9, 2025, https://metr.org/blog/2025-12-09-common-elements-of-frontier-ai-safety-policies/
3. OpenAI Preparedness Framework changelog analysis, April 2025
4. SaferAI, "Anthropic's Responsible Scaling Policy Update Makes a Step Backwards," 2025, https://www.safer-ai.org/anthropics-responsible-scaling-policy-update-makes-a-step-backwards
5. Apollo Research evaluation methodology studies, 2025
6. Financial Times, "OpenAI Safety Evaluation Timeline Compression," late 2025
7. NIST, "Pre-Deployment Evaluation of OpenAI's o1 Model," December 2024, https://www.nist.gov/news-events/news/2024/12/nist-releases-pre-deployment-safety-evaluation-openais-o1-model
8. METR, "AI models can be dangerous before public deployment," November 13, 2024, https://metr.org/blog/2024-11-13-ai-models-can-be-dangerous-before-public-deployment/
9. HackerOne, "AI Red Teaming | Offensive Testing for AI Models," 2025, https://www.hackerone.com/ai-red-teaming
10. Anthropic, "Jailbreak Challenge Results," 2025, https://www.anthropic.com/news/jailbreak-challenge
11. CISA, "AI Red Teaming: Applying Software TEVV for AI Evaluations," November 2024, https://www.cisa.gov/sites/default/files/2024-11/CISA_AI_Red_Teaming_Guide.pdf
12. UK Parliament Science and Technology Committee, "Open Letter on Google DeepMind Disclosure Delays," 2025
13. Future of Life Institute, "2025 AI Safety Index," Summer 2025
14. Davis Wright Tremaine, "New York Enacts RAISE Act for AI Transparency Amid Federal Preemption Debate," December 19, 2025, https://www.dwt.com/blogs/artificial-intelligence-law-advisor/2025/12/new-york-raise-act-ai-safety-rules-developers
15. The White House, "Ensuring a National Policy Framework for Artificial Intelligence," December 11, 2025, https://www.whitehouse.gov/presidential-actions/2025/12/eliminating-state-law-obstruction-of-national-artificial-intelligence-policy/
16. Future of Life Institute, "2025 AI Safety Index," Summer 2025
17. The Verge, "OpenAI Issues 'Code Red' Following Gemini 3 Launch," December 2025
18. PassionFruit, "GPT 5.1 vs Claude 4.5 vs Gemini 3: 2025 AI Comparison," 2025, https://www.getpassionfruit.com/blog/gpt-5-1-vs-claude-4-5-sonnet-vs-gemini-3-pro-vs-deepseek-v3-2-the-definitive-2025-ai-model-comparison
19. FourWeekMBA, "The Open Model Convergence: How the Frontier Gap Collapsed to 6 Months," 2025, https://fourweekmba.com/the-open-model-convergence-how-the-frontier-gap-collapsed-to-6-months/
20. Various news sources tracking OpenAI departures, compiled December 2024
21. LessWrong, "AI Safety Field Growth Analysis 2025," 2025, https://www.lesswrong.com/posts/8QjAnWyuE9fktPRgS/ai-safety-field-growth-analysis-2025
22. Tech Policy Press, "AI Safety Researchers Criticize xAI Safety Culture," 2025
23. U.S. Congress, "S.1792 - AI Whistleblower Protection Act," May 2025, https://www.congress.gov/bill/119th-congress/senate-bill/1792/text
24. SF Public Press, "California AI Law Created Illusion of Whistleblower Protections," 2025, https://www.sfpublicpress.org/californias-new-ai-safety-law-created-the-illusion-of-whistleblower-protections/