Details about METR’s evaluation of OpenAI GPT-5
Credibility Rating
High quality. Established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: METR
This is METR's official third-party pre-deployment evaluation of GPT-5, notable for its structured threat-model framework and its unusual transparency about NDA and legal-review constraints. It is a key reference for understanding frontier-model evaluation methodology in 2025.
Summary
METR conducted an independent third-party evaluation of OpenAI's GPT-5 to assess catastrophic risk potential across three threat models: AI R&D automation, rogue replication, and strategic sabotage. The evaluation found GPT-5 has a 50% time-horizon of ~2 hours 17 minutes on agentic software engineering tasks, and concluded it does not currently pose catastrophic risks under these threat models. The report also assesses risks from incremental further development prior to public deployment.
Key Points
- GPT-5 assessed across three threat models: AI R&D automation (>10x speedup), rogue replication (autonomous takeover capability), and strategic sabotage of evaluations
- 50% time-horizon benchmark: GPT-5 succeeds ~50% of the time on agentic software tasks taking human professionals ~2 hours 17 minutes
- METR received access to full reasoning traces and background information from OpenAI, enabling a more confident risk assessment than usual
- Evaluation conducted under NDA with OpenAI legal review of the report, a noted deviation from METR's standard independent reporting process
- Assessment explicitly excludes alignment evaluation, misuse risks, CBRN capabilities, and persuasion risks; it is focused solely on autonomy-related catastrophic risks
Cited by 7 pages
| Page | Type | Quality |
|---|---|---|
| METR | Organization | 66.0 |
| Dangerous Capability Evaluations | Approach | 64.0 |
| Eval Saturation & The Evals Gap | Approach | 65.0 |
| Evals-Based Deployment Gates | Approach | 66.0 |
| Third-Party Model Auditing | Approach | 64.0 |
| Sandboxing / Containment | Approach | 91.0 |
| Scalable Eval Approaches | Approach | 65.0 |
Cached Content Preview
# Details about METR’s evaluation of OpenAI GPT-5
**Note on independence:** This evaluation was conducted under a standard NDA. Due to the sensitive information shared with METR as part of this evaluation, OpenAI’s comms and legal team required review and approval of this post (the latter is not true of any other METR reports shared on our website).[1](https://evaluations.metr.org/gpt-5-report/#fn:1)
Throughout this report, “GPT-5” refers to “gpt-5-thinking”.
## Executive Summary
As described in OpenAI’s GPT-5 System Card, METR assessed whether OpenAI GPT-5 could pose catastrophic risks, considering three key threat models:
1. **AI R&D Automation:** AI systems [speeding up AI researchers by >10X](https://www.alignmentforum.org/posts/LjgcRbptarrRfJWtR/a-breakdown-of-ai-capability-levels-focused-on-ai-r-and-d) (as compared to researchers with no AI assistance), or otherwise causing a rapid intelligence explosion, which could [cause or amplify a variety of risks](https://saif.org/research/bare-minimum-mitigations-for-autonomous-ai-development/) if stolen or handled without care
2. **Rogue replication:** AI systems posing direct risks of rapid takeover, which likely requires them to be able to maintain infrastructure, acquire resources, and evade shutdown (see the [rogue replication threat model](https://metr.org/blog/2024-11-12-rogue-replication-threat-model/))
3. **Strategic sabotage:** AI systems broadly and strategically misleading researchers in evaluations, or sabotaging further AI development to significantly increase the risk of takeover or catastrophic risks later
Since risks from these three key threat models can occur [_before_ a model is publicly deployed](https://metr.org/blog/2025-01-17-ai-models-dangerous-before-public-deployment/), we also made an assessment of whether incremental further development could pose risks in these areas. Ideally such an evaluation would be done by developers before or during training, adapted to the details of their training plans, but we see this as a first step. We did not assess the alignment of OpenAI GPT-5, nor other risks (including from misuse, CBRN capabilities, persuasive abilities, widespread societal use etc.).
We received access to GPT-5 four weeks prior to its release (see [_METR’s access to GPT-5_](https://evaluations.metr.org/gpt-5-report/#metr%E2%80%99s-access-to-gpt-5) for details about our level of access).[2](https://evaluations.metr.org/gpt-5-report/#fn:2) To determine if GPT-5 is capable of executing any of our three main threat models, we used our [time-horizon evaluation](https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/) and determined it has a 50% time-horizon of around 2 hours 17 minutes on agentic software engineering tasks. That means we estimate it has a 50% chance of succeeding at software engineering tasks that would take human professionals (who are unfamiliar with the particular codebase) around 2 hours and 17 minutes. We additionally perf
... (truncated, 62 KB total)
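The 50% time-horizon figure above follows METR's published methodology (see the linked time-horizon post): fit a logistic curve relating a model's per-task success to the log of the human completion time for each task, then report the duration at which the fitted success probability crosses 50%. Below is a minimal sketch of that fit in Python; the task data, the scikit-learn fit, and all names are illustrative assumptions, not METR's actual data or code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical per-task data (ours, not METR's): human completion time
# for each task in minutes, and whether the model solved that task.
human_minutes = np.array([2, 5, 10, 30, 60, 120, 180, 360, 480, 960])
model_success = np.array([1, 1, 1, 1, 1, 0, 1, 0, 0, 0])

# Fit P(success) as a logistic function of log2(human time). A large C
# approximates an unregularized maximum-likelihood fit.
X = np.log2(human_minutes).reshape(-1, 1)
clf = LogisticRegression(C=1e6).fit(X, model_success)

# The fitted probability crosses 0.5 where coef * x + intercept == 0,
# so the 50% time-horizon (in log2 minutes) is -intercept / coef.
log2_horizon = -clf.intercept_[0] / clf.coef_[0, 0]
print(f"Estimated 50% time-horizon: {2 ** log2_horizon:.0f} minutes")
```

The fitted curve also yields success-probability estimates at other task lengths, e.g. `clf.predict_proba(np.log2([[480]]))` for an 8-hour task; a reported horizon of ~2 hours 17 minutes corresponds to where that curve crosses 0.5 on METR's actual task suite.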