Details about METR’s evaluation of OpenAI GPT-5
Credibility Rating
High quality. Established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: METR
This is METR's official third-party pre-deployment evaluation of GPT-5, notable for its structured threat-model framework and its unusual transparency about NDA and legal-review constraints. It is a key reference for understanding frontier-model evaluation methodology in 2025.
Summary
METR conducted an independent third-party evaluation of OpenAI's GPT-5 to assess catastrophic risk potential across three threat models: AI R&D automation, rogue replication, and strategic sabotage. The evaluation found GPT-5 has a 50% time-horizon of ~2 hours 17 minutes on agentic software engineering tasks, and concluded it does not currently pose catastrophic risks under these threat models. The report also assesses risks from incremental further development prior to public deployment.
Key Points
- GPT-5 assessed across three threat models: AI R&D automation (>10x speedup), rogue replication (autonomous takeover capability), and strategic sabotage of evaluations
- 50% time-horizon benchmark: GPT-5 succeeds ~50% of the time on agentic software tasks taking human professionals ~2 hours 17 minutes
- METR received access to full reasoning traces and background information from OpenAI, enabling a more confident risk assessment than usual
- Evaluation conducted under NDA with OpenAI legal review of the report, a noted deviation from METR's standard independent reporting process
- Assessment explicitly excludes alignment evaluation, misuse risks, CBRN capabilities, and persuasion risks; it is focused solely on autonomy-related catastrophic risks
Cited by 7 pages
| Page | Type | Quality |
|---|---|---|
| METR | Organization | 66.0 |
| Dangerous Capability Evaluations | Approach | 64.0 |
| Eval Saturation & The Evals Gap | Approach | 65.0 |
| Evals-Based Deployment Gates | Approach | 66.0 |
| Third-Party Model Auditing | Approach | 64.0 |
| Sandboxing / Containment | Approach | 91.0 |
| Scalable Eval Approaches | Approach | 65.0 |
Cached Content Preview
# Details about METR’s evaluation of OpenAI GPT-5
**Note on independence:** This evaluation was conducted under a standard NDA. Due to the sensitive information shared with METR as part of this evaluation, OpenAI’s comms and legal team required review and approval of this post (the latter is not true of any other METR reports shared on our website).[1](https://evaluations.metr.org/gpt-5-report/#fn:1)
Throughout this report, “GPT-5” refers to “gpt-5-thinking”.
## Executive Summary
As described in OpenAI’s GPT-5 System Card, METR assessed whether OpenAI GPT-5 could pose catastrophic risks, considering three key threat models:
1. **AI R&D Automation:** AI systems [speeding up AI researchers by >10X](https://www.alignmentforum.org/posts/LjgcRbptarrRfJWtR/a-breakdown-of-ai-capability-levels-focused-on-ai-r-and-d) (as compared to researchers with no AI assistance), or otherwise causing a rapid intelligence explosion, which could [cause or amplify a variety of risks](https://saif.org/research/bare-minimum-mitigations-for-autonomous-ai-development/) if stolen or handled without care
2. **Rogue replication:** AI systems posing direct risks of rapid takeover, which likely requires them to be able to maintain infrastructure, acquire resources, and evade shutdown (see the [rogue replication threat model](https://metr.org/blog/2024-11-12-rogue-replication-threat-model/))
3. **Strategic sabotage:** AI systems broadly and strategically misleading researchers in evaluations, or sabotaging further AI development to significantly increase the risk of takeover or catastrophic risks later
Since risks from these three key threat models can occur [_before_ a model is publicly deployed](https://metr.org/blog/2025-01-17-ai-models-dangerous-before-public-deployment/), we also made an assessment of whether incremental further development could pose risks in these areas. Ideally such an evaluation would be done by developers before or during training, adapted to the details of their training plans, but we see this as a first step. We did not assess the alignment of OpenAI GPT-5, nor other risks (including from misuse, CBRN capabilities, persuasive abilities, widespread societal use etc.).
We received access to GPT-5 four weeks prior to its release (see [_METR’s access to GPT-5_](https://evaluations.metr.org/gpt-5-report/#metr%E2%80%99s-access-to-gpt-5) for details about our level of access).[2](https://evaluations.metr.org/gpt-5-report/#fn:2) To determine if GPT-5 is capable of executing any of our three main threat models, we used our [time-horizon evaluation](https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/) and determined it has a 50% time-horizon of around 2 hours 17 minutes on agentic software engineering tasks. That means we estimate it has a 50% chance of succeeding at software engineering tasks that would take human professionals (who are unfamiliar with the particular codebase) around 2 hours and 17 minutes. We additionally perf
... (truncated, 62 KB total)
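The 50% time-horizon figure above follows METR's published methodology (see the linked time-horizon post): fit a logistic curve relating a model's per-task success to the log of the human completion time for each task, then report the duration at which the fitted success probability crosses 50%. Below is a minimal sketch of that fit in Python; the task data, the scikit-learn fit, and all names are illustrative assumptions, not METR's actual data or code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical per-task data (ours, not METR's): human completion time
# for each task in minutes, and whether the model solved that task.
human_minutes = np.array([2, 5, 10, 30, 60, 120, 180, 360, 480, 960])
model_success = np.array([1, 1, 1, 1, 1, 0, 1, 0, 0, 0])

# Fit P(success) as a logistic function of log2(human time). A large C
# approximates an unregularized maximum-likelihood fit.
X = np.log2(human_minutes).reshape(-1, 1)
clf = LogisticRegression(C=1e6).fit(X, model_success)

# The fitted probability crosses 0.5 where coef * x + intercept == 0,
# so the 50% time-horizon (in log2 minutes) is -intercept / coef.
log2_horizon = -clf.intercept_[0] / clf.coef_[0, 0]
print(f"Estimated 50% time-horizon: {2 ** log2_horizon:.0f} minutes")
```

The fitted curve also yields success-probability estimates at other task lengths, e.g. `clf.predict_proba(np.log2([[480]]))` for an 8-hour task; a reported horizon of ~2 hours 17 minutes corresponds to where that curve crosses 0.5 on METR's actual task suite.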