Pre-Deployment evaluation of OpenAI's o1 model
Credibility Rating
High quality. Established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: UK AI Safety Institute
This is an official government evaluation report produced jointly by the UK AI Safety Institute (AISI) and its US counterpart. It represents one of the first formal pre-deployment government safety assessments of a frontier model and a key case study in operationalizing AI governance frameworks.
Summary
The US and UK AI Safety Institutes jointly conducted a pre-deployment safety evaluation of OpenAI's o1 reasoning model, assessing its capabilities in cyber, biological, and software development domains. The evaluation benchmarked o1 against reference models to identify potential risks before public release. This represents an early example of government-led pre-deployment AI safety testing through formal institute collaboration.
Key Points
- First major joint evaluation between the US and UK AI Safety Institutes, establishing a model for international government cooperation on AI safety assessments.
- Tested o1 across high-risk capability domains, including cybersecurity, biological threats, and software development, to identify uplift risks.
- Compared o1's performance against multiple reference models to contextualize capability levels and potential for misuse.
- Evaluation occurred pre-deployment, reflecting a proactive rather than reactive approach to frontier model risk assessment.
- Represents an operationalization of safety commitments made by frontier AI labs to engage with government safety bodies before release.
Cited by 3 pages
| Page | Type | Quality |
|---|---|---|
| Evals-Based Deployment Gates | Approach | 66.0 |
| International AI Safety Summit Series | Event | 63.0 |
| Third-Party Model Auditing | Approach | 64.0 |