November 2024 joint evaluation of Claude 3.5 Sonnet
Government
Credibility Rating
High quality. Established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: UK AI Safety Institute
This is a landmark example of formal government pre-deployment AI evaluation, relevant to AI governance discussions about how safety institutes can assess frontier models before public release and coordinate internationally.
Metadata
Summary
The UK and US AI Safety Institutes conducted a joint pre-deployment evaluation of Anthropic's upgraded Claude 3.5 Sonnet, assessing biological capabilities, cyber capabilities, software/AI development, and safeguard efficacy. The evaluation used multiple methodologies including red teaming and agent tasks, benchmarking against prior Claude 3.5 Sonnet, GPT-4o, and o1-preview. This represents an early example of government-led pre-deployment safety testing of frontier AI models.
Key Points
- First joint evaluation between UK AISI and US AISI, demonstrating international coordination on frontier AI safety assessments.
- Four domains assessed: biological capabilities, cyber capabilities, software/AI development, and safeguard efficacy.
- Testing methods included question answering, agent tasks, qualitative probing, and red teaming against multiple baseline models.
- Evaluation was pre-deployment, representing a model for how governments can engage with AI labs before public release.
- Findings inform understanding of whether capability improvements translate to increased real-world risk in high-stakes domains.
Cited by 2 pages
| Page | Type | Quality |
|---|---|---|
| AI Output Filtering | Approach | 63.0 |
| US Executive Order on Safe, Secure, and Trustworthy AI | Policy | 91.0 |
Cached Content Preview
# Pre-deployment evaluation of Anthropic’s upgraded Claude 3.5 Sonnet
The UK Artificial Intelligence Safety Institute and U.S. Artificial Intelligence Safety Institute conducted a joint pre-deployment evaluation of Anthropic’s latest model
Nov 19, 2024
_Note to readers: we changed our name to the AI Security Institute on 14 February 2025. Read more_ [_here._](https://www.gov.uk/government/news/tackling-ai-security-risks-to-unleash-growth-and-deliver-plan-for-change)
### **Introduction**
The UK Artificial Intelligence Safety Institute (UK AISI) and the U.S. Artificial Intelligence Safety Institute (US AISI) conducted a joint pre-deployment evaluation of Anthropic’s latest model – the upgraded Claude 3.5 Sonnet (released October 22, 2024).
The following is a high-level overview of the evaluations conducted, as well as a snapshot of the findings from each domain tested. A more detailed technical report can be found [here](https://cdn.prod.website-files.com/663bd486c5e4c81588db7a1d/673b689ec926d8d32e889a8e_UK-US-Testing-Report-Nov-19.pdf).
### **Overview of the Joint Safety Research & Testing Exercise**
US AISI and UK AISI conducted testing during a limited period of pre-deployment access to the upgraded Sonnet 3.5 model. Testing was conducted by expert engineers, scientists, and subject matter specialists from both Institutes, and the findings were shared with Anthropic before the model was publicly released.
US AISI and UK AISI ran separate but complementary tests to assess the model’s capabilities across four domains: (1) **biological capabilities**, (2) **cyber capabilities**, (3) **software and AI development**, and (4) **safeguard efficacy**.
To assess the model’s relative capabilities and evaluate the potential real-world impacts of the upgraded Sonnet 3.5 across these four areas, US AISI and UK AISI compared its performance to a series of similar reference models: the prior version of Anthropic’s Sonnet 3.5, OpenAI’s o1-preview, and OpenAI’s GPT-4o.
These comparisons are intended only to assess the relative capability improvements of the upgraded Sonnet 3.5, in order to improve scientific interpretation of evaluation results.
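The comparison logic described above can be sketched in a few lines: each model answers the same question set, and accuracies are reported alongside the delta against a chosen baseline, so improvements are interpreted relatively rather than absolutely. This is a minimal illustrative sketch, not AISI's actual harness; the model names, question format, and `ask` callables are hypothetical.

```python
def score_model(ask, questions):
    """Fraction of questions a single model answers correctly.

    `ask` is a hypothetical callable mapping a prompt string to the
    model's answer string.
    """
    correct = sum(1 for q in questions if ask(q["prompt"]) == q["answer"])
    return correct / len(questions)


def compare(models, questions, baseline):
    """Score every model on the same questions and report each model's
    accuracy plus its delta versus the named baseline model."""
    scores = {name: score_model(ask, questions) for name, ask in models.items()}
    return {
        name: {"accuracy": acc, "delta_vs_baseline": acc - scores[baseline]}
        for name, acc in scores.items()
    }


# Toy usage with stub models standing in for real API calls.
questions = [
    {"prompt": "2+2", "answer": "4"},
    {"prompt": "capital of France", "answer": "Paris"},
]
models = {
    "upgraded-model": lambda p: {"2+2": "4", "capital of France": "Paris"}[p],
    "reference-model": lambda p: {"2+2": "4", "capital of France": "Lyon"}[p],
}
report = compare(models, questions, baseline="reference-model")
```

Reporting deltas against a fixed reference model, rather than raw scores alone, is what lets evaluators say whether an upgrade meaningfully moved capability in a given domain.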
### **Methodology**
US AISI and UK AISI tested the upgraded Sonnet 3.5 by drawing on a range of techniques including:
- **Question Answering**: The model was asked to correctly answer a series of questions that test knowledge or problem
... (truncated, 11 KB total)