November 2024 joint evaluation of Claude 3.5 Sonnet
Government
Credibility Rating
High quality. Established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: UK AI Safety Institute
This is a landmark example of formal government pre-deployment AI evaluation, relevant to AI governance discussions about how safety institutes can assess frontier models before public release and coordinate internationally.
Metadata
Summary
The UK and US AI Safety Institutes conducted a joint pre-deployment evaluation of Anthropic's upgraded Claude 3.5 Sonnet, assessing biological capabilities, cyber capabilities, software/AI development, and safeguard efficacy. The evaluation used multiple methodologies including red teaming and agent tasks, benchmarking against prior Claude 3.5 Sonnet, GPT-4o, and o1-preview. This represents an early example of government-led pre-deployment safety testing of frontier AI models.
Key Points
- First joint evaluation between UK AISI and US AISI, demonstrating international coordination on frontier AI safety assessments.
- Four domains assessed: biological capabilities, cyber capabilities, software/AI development, and safeguard efficacy.
- Testing methods included question answering, agent tasks, qualitative probing, and red teaming against multiple baseline models.
- Evaluation was pre-deployment, representing a model for how governments can engage with AI labs before public release.
- Findings inform understanding of whether capability improvements translate to increased real-world risk in high-stakes domains.
Cited by 2 pages
| Page | Type | Quality |
|---|---|---|
| AI Output Filtering | Approach | 63.0 |
| US Executive Order on Safe, Secure, and Trustworthy AI | Policy | 91.0 |
Cached Content Preview
# Pre-deployment evaluation of Anthropic’s upgraded Claude 3.5 Sonnet
The UK Artificial Intelligence Safety Institute and U.S. Artificial Intelligence Safety Institute conducted a joint pre-deployment evaluation of Anthropic’s latest model
Nov 19, 2024
_Note to readers: we changed our name to the AI Security Institute on 14 February 2025. Read more_ [_here._](https://www.gov.uk/government/news/tackling-ai-security-risks-to-unleash-growth-and-deliver-plan-for-change)
### **Introduction**
The UK Artificial Intelligence Safety Institute (UK AISI) and the U.S. Artificial Intelligence Safety Institute (US AISI) conducted a joint pre-deployment evaluation of Anthropic’s latest model – the upgraded Claude 3.5 Sonnet (released October 22, 2024).
The following is a high-level overview of the evaluations conducted, as well as a snapshot of the findings from each domain tested. A more detailed technical report can be found [here](https://cdn.prod.website-files.com/663bd486c5e4c81588db7a1d/673b689ec926d8d32e889a8e_UK-US-Testing-Report-Nov-19.pdf).
### **Overview of the Joint Safety Research & Testing Exercise**
US AISI and UK AISI conducted testing during a limited period of pre-deployment access to the upgraded Sonnet 3.5 model. Testing was conducted by expert engineers, scientists, and subject matter specialists from both Institutes, and the findings were shared with Anthropic before the model was publicly released.
US AISI and UK AISI ran separate but complementary tests to assess the model’s capabilities across four domains: (1) **biological capabilities**, (2) **cyber capabilities**, (3) **software and AI development**, and (4) **safeguard efficacy**.
To assess the model’s relative capabilities and evaluate the potential real-world impacts of the upgraded Sonnet 3.5 across these four areas, US AISI and UK AISI compared its performance to a series of similar reference models: the prior version of Anthropic’s Sonnet 3.5, OpenAI’s o1-preview, and OpenAI’s GPT-4o.
These comparisons are intended only to assess the relative capability improvements of the upgraded Sonnet 3.5, in order to improve scientific interpretation of evaluation results.
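The comparison logic described above can be sketched in a few lines: each model answers the same question set, and accuracies are reported alongside the delta against a chosen baseline, so improvements are interpreted relatively rather than absolutely. This is a minimal illustrative sketch, not AISI's actual harness; the model names, question format, and `ask` callables are hypothetical.

```python
def score_model(ask, questions):
    """Fraction of questions a single model answers correctly.

    `ask` is a hypothetical callable mapping a prompt string to the
    model's answer string.
    """
    correct = sum(1 for q in questions if ask(q["prompt"]) == q["answer"])
    return correct / len(questions)


def compare(models, questions, baseline):
    """Score every model on the same questions and report each model's
    accuracy plus its delta versus the named baseline model."""
    scores = {name: score_model(ask, questions) for name, ask in models.items()}
    return {
        name: {"accuracy": acc, "delta_vs_baseline": acc - scores[baseline]}
        for name, acc in scores.items()
    }


# Toy usage with stub models standing in for real API calls.
questions = [
    {"prompt": "2+2", "answer": "4"},
    {"prompt": "capital of France", "answer": "Paris"},
]
models = {
    "upgraded-model": lambda p: {"2+2": "4", "capital of France": "Paris"}[p],
    "reference-model": lambda p: {"2+2": "4", "capital of France": "Lyon"}[p],
}
report = compare(models, questions, baseline="reference-model")
```

Reporting deltas against a fixed reference model, rather than raw scores alone, is what lets evaluators say whether an upgrade meaningfully moved capability in a given domain.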
### **Methodology**
US AISI and UK AISI tested the upgraded Sonnet 3.5 by drawing on a range of techniques including:
- **Question Answering**: The model was asked to correctly answer a series of questions that test knowledge or problem
... (truncated, 11 KB total)