Longterm Wiki

Pre-deployment evaluation of Claude 3.5 Sonnet

government

Credibility Rating

5/5 (Gold)

Gold standard. Rigorous peer review, high editorial standards, and strong institutional reputation.

Rating inherited from publication venue: NIST

This is one of the first publicly disclosed government-conducted pre-deployment AI safety evaluations, setting a precedent for how regulatory bodies may assess frontier models before release; relevant to governance, capability evaluation, and red-teaming methodology discussions.

Metadata

Importance: 72/100 · organizational report · primary source

Summary

The U.S. and UK AI Safety Institutes jointly conducted pre-deployment safety evaluations of Anthropic's upgraded Claude 3.5 Sonnet, testing biological capabilities, cyber capabilities, software/AI development, and safeguard efficacy. The evaluation used question answering, agent tasks, qualitative probing, and red teaming to benchmark the model against prior versions and competitors. This represents one of the first formal government-led pre-deployment AI safety evaluations made public.

Key Points

  • Joint US-UK government evaluation covering four domains: biological capabilities, cyber capabilities, software/AI development, and safeguard efficacy.
  • Methodologies included question answering, agent tasks, qualitative probing, and red teaming to assess real-world risk potential.
  • Model was benchmarked against prior Claude versions and competing models (OpenAI o1-preview and GPT-4o) to measure relative capability uplift.
  • Findings were shared with Anthropic prior to public release, demonstrating a pre-deployment disclosure and review process.
  • Represents a significant precedent for government-led third-party safety evaluations of frontier AI models before public deployment.

Cited by 5 pages

Cached Content Preview

HTTP 200 · Fetched Mar 15, 2026 · 11 KB
Pre-Deployment Evaluation of Anthropic’s Upgraded Claude 3.5 Sonnet | NIST 
 https://www.nist.gov/news-events/news/2024/11/pre-deployment-evaluation-anthropics-upgraded-claude-35-sonnet
 The U.S. AI Safety Institute and the UK AI Safety Institute conducted joint pre-deployment testing of Anthropic's latest model
 November 19, 2024 
 Introduction 

 The U.S. Artificial Intelligence Safety Institute (US AISI) and the UK Artificial Intelligence Safety Institute (UK AISI) conducted a joint pre-deployment evaluation of Anthropic’s latest model – the upgraded Claude 3.5 Sonnet (released October 22, 2024). 

 The following is a high-level overview of the evaluations conducted, as well as a snapshot of the findings from each domain tested. A more detailed technical report can be found here.

 Overview of the Joint Safety Research & Testing Exercise 

 US AISI and UK AISI conducted testing during a limited period of pre-deployment access to the upgraded Sonnet 3.5 model. Testing was conducted by expert engineers, scientists, and subject matter specialists from staff at both Institutes, and the findings were shared with Anthropic before the model was publicly released. 

 US AISI and UK AISI ran separate but complementary tests to assess the model’s capabilities across four domains: (1) biological capabilities , (2) cyber capabilities , (3) software and AI development , and (4) safeguard efficacy . 

 To assess the model’s relative capabilities and evaluate the potential real-world impacts of the upgraded Sonnet 3.5 across these four areas, US AISI and UK AISI compared its performance to a series of similar reference models: the prior version of Anthropic’s Sonnet 3.5, OpenAI’s o1-preview, and OpenAI’s GPT-4o. 

 These comparisons are intended only to assess the relative capability improvements of the upgraded Sonnet 3.5, in order to improve scientific interpretation of evaluation results. 

 Methodology 

 US AISI and UK AISI tested the upgraded Sonnet 3.5 by drawing on a range of techniques including: 

 Question Answering: The model was asked to correctly answer a series of questions that test knowledge or problem solving on a given topic. Answers were most often graded automatically by another model, checked b

... (truncated, 11 KB total)
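The model-graded question-answering setup described in the methodology section can be sketched roughly as follows. This is an illustrative sketch only: the `ask_model` helper, model names, grading prompt, and pass criterion are assumptions for demonstration, not the Institutes' actual evaluation harness.

```python
# Sketch of model-graded question answering: a candidate model answers a
# question, then a second "grader" model automatically scores the answer
# against a reference. `ask_model` is a stand-in for a real model API call.

def ask_model(model: str, prompt: str) -> str:
    """Stand-in for an API call; returns canned text for demonstration."""
    canned = {
        "candidate": "Ribosomes synthesize proteins.",
        "grader": "CORRECT",
    }
    return canned[model]

def grade_answer(question: str, reference: str) -> bool:
    """Ask the candidate model, then have a grader model check its answer."""
    answer = ask_model("candidate", question)
    verdict = ask_model(
        "grader",
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Candidate answer: {answer}\n"
        "Reply with exactly one word: CORRECT or INCORRECT.",
    )
    return verdict.strip().upper().startswith("CORRECT")

result = grade_answer("What do ribosomes do?", "They synthesize proteins.")
print(result)  # True with the canned responses above
```

In a real harness the grader's verdict would typically be spot-checked by human reviewers, since grading models can themselves make mistakes.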
Resource ID: a0bcc81243f8fbee | Stable ID: YWFlZWU2ZW