Pre-deployment evaluation of Claude 3.5 Sonnet
Government
Credibility Rating
Gold standard. Rigorous peer review, high editorial standards, and strong institutional reputation.
Rating inherited from publication venue: NIST
This is one of the first publicly disclosed government-conducted pre-deployment AI safety evaluations, setting a precedent for how regulatory bodies may assess frontier models before release. It is relevant to discussions of governance, capability evaluation, and red-teaming methodology.
Metadata
Summary
The U.S. and UK AI Safety Institutes jointly conducted pre-deployment safety evaluations of Anthropic's upgraded Claude 3.5 Sonnet, testing biological capabilities, cyber capabilities, software/AI development, and safeguard efficacy. The evaluation used question answering, agent tasks, qualitative probing, and red teaming to benchmark the model against prior versions and competitors. This represents one of the first formal government-led pre-deployment AI safety evaluations made public.
Key Points
- Joint US-UK government evaluation covering four domains: biological capabilities, cyber capabilities, software/AI development, and safeguard efficacy.
- Methodologies included question answering, agent tasks, qualitative probing, and red teaming to assess real-world risk potential.
- Model was benchmarked against prior Claude versions and competing models (OpenAI o1-preview and GPT-4o) to measure relative capability uplift.
- Findings were shared with Anthropic prior to public release, demonstrating a pre-deployment disclosure and review process.
- Represents a significant precedent for government-led third-party safety evaluations of frontier AI models before public deployment.
Cited by 5 pages
| Page | Type | Quality |
|---|---|---|
| US AI Safety Institute | Organization | 91.0 |
| AI Safety Institutes (AISIs) | Policy | 69.0 |
| US Executive Order on Safe, Secure, and Trustworthy AI | Policy | 91.0 |
| Bioweapons Risk | Risk | 91.0 |
| Emergent Capabilities | Risk | 61.0 |
Cached Content Preview
Pre-Deployment Evaluation of Anthropic’s Upgraded Claude 3.5 Sonnet | NIST
https://www.nist.gov/news-events/news/2024/11/pre-deployment-evaluation-anthropics-upgraded-claude-35-sonnet
The U.S. AI Safety Institute and the UK AI Safety Institute conducted joint pre-deployment testing of Anthropic's latest model
November 19, 2024
Introduction
The U.S. Artificial Intelligence Safety Institute (US AISI) and the UK Artificial Intelligence Safety Institute (UK AISI) conducted a joint pre-deployment evaluation of Anthropic’s latest model – the upgraded Claude 3.5 Sonnet (released October 22, 2024).
The following is a high-level overview of the evaluations conducted, as well as a snapshot of the findings from each domain tested. A more detailed technical report can be found here.
Overview of the Joint Safety Research & Testing Exercise
US AISI and UK AISI conducted testing during a limited period of pre-deployment access to the upgraded Sonnet 3.5 model. Testing was conducted by expert engineers, scientists, and subject matter specialists on staff at both Institutes, and the findings were shared with Anthropic before the model was publicly released.
US AISI and UK AISI ran separate but complementary tests to assess the model’s capabilities across four domains: (1) biological capabilities, (2) cyber capabilities, (3) software and AI development, and (4) safeguard efficacy.
To assess the model’s relative capabilities and evaluate the potential real-world impacts of the upgraded Sonnet 3.5 across these four areas, US AISI and UK AISI compared its performance to a series of similar reference models: the prior version of Anthropic’s Sonnet 3.5, OpenAI’s o1-preview, and OpenAI’s GPT-4o.
These comparisons are intended only to assess the relative capability improvements of the upgraded Sonnet 3.5, in order to improve scientific interpretation of evaluation results.
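The relative-capability framing above can be sketched as a simple per-domain score comparison. This is purely illustrative: the model names, domains, and scores below are placeholders, not the Institutes' actual results or methodology.

```python
# Illustrative sketch of relative-capability comparison against reference
# models. All numbers and names are hypothetical placeholders.

scores = {
    "upgraded-sonnet-3.5": {"bio": 0.62, "cyber": 0.55},
    "prior-sonnet-3.5":    {"bio": 0.58, "cyber": 0.54},
}

def relative_uplift(test_model: str, reference: str) -> dict:
    """Per-domain score difference: model under test minus reference model."""
    return {
        domain: round(scores[test_model][domain] - scores[reference][domain], 2)
        for domain in scores[test_model]
    }

print(relative_uplift("upgraded-sonnet-3.5", "prior-sonnet-3.5"))
# {'bio': 0.04, 'cyber': 0.01}
```

As the overview notes, such differences speak only to relative capability improvements, not to absolute real-world risk.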
Methodology
US AISI and UK AISI tested the upgraded Sonnet 3.5 by drawing on a range of techniques including:
Question Answering: The model was asked to correctly answer a series of questions that test knowledge or problem solving on a given topic. Answers were most often graded automatically by another model, checked b
... (truncated, 11 KB total)
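The question-answering methodology described above, where answers are most often graded automatically by another model, can be sketched as follows. This is an illustrative stand-in, not the Institutes' actual harness; the exact-match grader is a placeholder for what would in practice be a call to a grading model, and all names are hypothetical.

```python
# Minimal sketch of automated QA grading. The grader below is a stand-in
# exact-match check; a real harness would call a grader model here.

def grade(reference: str, answer: str) -> bool:
    # Placeholder grader: credit case-insensitive exact matches only.
    return reference.strip().lower() == answer.strip().lower()

def score_qa(items, answer_fn) -> float:
    """Fraction of questions the model under test answers correctly."""
    correct = sum(grade(ref, answer_fn(q)) for q, ref in items)
    return correct / len(items)

# Toy usage: a "model" that answers one question correctly.
items = [("Capital of France?", "Paris"), ("2 + 2?", "4")]
model = lambda q: "Paris" if "France" in q else "5"
print(score_qa(items, model))  # 0.5
```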