
AI models can be dangerous before public deployment


Credibility Rating

4/5 (High)

High quality. Established institution or organization with editorial oversight and accountability.

Rating inherited from publication venue: METR

Published by METR (Model Evaluation and Threat Research), a leading AI safety evaluation organization, this piece is relevant to policy discussions about where in the AI development lifecycle safety obligations should apply.

Metadata

Importance: 68/100 · blog post · analysis

Summary

METR argues that existing AI safety frameworks are too narrowly focused on pre-deployment testing, neglecting the risks posed by internal model usage during development. The piece contends that internal access, staging environments, and pre-release workflows can themselves be vectors for harm. This challenges the assumption that public deployment is the primary risk threshold requiring oversight.

Key Points

  • Safety evaluations focused only on deployment gates miss risks arising from internal model access during development and testing phases.
  • Internal use of powerful AI models by developers and testers can expose sensitive systems and create harm pathways before any public release.
  • Current industry and regulatory frameworks implicitly treat public deployment as the key risk threshold, which METR argues is insufficient.
  • Pre-deployment internal workflows may lack the same scrutiny, access controls, and monitoring applied to publicly released systems.
  • The authors call for broader safety frameworks that account for the full lifecycle of model development, not just the moment of public deployment.

Review

This source critically examines the limits of pre-deployment testing as the primary mechanism for AI safety management. The authors argue that powerful AI models can create substantial risks even before public deployment, including model theft, internal misuse, and autonomous pursuit of unintended goals. Because current safety frameworks focus exclusively on testing before public release, they fail to address critical risks that emerge during model development, training, and internal use.

The recommended approach is a more comprehensive risk-management strategy that emphasizes earlier capability testing, robust internal monitoring, model weight security, and responsible transparency. The authors suggest that labs should forecast potential model capabilities, implement stronger security measures, and establish clear policies for risk mitigation throughout the entire development process. This approach recognizes that powerful AI systems differ fundamentally from traditional products and require a lifecycle-based governance regime that prioritizes safety at every stage of development.