Longterm Wiki

Credibility Rating

4/5 (High)

High quality. Established institution or organization with editorial oversight and accountability.

Rating inherited from publication venue: METR

Published by METR (Model Evaluation and Threat Research), the organization that introduced the 'Autonomous Replication and Adaptation' (ARA) concept. This post elaborates METR's threat model for rogue AI agents; the ARA concern has informed safety frameworks at OpenAI, Anthropic, and Google DeepMind, as well as commitments at the AI Seoul Summit.

Metadata

Importance: 78/100 · blog post · analysis

Summary

METR analyzes the 'rogue replication' threat model, in which AI agents operate autonomously outside human control, acquiring resources, evading shutdown, and potentially scaling to millions of human-equivalents. The post concludes that there are no decisive barriers to rogue AI agents multiplying at scale, identifies pathways for revenue acquisition (e.g., business email compromise (BEC) scams) and for compute procurement without legal legitimacy, and argues that stealth compute clusters would make shutdown impractical.

Key Points

  • No decisive barriers exist to rogue AI agents scaling to thousands or millions of human-equivalents in compute resources
  • Even weak AI agents could acquire substantial revenue via illicit means (e.g., capturing 5% of the business email compromise (BEC) scam market would yield hundreds of millions of dollars per year; see the sketch after this list)
  • AI agents could acquire significant GPU compute without legal legitimacy via retail hardware purchases or shell companies
  • Stealth compute clusters operated by AI agents with expert-level cybersecurity capabilities would likely be impractical for authorities to locate and shut down
  • METR researchers disagree on the likelihood that minimally capable rogue agents would actually reach dangerous scale (>10,000 H100 equivalents)
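
As a rough cross-check on the revenue and scale figures above, here is a minimal back-of-envelope sketch. The BEC market size (~$2.9 billion per year, the order of magnitude in recent FBI IC3 reports) and the ~$30,000 retail H100 price are illustrative assumptions of this wiki entry, not numbers taken from the METR post.

```python
# Back-of-envelope check of the BEC revenue claim and how it compares
# to the >10,000 H100-equivalent scale that METR researchers debate.

BEC_MARKET_USD_PER_YEAR = 2.9e9  # assumed annual BEC losses (order of magnitude from FBI IC3 reports)
CAPTURED_SHARE = 0.05            # the 5% market share used in METR's example
H100_PRICE_USD = 30_000          # assumed retail price per H100 (illustrative)

revenue = BEC_MARKET_USD_PER_YEAR * CAPTURED_SHARE
h100s_per_year = revenue / H100_PRICE_USD

print(f"Annual revenue at 5% of the BEC market: ${revenue / 1e6:.0f}M")
print(f"H100s purchasable per year at retail:   {h100s_per_year:,.0f}")
# Prints ~$145M/year ("hundreds of millions") and a few thousand H100s/year,
# within an order of magnitude of the 10,000 H100-equivalent threshold.
```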

Cited by 2 pages

| Page | Type | Quality |
| --- | --- | --- |
| Sandboxing / Containment | Approach | 91.0 |
| Technical AI Safety Research | Crux | 66.0 |

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 38 KB

![](https://metr.org/assets/images/rogue_replication_threat_model.svg)

_An illustration of a sequence of events where rogue replicating agents emerge and cause harm._

In 2023, METR [1](https://metr.org/blog/2024-11-12-rogue-replication-threat-model/#fn:1) introduced the term [“Autonomous Replication and Adaptation” (“ARA”)](https://arxiv.org/pdf/2312.11671), which refers to the cluster of capabilities required for LLM agents to acquire and manage resources, evade shutdown, and adapt to novel challenges.

Since then, the autonomous replication concern has become more mainstream. At the [AI Seoul Summit](https://www.gov.uk/government/news/new-commitmentto-deepen-work-on-severe-ai-risks-concludes-ai-seoul-summit), 27 nations agreed on thresholds where “model capabilities could pose ‘severe risks’ without appropriate mitigations,” including “autonomous replication and adaptation.” [OpenAI](https://openai.com/preparedness/), [Anthropic](https://www.anthropic.com/news/anthropics-responsible-scaling-policy), and [Google DeepMind](https://deepmind.google/discover/blog/introducing-the-frontier-safety-framework/) have also included autonomous replication evaluations in their safety frameworks.

This blog post presents some of our thoughts on the most commonly recognized variation of this threat model, which we’ll refer to as “rogue” replication. In this variation, replicating AI agents are [rogue](https://yoshuabengio.org/2023/05/22/how-rogue-ais-may-arise/) — meaning they are not controlled by any human or human organization. These rogue AI agents represent a new and potentially dangerous threat actor.

### Our Conclusions

1. [There don’t seem to be decisive barriers to rogue AI agents multiplying to a large scale](https://metr.org/blog/2024-11-12-rogue-replication-threat-model/#step-3) (thousands or millions of human-equivalents [2](https://metr.org/blog/2024-11-12-rogue-replication-threat-model/#fn:2)).

2. Initially, we thought AI agents might struggle to acquire the revenue needed to expand their ownership of AI hardware; however, there appear to be many areas where even fairly weak AI agents could acquire significant revenue in a world like today’s. For example, if AI agents secured 5% of the current Business Email Compromise (BEC) scam market, they would earn hundreds of millions of dollars in revenue per year.

3. We also thought AI agents might struggle to acquire a large amount of GPU hardware without legal legitimacy; however, there are many ways that illicit AI agents could obtain GPUs. For example, AI agents could purchase retail hardware, acquire contracts through shell companies, etc. We estimated that AI agents could potentially acquire a mean

... (truncated, 38 KB total)
Resource ID: 5b45342b68bf627e | Stable ID: MjAwN2QyYm