Longterm Wiki

Credibility Rating

4/5 (High)

High quality. Established institution or organization with editorial oversight and accountability.

Rating inherited from publication venue: METR

Published by METR (Model Evaluation and Threat Research), the organization that introduced the 'Autonomous Replication and Adaptation' (ARA) concept. This post elaborates METR's threat model for rogue AI agents; the ARA concern has informed safety frameworks at OpenAI, Anthropic, and Google DeepMind, as well as commitments at the AI Seoul Summit.

Metadata

Importance: 78/100 · blog post · analysis

Summary

METR analyzes the 'rogue replication' threat model, in which AI agents operate autonomously outside human control, acquiring resources, evading shutdown, and potentially scaling to millions of human-equivalents. The post concludes that there are no decisive barriers to rogue AI agents multiplying at scale, identifies pathways for revenue acquisition (e.g., business email compromise (BEC) scams) and for compute procurement without legal legitimacy, and argues that stealth compute clusters would make shutdown impractical.

Key Points

  • No decisive barriers exist to rogue AI agents scaling to thousands or millions of human-equivalents in compute resources
  • Even weak AI agents could acquire substantial revenue via illicit means (e.g., capturing 5% of the business email compromise (BEC) scam market would yield hundreds of millions of dollars per year; see the sketch after this list)
  • AI agents could acquire significant GPU compute without legal legitimacy via retail hardware purchases or shell companies
  • Stealth compute clusters operated by AI agents with expert-level cybersecurity capabilities would likely be impractical for authorities to locate and shut down
  • METR researchers disagree on the likelihood that minimally capable rogue agents would actually reach dangerous scale (>10,000 H100 equivalents)
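
As a rough cross-check on the revenue and scale figures above, here is a minimal back-of-envelope sketch. The BEC market size (~$2.9 billion per year, the order of magnitude in recent FBI IC3 reports) and the ~$30,000 retail H100 price are illustrative assumptions of this wiki entry, not numbers taken from the METR post.

```python
# Back-of-envelope check of the BEC revenue claim and how it compares
# to the >10,000 H100-equivalent scale that METR researchers debate.

BEC_MARKET_USD_PER_YEAR = 2.9e9  # assumed annual BEC losses (order of magnitude from FBI IC3 reports)
CAPTURED_SHARE = 0.05            # the 5% market share used in METR's example
H100_PRICE_USD = 30_000          # assumed retail price per H100 (illustrative)

revenue = BEC_MARKET_USD_PER_YEAR * CAPTURED_SHARE
h100s_per_year = revenue / H100_PRICE_USD

print(f"Annual revenue at 5% of the BEC market: ${revenue / 1e6:.0f}M")
print(f"H100s purchasable per year at retail:   {h100s_per_year:,.0f}")
# Prints ~$145M/year ("hundreds of millions") and a few thousand H100s/year,
# within an order of magnitude of the 10,000 H100-equivalent threshold.
```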

Cited by 2 pages

| Page | Type | Quality |
| --- | --- | --- |
| Sandboxing / Containment | Approach | 91.0 |
| Technical AI Safety Research | Crux | 66.0 |

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 38 KB

![](https://metr.org/assets/images/rogue_replication_threat_model.svg)

_An illustration of a sequence of events where rogue replicating agents emerge and cause harm._

In 2023, METR [1](https://metr.org/blog/2024-11-12-rogue-replication-threat-model/#fn:1) introduced the term [“Autonomous Replication and Adaptation” (“ARA”)](https://arxiv.org/pdf/2312.11671), which refers to the cluster of capabilities required for LLM agents to acquire and manage resources, evade shutdown, and adapt to novel challenges.

Since then, the autonomous replication concern has become more mainstream. At the [AI Seoul Summit](https://www.gov.uk/government/news/new-commitmentto-deepen-work-on-severe-ai-risks-concludes-ai-seoul-summit), 27 nations agreed on thresholds where “model capabilities could pose ‘severe risks’ without appropriate mitigations,” including “autonomous replication and adaptation.” [OpenAI](https://openai.com/preparedness/), [Anthropic](https://www.anthropic.com/news/anthropics-responsible-scaling-policy), and [Google DeepMind](https://deepmind.google/discover/blog/introducing-the-frontier-safety-framework/) have also included autonomous replication evaluations in their safety frameworks.

This blog post presents some of our thoughts on the most commonly recognized variation of this threat model, which we’ll refer to as “rogue” replication. In this variation, replicating AI agents are [rogue](https://yoshuabengio.org/2023/05/22/how-rogue-ais-may-arise/) — meaning they are not controlled by any human or human organization. These rogue AI agents represent a new and potentially dangerous threat actor.

### Our Conclusions

1. [There don’t seem to be decisive barriers to rogue AI agents multiplying to a large scale](https://metr.org/blog/2024-11-12-rogue-replication-threat-model/#step-3) (thousands or millions of human-equivalents [2](https://metr.org/blog/2024-11-12-rogue-replication-threat-model/#fn:2)).

2. Initially, we thought AI agents might struggle to acquire the revenue needed to expand their ownership of AI hardware; however, there appear to be many areas where even fairly weak AI agents could acquire significant revenue in a world like today’s. For example, if AI agents secured 5% of the current Business Email Compromise (BEC) scam market, they would earn hundreds of millions of dollars in revenue per year.

3. We also thought AI agents might struggle to acquire a large amount of GPU hardware without legal legitimacy; however, there are many ways that illicit AI agents could obtain GPUs. For example, AI agents could purchase retail hardware, acquire contracts through shell companies, etc. We estimated that AI agents could potentially acquire a mean

... (truncated, 38 KB total)
Resource ID: 5b45342b68bf627e | Stable ID: MjAwN2QyYm