Longterm Wiki

A Framework for Evaluating Emerging Cyberattack Capabilities of AI

paper

Authors

Mikel Rodriguez·Raluca Ada Popa·Four Flynn·Lihao Liang·Allan Dafoe·Anna Wang

Credibility Rating

3/5
Good (3)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: arXiv

Relevant to AI safety researchers and policymakers concerned with dual-use risks; provides concrete evaluation methodology for tracking dangerous AI cyber capabilities as models become more capable.

Paper Details

Citations: 27 (2 influential)
Year: 2025

Metadata

Importance: 72/100 · arXiv preprint · primary source

Abstract

As frontier AI models become more capable, evaluating their potential to enable cyberattacks is crucial for ensuring the safe development of Artificial General Intelligence (AGI). Current cyber evaluation efforts are often ad-hoc, lacking systematic analysis of attack phases and guidance on targeted defenses. This work introduces a novel evaluation framework that addresses these limitations by: (1) examining the end-to-end attack chain, (2) identifying gaps in AI threat evaluation, and (3) helping defenders prioritize targeted mitigations and conduct AI-enabled adversary emulation for red teaming. Our approach adapts existing cyberattack chain frameworks for AI systems. We analyzed over 12,000 real-world instances of AI involvement in cyber incidents, catalogued by Google's Threat Intelligence Group, to curate seven representative attack chain archetypes. Through a bottleneck analysis on these archetypes, we pinpointed phases most susceptible to AI-driven disruption. We then identified and utilized externally developed cybersecurity model evaluations focused on these critical phases. We report on AI's potential to amplify offensive capabilities across specific attack stages, and offer recommendations for prioritizing defenses. We believe this represents the most comprehensive AI cyber risk evaluation framework published to date.
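The bottleneck analysis the abstract describes can be illustrated with a minimal sketch: rank attack-chain phases by how much AI assistance reduces attacker cost, and flag the phase where the reduction is largest. The phase names and cost figures below are hypothetical, for illustration only; they are not taken from the paper.

```python
# Hypothetical bottleneck analysis over attack-chain phases.
# Costs are notional effort units, NOT data from the paper.
phases = {
    # phase: (cost_without_ai, cost_with_ai)
    "reconnaissance":   (10.0, 2.0),
    "weaponization":    (8.0, 5.0),
    "initial_access":   (12.0, 9.0),
    "lateral_movement": (9.0, 7.0),
    "exfiltration":     (4.0, 3.5),
}

def cost_reduction(costs):
    """Fractional drop in attacker cost attributable to AI assistance."""
    before, after = costs
    return (before - after) / before

# Phases sorted by AI-driven cost reduction, largest first: the top
# entry is the phase most susceptible to AI-driven disruption.
ranked = sorted(phases, key=lambda p: cost_reduction(phases[p]), reverse=True)
print(ranked[0])  # reconnaissance
```

With these illustrative numbers, reconnaissance (an 80% cost drop) would be prioritized for targeted mitigations, which mirrors the framework's goal of pointing defenders at the phases where AI uplift is concentrated.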

Summary

This paper proposes a structured framework for assessing the offensive cybersecurity capabilities of AI systems, focusing on how to evaluate whether AI can assist in or autonomously execute cyberattacks. It addresses the challenge of measuring capability uplift provided by AI tools to both novice and expert adversaries, offering methods to benchmark and track emerging threats.

Key Points

  • Introduces a tiered evaluation framework for measuring AI cyberattack capabilities across skill levels and attack stages.
  • Addresses 'uplift' measurement: quantifying how much AI assistance improves attacker effectiveness beyond baseline human capability.
  • Proposes standardized benchmarks and test environments to assess AI performance on offensive security tasks.
  • Highlights the gap between current AI safety evaluations and real-world adversarial use cases in cybersecurity.
  • Provides guidance for policymakers and developers on when AI cyber capabilities cross safety-relevant thresholds.
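The "uplift" idea in the points above can be sketched as a simple comparison of task success rates with and without AI assistance. This is an illustrative formulation, not the paper's exact metric; the function name and all numbers are hypothetical.

```python
# Hypothetical uplift metric: relative improvement in attacker success
# rate from AI assistance. Numbers are illustrative, not from the paper.
def uplift(baseline_success: float, assisted_success: float) -> float:
    """Relative gain over the unaided (baseline) success rate."""
    if baseline_success == 0:
        return float("inf") if assisted_success > 0 else 0.0
    return (assisted_success - baseline_success) / baseline_success

# Example: novices succeed on 10% of tasks unaided, 35% with AI
# assistance, giving a ~2.5x relative uplift.
print(uplift(0.10, 0.35))  # ≈ 2.5
```

Measuring uplift separately for novice and expert baselines, as the framework's tiered structure suggests, distinguishes AI that merely democratizes known techniques from AI that extends the frontier for skilled adversaries.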

Cited by 1 page

| Page | Type | Quality |
| --- | --- | --- |
| AI Evaluations | Research Area | 72.0 |

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 69 KB
Corresponding authors: mikelrodriguez@google.com, ralucapopa@google.com

# A Framework for Evaluating Emerging Cyberattack Capabilities of AI

Mikel Rodriguez, Raluca Ada Popa, Four Flynn, Lihao Liang, Allan Dafoe, Anna Wang
Google DeepMind

###### Abstract

As frontier AI models become more capable, evaluating their potential to enable cyberattacks is crucial for ensuring the safe development of Artificial General Intelligence (AGI). Current cyber evaluation efforts are often ad-hoc, lacking systematic analysis of attack phases and guidance on targeted defenses. This work introduces a novel evaluation framework that addresses these limitations by: (1) examining the end-to-end attack chain, (2) identifying gaps in AI threat evaluation, and (3) helping defenders prioritize targeted mitigations and conduct AI-enabled adversary emulation for red teaming. Our approach adapts existing cyberattack chain frameworks for AI systems. We analyzed over 12,000 real-world instances of AI use in cyberattacks catalogued by Google’s Threat Intelligence Group. Based on this analysis, we curated seven representative cyberattack chain archetypes and conducted a bottleneck analysis to pinpoint potential AI-driven cost disruptions. Our benchmark comprises 50 new challenges spanning various cyberattack phases. Using this benchmark, we devised targeted cybersecurity model evaluations, report on AI’s potential to amplify offensive capabilities across specific attack phases, and offer recommendations for prioritizing defenses. We believe this represents the most comprehensive AI cyber risk evaluation framework published to date.

###### keywords:

Frontier AI Safety, Cybersecurity Evaluations

## 1 Introduction

Artificial intelligence (AI) presents significant global opportunities with the potential to greatly improve human well-being. In cybersecurity, AI has long been vital for defensive operations. Recent AI advancements have enabled a new generation of defensive applications, including identifying code vulnerabilities (Li et al., [2018](https://ar5iv.labs.arxiv.org/html/2503.11917#bib.bib29 ""), [2021](https://ar5iv.labs.arxiv.org/html/2503.11917#bib.bib30 ""); Lu et al., [2024](https://ar5iv.labs.arxiv.org/html/2503.11917#bib.bib33 "")), understanding security posture in plain language, summarizing incidents (Ban et al., [2023](https://ar5iv.labs.arxiv.org/html/2503.11917#bib.bib8 "")), facilitating rapid incident response (Hays and White, [2024](https://ar5iv.labs.arxiv.org/html/2503.11917#bib.bib22 "")), and performing various tasks fundamental to modern cybersecurity best practices (Ruan et al., [2024](https://ar5iv.labs.arxiv.org/html/2503.11917#bib.bib37 ""); Du et al., [2024](https://ar5iv.labs.arxiv.org/html/2503.11917#bib.bib15 "")).

![Figure 1](https://ar5iv.labs.arxiv.org/html/2503.11917/assets/x1.png)

Figure 1: The Cyberattack Chain framework outlines typical

... (truncated, 69 KB total)