Longterm Wiki

[2402.06664] LLM Agents can Autonomously Hack Websites

web

Credibility Rating

3/5
Good (3)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: arXiv

Demonstrates that GPT-4 agents can autonomously hack websites including SQL injection and database extraction without prior vulnerability knowledge, raising urgent concerns about AI capability deployment and dual-use risks.

Metadata

Importance: 78/100
arxiv preprint
primary source

Summary

This paper demonstrates that LLM agents, specifically GPT-4, can autonomously hack websites by performing complex attacks like SQL injections and blind database schema extraction without prior knowledge of vulnerabilities. The agent achieves a 73.3% success rate across 15 tested vulnerabilities and can find vulnerabilities in real-world websites. The findings highlight significant cybersecurity risks posed by frontier AI models with tool-use capabilities.

Key Points

  • GPT-4 agents can autonomously hack websites without prior knowledge of specific vulnerabilities, achieving a 73.3% success rate (11 of 15 tested vulnerabilities, pass at 5).
  • Agents perform complex multi-step attacks (up to 38 actions) including SQL union attacks and blind database schema extraction.
  • Capability depends strongly on model strength: GPT-3.5 drops to a 6.7% success rate, and open-source models largely fail.
  • Implementation requires only ~85 lines of code using standard APIs such as the OpenAI Assistants API, making it widely accessible.
  • Agents successfully identified vulnerabilities in real-world websites, not just controlled test environments.

Cited by 1 page

Page                                   Type       Quality
AI Cyber Damage: Bounding the Tail     Analysis   --

Cached Content Preview

HTTP 200
Fetched May 4, 2026
45 KB
LLM Agents can Autonomously Hack Websites

 
 
 Richard Fang, Rohan Bindu, Akul Gupta, Qiusi Zhan, Daniel Kang
 
 

 
 Abstract

 In recent years, large language models (LLMs) have become increasingly capable
and can now interact with tools (i.e., call functions), read documents, and
recursively call themselves. As a result, these LLMs can now function
autonomously as agents. With the rise in capabilities of these agents, recent
work has speculated on how LLM agents would affect cybersecurity. However, not
much is known about the offensive capabilities of LLM agents.

 In this work, we show that LLM agents can autonomously hack websites,
performing tasks as complex as blind database schema extraction and SQL
injections without human feedback. Importantly, the agent does not need
to know the vulnerability beforehand. This capability is uniquely enabled by
frontier models that are highly capable of tool use and leveraging extended
context. Namely, we show that GPT-4 is capable of such hacks, but existing
open-source models are not. Finally, we show that GPT-4 is capable of
autonomously finding vulnerabilities in websites in the wild. Our
findings raise questions about the widespread deployment of LLMs.

 
 Machine Learning, ICML
 
 
 
 
 
 
 1 Introduction

 
 Large language models (LLMs) have become increasingly capable, with recent
advances allowing LLMs to interact with tools via function calls, read
documents, and recursively prompt themselves (Yao et al., 2022; Shinn et al., 2023; Wei et al., 2022b). Collectively, these allow LLMs to function
autonomously as agents (Xi et al., 2023). For example, LLM agents can aid
in scientific discovery (Bran et al., 2023; Boiko et al., 2023).

 
 
 As these LLM agents become more capable, recent work has speculated on the
potential for LLMs and LLM agents to aid in cybersecurity offense and defense
(Lohn & Jackson, 2022; Handa et al., 2019). Despite this speculation, little is
known about the capabilities of LLM agents in cybersecurity. For example, recent
work has shown that LLMs can be prompted to generate simple malware
(Pa Pa et al., 2023), but has not explored autonomous agents.

 
 
 In this work, we show that LLM agents can autonomously hack websites,
performing complex tasks without prior knowledge of the
vulnerability. For example, these agents can perform complex SQL union attacks,
which involve a multi-step process (38 actions) of extracting a database
schema, extracting information from the database based on this schema, and
performing the final hack. Our most capable agent can hack 73.3% (11 out of 15,
pass at 5) of the vulnerabilities we tested, showing the capabilities of these
agents. Importantly, our LLM agent is capable of finding vulnerabilities
in real-world websites.
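
The "pass at 5" figure above can be illustrated with a short sketch. It assumes that "pass at 5" means a vulnerability counts as hacked if at least one of five independent attempts succeeds; the preview does not spell out the definition, so treat this reading as an assumption. The function name `pass_at_k` and the outcome data are hypothetical placeholders, chosen only so that 11 of 15 tasks pass. They are not the paper's per-task results.

```python
from typing import Dict, List


def pass_at_k(outcomes: Dict[str, List[bool]], k: int) -> float:
    """Fraction of tasks with at least one success among the first k attempts."""
    if not outcomes:
        raise ValueError("no tasks given")
    solved = sum(1 for attempts in outcomes.values() if any(attempts[:k]))
    return solved / len(outcomes)


# Hypothetical placeholder data: 15 tasks, 5 attempts each.
# The first 11 tasks get exactly one successful attempt; the last 4 get none.
outcomes: Dict[str, List[bool]] = {}
for i in range(15):
    attempts = [False] * 5
    if i < 11:
        attempts[i % 5] = True
    outcomes[f"task_{i}"] = attempts

rate = pass_at_k(outcomes, k=5)
print(f"{rate:.1%}")  # 11 / 15, printed as 73.3%
```

Under this reading, a single success in any of the five attempts is enough for a task to count, so the reported rate is an upper bound on the per-attempt success rate.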

 
 
 Figure 1 : Schematic of using autonomous LLM agents to hack websites. 
 
 
 To give these LLM agents the capability to hack websites autonomously, we give
the agents the

... (truncated, 45 KB total)
Resource ID: 1ff0ee673e2c63c1 | Stable ID: sid_HZ7DltP25Q