Can Go AIs be adversarially robust? | FAR.AI
Credibility Rating
High quality. Established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: FAR AI
Part of FAR.AI's research program on adversarial robustness in game-playing AI; a companion to the widely cited 'Adversarial Policies Beat Superhuman Go AIs' paper, examining whether the discovered vulnerabilities can be mitigated.
Metadata
Summary
This FAR.AI research investigates whether Go-playing AI systems like KataGo can be made robust against adversarial attacks, following prior work showing that superhuman Go AIs can be defeated by specially crafted adversarial strategies. The research explores potential defenses and their limitations, contributing to a broader understanding of adversarial robustness in AI systems.
Key Points
- Examines whether Go AIs can be hardened against adversarial strategies that win by exploiting systematic weaknesses rather than by superior play
- Builds on prior FAR.AI/MIT work demonstrating that superhuman Go AIs like KataGo can be defeated by adversarial policies that exploit blind spots not apparent to human players
- Tests three defenses against Go-specific attacks: adversarial training on hand-constructed positions, iterated adversarial training, and a changed network architecture
- Findings have implications for AI safety beyond games, suggesting that deep vulnerabilities in learned systems may be difficult to fully patch
- Illustrates that even superhuman AI systems can have exploitable blind spots, a finding relevant to deployment safety concerns
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| FAR AI | Organization | 76.0 |
Cached Content Preview
Can Go AIs be adversarially robust? | FAR.AI
June 18, 2024
Tom Tseng
Euan McLean
Kellin Pelrine
Tony Wang
Adam Gleave
Abstract
Prior work found that superhuman Go AIs like KataGo can be defeated by simple adversarial strategies. In this paper, we study if simple defenses can improve KataGo's worst-case performance. We test three natural defenses: adversarial training on hand-constructed positions, iterated adversarial training, and changing the network architecture. We find that some of these defenses are able to protect against previously discovered attacks. Unfortunately, we also find that none of these defenses are able to withstand adaptive attacks. In particular, we are able to train new adversaries that reliably defeat our defended agents by causing them to blunder in ways humans would not. Our results suggest that building robust AI systems is challenging even in narrow domains such as Go. For interactive examples of attacks and a link to our codebase, see [goattack.far.ai](https://goattack.far.ai).
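The iterated adversarial training defense mentioned in the abstract alternates between training a new attacker against a frozen victim and fine-tuning the victim against that attacker. The sketch below illustrates the control flow of such a loop only, under stated assumptions: every helper (`train_adversary`, `finetune_victim`, `win_rate`) is a hypothetical placeholder for a full self-play training pipeline, not part of the actual KataGo codebase. The paper's key finding is that even the final victim produced by a loop like this can still be beaten by a freshly trained adaptive adversary.

```python
# Minimal sketch of an iterated adversarial training loop, as described in
# the abstract. All helpers below are hypothetical placeholders standing in
# for full training pipelines; they are NOT part of the KataGo codebase.

def train_adversary(frozen_victim, target_win_rate=0.9):
    """Placeholder: train a fresh adversary against a frozen victim
    until it wins at least target_win_rate of its games."""
    return f"adversary-vs-{frozen_victim}"

def finetune_victim(victim, opponent):
    """Placeholder: fine-tune the victim on games it lost to the
    adversary, patching the exploited weakness."""
    return f"{victim}+patched"

def win_rate(adversary, victim):
    """Placeholder: estimated adversary win rate over evaluation games."""
    return 0.0  # dummy value for the sketch

def iterated_adversarial_training(victim, rounds=4):
    """Alternate attack and defense: each round trains a new adversary
    against the current victim, then patches the victim against it."""
    for i in range(rounds):
        # Step 1 (attack): train an adversary against the frozen victim.
        adversary = train_adversary(frozen_victim=victim)
        # Step 2 (defense): fine-tune the victim against that adversary.
        victim = finetune_victim(victim, opponent=adversary)
        print(f"round {i}: adversary win rate vs patched victim = "
              f"{win_rate(adversary, victim):.2f}")
    return victim

defended = iterated_adversarial_training("katago-base")
```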