Can Go AIs be adversarially robust? | FAR.AI
Credibility Rating
High quality. Established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: FAR AI
Part of FAR.AI's research program on adversarial robustness in game-playing AI; a companion to the widely cited 'Adversarial Policies Beat Superhuman Go AIs' paper, examining whether the discovered vulnerabilities can be mitigated.
Metadata
Summary
This FAR.AI research investigates whether Go-playing AI systems like KataGo can be made robust against adversarial attacks, following prior work showing that superhuman Go AIs can be defeated by specially crafted adversarial strategies. The research explores potential defenses and their limitations, contributing to a broader understanding of adversarial robustness in AI systems.
Key Points
- Examines whether Go AIs can be hardened against adversarial strategies that win by exploiting systematic weaknesses rather than by superior play
- Builds on prior FAR.AI/MIT work demonstrating that superhuman Go AIs like KataGo can be defeated by adversarial policies that exploit blind spots not apparent to human players
- Tests three defenses against Go-specific attacks: adversarial training on hand-constructed positions, iterated adversarial training, and a changed network architecture
- Findings have implications for AI safety beyond games, suggesting that deep vulnerabilities in learned systems may be difficult to fully patch
- Illustrates that even superhuman AI systems can have exploitable blind spots, a finding relevant to deployment safety concerns
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| FAR AI | Organization | 76.0 |
Cached Content Preview
Can Go AIs be adversarially robust? | FAR.AI
June 18, 2024
Tom Tseng
Euan McLean
Kellin Pelrine
Tony Wang
Adam Gleave
Abstract
Prior work found that superhuman Go AIs like KataGo can be defeated by simple adversarial strategies. In this paper, we study if simple defenses can improve KataGo's worst-case performance. We test three natural defenses: adversarial training on hand-constructed positions, iterated adversarial training, and changing the network architecture. We find that some of these defenses are able to protect against previously discovered attacks. Unfortunately, we also find that none of these defenses are able to withstand adaptive attacks. In particular, we are able to train new adversaries that reliably defeat our defended agents by causing them to blunder in ways humans would not. Our results suggest that building robust AI systems is challenging even in narrow domains such as Go. For interactive examples of attacks and a link to our codebase, see [goattack.far.ai](https://goattack.far.ai).
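The iterated adversarial training defense mentioned in the abstract alternates between training a new attacker against a frozen victim and fine-tuning the victim against that attacker. The sketch below illustrates the control flow of such a loop only, under stated assumptions: every helper (`train_adversary`, `finetune_victim`, `win_rate`) is a hypothetical placeholder for a full self-play training pipeline, not part of the actual KataGo codebase. The paper's key finding is that even the final victim produced by a loop like this can still be beaten by a freshly trained adaptive adversary.

```python
# Minimal sketch of an iterated adversarial training loop, as described in
# the abstract. All helpers below are hypothetical placeholders standing in
# for full training pipelines; they are NOT part of the KataGo codebase.

def train_adversary(frozen_victim, target_win_rate=0.9):
    """Placeholder: train a fresh adversary against a frozen victim
    until it wins at least target_win_rate of its games."""
    return f"adversary-vs-{frozen_victim}"

def finetune_victim(victim, opponent):
    """Placeholder: fine-tune the victim on games it lost to the
    adversary, patching the exploited weakness."""
    return f"{victim}+patched"

def win_rate(adversary, victim):
    """Placeholder: estimated adversary win rate over evaluation games."""
    return 0.0  # dummy value for the sketch

def iterated_adversarial_training(victim, rounds=4):
    """Alternate attack and defense: each round trains a new adversary
    against the current victim, then patches the victim against it."""
    for i in range(rounds):
        # Step 1 (attack): train an adversary against the frozen victim.
        adversary = train_adversary(frozen_victim=victim)
        # Step 2 (defense): fine-tune the victim against that adversary.
        victim = finetune_victim(victim, opponent=adversary)
        print(f"round {i}: adversary win rate vs patched victim = "
              f"{win_rate(adversary, victim):.2f}")
    return victim

defended = iterated_adversarial_training("katago-base")
```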