Adversarial Policies Beat Superhuman Go AIs | FAR.AI
Web Credibility Rating
High quality. Established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: FAR AI
A key empirical result in AI safety showing that superhuman AI systems can be systematically defeated by adversarial strategies that exploit blind spots, undermining confidence that capability alone implies robustness or safety.
Metadata
Summary
FAR.AI researchers demonstrate that adversarial policies can reliably defeat superhuman Go AIs (including KataGo) by exploiting unexpected weaknesses, despite the target AI vastly outperforming the adversary in normal play. This work shows that even highly capable AI systems can harbor surprising, exploitable blind spots that humans can identify but the AI cannot defend against.
Key Points
- Adversarial policies were trained to beat KataGo, a superhuman Go AI, by exploiting systematic blind spots rather than by outplaying it conventionally.
- The adversarial policies are themselves weak players, easily beaten by human amateurs, showing that superhuman performance does not imply robustness.
- The findings challenge the assumption that high capability in a domain implies reliable safety or resistance to adversarial attack.
- The work has broad implications for AI safety: capable models may harbor hidden vulnerabilities that standard evaluation fails to surface.
- It demonstrates the importance of adversarial testing and red-teaming in AI safety evaluation pipelines.
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| FAR AI | Organization | 76.0 |
Cached Content Preview
Adversarial Policies Beat Superhuman Go AIs
January 9, 2023
Tony Wang, Adam Gleave, Tom Tseng, Kellin Pelrine, Nora Belrose, Joseph Miller, Michael D. Dennis, Yawen Duan, Viktor Pogrebniak, Sergey Levine, Stuart Russell
Abstract
We attack the state-of-the-art Go-playing AI system, KataGo, by training adversarial policies that play against frozen KataGo victims. Our attack achieves a >99% win rate when KataGo uses no tree-search, and a >77% win rate when KataGo uses enough search to be superhuman. Notably, our adversaries do not win by learning to play Go better than KataGo -- in fact, our adversaries are easily beaten by human amateurs. Instead, our adversaries win by tricking KataGo into making serious blunders. Our results demonstrate that even superhuman AI systems may harbor surprising failure modes. Example games are available at [goattack.far.ai](https://goattack.far.ai).
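The recipe sketched in the abstract is to hold the victim policy fixed, treat it as part of the environment, and train an adversary whose only objective is to win. The toy sketch below illustrates that loop under heavy simplification, assuming a trivial cyclic game in place of Go and a REINFORCE-style bandit update in place of the paper's full training pipeline; every name in it is invented for illustration, and the authors' actual code is linked from goattack.far.ai.

```python
import math
import random

# Frozen "victim": a fixed stochastic policy for a cyclic game
# (rock-paper-scissors with actions 0/1/2). It is reasonable overall
# but systematically over-plays action 0 -- its exploitable blind spot.
VICTIM_WEIGHTS = [0.5, 0.3, 0.2]

def victim_move() -> int:
    return random.choices([0, 1, 2], weights=VICTIM_WEIGHTS)[0]

def reward(adv: int, vic: int) -> float:
    """+1 if the adversary wins, -1 if it loses, 0 on a tie."""
    if adv == vic:
        return 0.0
    return 1.0 if (adv - vic) % 3 == 1 else -1.0

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Adversary: a categorical policy trained with a REINFORCE-style update.
# The frozen victim is simply part of the environment, as in the paper.
logits = [0.0, 0.0, 0.0]
LR = 0.1
for _ in range(5000):
    probs = softmax(logits)
    action = random.choices([0, 1, 2], weights=probs)[0]
    r = reward(action, victim_move())
    # Gradient of log pi(action) w.r.t. the logits: one_hot(action) - probs.
    for i in range(3):
        logits[i] += LR * r * ((1.0 if i == action else 0.0) - probs[i])

# The adversary converges toward action 1, which beats the victim's
# over-played action 0: it exploits the blind spot rather than becoming
# a stronger player in any general sense.
print("adversary policy:", [round(p, 3) for p in softmax(logits)])
```

The same asymmetry the paper reports shows up even in this toy: the trained adversary punishes this particular victim's bias but would do no better than chance against an unbiased opponent, much as the paper's adversaries beat KataGo while losing to human amateurs.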
... (truncated, 3 KB total)