Longterm Wiki

Mastering the game of Go without human knowledge

paper

Author

David Silver et al.

Credibility Rating

5/5
Gold (5)

Gold standard. Rigorous peer review, high editorial standards, and strong institutional reputation.

Rating inherited from publication venue: Nature

AlphaGo Zero demonstrates the power of self-play reinforcement learning to achieve superhuman performance without human knowledge, raising important questions about AI capability development, training stability, and alignment challenges in advanced RL systems.

Paper Details

Citations
2
Methodology
dissertation

Metadata

journal article | primary source

Summary

This Nature paper introduces AlphaGo Zero, a reinforcement learning algorithm that masters the game of Go without any human data, guidance, or domain knowledge beyond the rules. The original AlphaGo trained its networks by supervised learning from human expert moves followed by reinforcement learning from self-play. AlphaGo Zero learns entirely through self-play: a neural network is trained to predict the program's own move selections and the winners of its own games, and the improved network in turn strengthens the tree search that plays the next round of self-play games. Starting from scratch, AlphaGo Zero achieved superhuman performance and defeated the previously published, champion-defeating AlphaGo 100-0.
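The training signal described above (predict the search's move selections and the game winner) can be illustrated with a minimal sketch of a per-position loss. This is an assumption-laden illustration, not the authors' implementation: the function name `policy_value_loss` and its parameters are hypothetical, the cross-entropy plus squared-error form is one common way to express the two prediction targets, and any weight regularization is omitted.

```python
import math

def policy_value_loss(policy, search_probs, value, outcome):
    """Hypothetical per-position loss for a network with two outputs.

    policy       -- network's predicted move probabilities (sum to 1)
    search_probs -- move probabilities produced by the tree search (target)
    value        -- network's predicted outcome, in [-1, 1]
    outcome      -- actual game result from the current player's view (+1 or -1)
    """
    # Squared error between the predicted and the actual game outcome.
    value_loss = (outcome - value) ** 2
    # Cross-entropy between the search's move distribution and the prediction;
    # moves with zero target probability contribute nothing.
    policy_loss = -sum(
        t * math.log(p) for t, p in zip(search_probs, policy) if t > 0
    )
    return value_loss + policy_loss

# Toy position with two legal moves: the search chose move 0 and the game was won.
loss = policy_value_loss([0.5, 0.5], [1.0, 0.0], value=0.0, outcome=1.0)
```

Minimizing a loss of this shape pulls the network's move probabilities toward the search's choices and its value estimate toward the eventual winner, which is the sense in which the program acts as its own teacher.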

Cited by 1 page

Page | Type | Quality
World Models + Planning | Capability | 54.0

Cached Content Preview

HTTP 200 | Fetched Mar 15, 2026 | 23 KB
Mastering the game of Go without human knowledge | Nature

Subjects

Computational science
Computer science
Reward

Abstract

A long-standing goal of artificial intelligence is an algorithm that learns, tabula rasa, superhuman proficiency in challenging domains. Recently, AlphaGo became the first program to defeat a world champion in the game of Go. The tree search in AlphaGo evaluated positions and selected moves using deep neural networks. These neural networks were trained by supervised learning from human expert moves, and by reinforcement learning from self-play. Here we introduce an algorithm based solely on reinforcement learning, without human data, guidance or domain knowledge beyond game rules. AlphaGo becomes its own teacher: a neural network is trained to predict AlphaGo’s own move selections and also the winner of AlphaGo’s games. This neural network improves the strength of the tree search, resulting in higher quality move selection and stronger self-play in the next iteration. Starting tabula rasa, our new program AlphaGo Zero achieved superhuman performance, winning 100–0 against the previously published, champion-defeating AlphaGo.

... (truncated, 23 KB total)
Resource ID: 47f4c94acf618045 | Stable ID: YTGEQDtfPA