Mastering the game of Go without human knowledge
paperAuthor
Credibility Rating
Gold standard. Rigorous peer review, high editorial standards, and strong institutional reputation.
Rating inherited from publication venue: Nature
AlphaGo Zero demonstrates the power of self-play reinforcement learning to achieve superhuman performance without human knowledge, raising important questions about AI capability development, training stability, and alignment challenges in advanced RL systems.
Paper Details
Metadata
Summary
This Nature paper introduces AlphaGo Zero, a reinforcement learning algorithm that masters the game of Go without human data, guidance, or domain knowledge beyond the rules. The original AlphaGo trained its neural networks by supervised learning from human expert moves and by reinforcement learning from self-play; AlphaGo Zero learns entirely through self-play, training a single neural network to predict the program's own move selections and the winners of its games. That network in turn strengthens the tree search, which yields higher-quality move selection and stronger self-play in the next iteration. Starting from scratch, AlphaGo Zero achieved superhuman performance and defeated the previously published, champion-defeating AlphaGo 100–0, showing that reinforcement learning alone can reach superhuman play in Go.
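The self-play loop described in the summary can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a toy Nim game in place of Go, lookup tables in place of the neural network, and a one-ply lookahead in place of the tree search. All names (`search_policy`, `self_play_game`, `train`) and all hyperparameters are hypothetical, chosen only for illustration.

```python
import math
import random

MAX_TAKE = 3  # toy Nim (assumption): remove 1..3 stones; taking the last stone wins


def legal_moves(stones):
    return list(range(1, min(MAX_TAKE, stones) + 1))


def search_policy(stones, value, prior, temperature=0.5):
    """One-ply lookahead standing in for tree search (not the paper's search).

    Reweights the prior move probabilities by the estimated outcome of each move.
    """
    scores = {}
    for move in legal_moves(stones):
        remaining = stones - move
        # Taking the last stone wins (+1). Otherwise the opponent moves next,
        # so our estimated outcome is the negation of their value estimate.
        outcome = 1.0 if remaining == 0 else -value[remaining]
        scores[move] = prior[stones][move] * math.exp(outcome / temperature)
    total = sum(scores.values())
    return {move: score / total for move, score in scores.items()}


def self_play_game(start, value, prior, rng):
    """Play one game against itself; return (stones, search_probs, outcome) records."""
    history = []
    stones, player = start, 0
    while stones > 0:
        probs = search_policy(stones, value, prior)
        moves = list(probs)
        move = rng.choices(moves, weights=[probs[m] for m in moves])[0]
        history.append((stones, probs, player))
        stones -= move
        winner = player  # whoever moved last took the final stone
        player = 1 - player
    # Label each position with the outcome from the viewpoint of the player to move.
    return [(s, p, 1.0 if pl == winner else -1.0) for s, p, pl in history]


def train(max_stones=10, games=2000, lr=0.1, seed=0):
    """Tabular stand-in for network training on self-play data."""
    rng = random.Random(seed)
    sizes = range(1, max_stones + 1)
    value = {s: 0.0 for s in sizes}
    prior = {s: {m: 1.0 / len(legal_moves(s)) for m in legal_moves(s)} for s in sizes}
    for _ in range(games):
        start = rng.randint(1, max_stones)
        for stones, probs, outcome in self_play_game(start, value, prior, rng):
            # Value target: the winner of the game.
            value[stones] += lr * (outcome - value[stones])
            # Policy target: the search's own move probabilities.
            for m in prior[stones]:
                prior[stones][m] += lr * (probs[m] - prior[stones][m])
    return value, prior


if __name__ == "__main__":
    value, prior = train()
    for stones in sorted(value):
        print(stones, round(value[stones], 2), prior[stones])
```

The two update lines in `train` mirror the two prediction targets named in the summary: the policy table moves toward the search's move selections, and the value table moves toward the game winner. Because the search reads those same tables, better estimates feed back into stronger self-play on later games.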
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| World Models + Planning | Capability | 54.0 |
Cached Content Preview
Mastering the game of Go without human knowledge | Nature
Subjects
Computational science
Computer science
Reward
Abstract
A long-standing goal of artificial intelligence is an algorithm that learns, tabula rasa, superhuman proficiency in challenging domains. Recently, AlphaGo became the first program to defeat a world champion in the game of Go. The tree search in AlphaGo evaluated positions and selected moves using deep neural networks. These neural networks were trained by supervised learning from human expert moves, and by reinforcement learning from self-play. Here we introduce an algorithm based solely on reinforcement learning, without human data, guidance or domain knowledge beyond game rules. AlphaGo becomes its own teacher: a neural network is trained to predict AlphaGo’s own move selections and also the winner of AlphaGo’s games. This neural network improves the strength of the tree search, resulting in higher quality move selection and stronger self-play in the next iteration. Starting tabula rasa, our new program AlphaGo Zero achieved superhuman performance, winning 100–0 against the previously published, champion-defeating AlphaGo.
... (truncated, 23 KB total)
47f4c94acf618045 | Stable ID: YTGEQDtfPA