Longterm Wiki

Competition-level code generation with AlphaCode

paper

Authors

Yujia Li·David Choi·Junyoung Chung·Nate Kushman·Julian Schrittwieser·Rémi Leblond·Tom Eccles·James Keeling·Felix Gimeno·Agustin Dal Lago·Thomas Hubert·Peter Choy·Cyprien de Masson d'Autume·Igor Babuschkin·Xinyun Chen·Po-Sen Huang·Johannes Welbl·Sven Gowal·Alexey Cherepanov·James Molloy·Daniel J. Mankowitz·Esme Sutherland Robson·Pushmeet Kohli·Nando de Freitas·Koray Kavukcuoglu·Oriol Vinyals

Credibility Rating

3/5 (Good)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: arXiv

A landmark DeepMind paper demonstrating that large language models can solve competitive programming problems requiring non-trivial algorithmic reasoning, relevant to tracking frontier AI capabilities in code generation and automated software development.

Paper Details

Citations
662 (176 influential)
Year
2022
Methodology
peer-reviewed
Categories
Science

Metadata

Importance: 72/100 · arXiv preprint · primary source

Abstract

Programming is a powerful and ubiquitous problem-solving tool. Developing systems that can assist programmers or even generate programs independently could make programming more productive and accessible, yet so far incorporating innovations in AI has proven challenging. Recent large-scale language models have demonstrated an impressive ability to generate code, and are now able to complete simple programming tasks. However, these models still perform poorly when evaluated on more complex, unseen problems that require problem-solving skills beyond simply translating instructions into code. For example, competitive programming problems which require an understanding of algorithms and complex natural language remain extremely challenging. To address this gap, we introduce AlphaCode, a system for code generation that can create novel solutions to these problems that require deeper reasoning. In simulated evaluations on recent programming competitions on the Codeforces platform, AlphaCode achieved on average a ranking of top 54.3% in competitions with more than 5,000 participants. We found that three key components were critical to achieve good and reliable performance: (1) an extensive and clean competitive programming dataset for training and evaluation, (2) large and efficient-to-sample transformer-based architectures, and (3) large-scale model sampling to explore the search space, followed by filtering based on program behavior to a small set of submissions.

Summary

AlphaCode is DeepMind's system for generating solutions to competitive programming problems requiring deep algorithmic reasoning, achieving an average ranking in the top 54.3% on Codeforces competitions with 5,000+ participants. Success depends on a high-quality training dataset, large transformer architectures, and a large-scale sampling-and-filtering approach that generates many candidate solutions and selects the best based on program behavior.

Key Points

  • Achieves top 54.3% average ranking on Codeforces competitive programming contests, a significant milestone for AI code generation on complex reasoning tasks.
  • Uses large-scale sampling (generating millions of candidates) followed by filtering based on test case behavior to reduce submissions to a tractable set.
  • Training on a carefully curated, high-quality competitive programming dataset was critical; data quality mattered as much as model scale.
  • Demonstrates that transformer-based LLMs can go beyond simple instruction-to-code translation to perform genuine algorithmic problem-solving.
  • Raises AI safety-relevant questions about the pace of capability gains in code generation and potential for automated software development at scale.
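The sample-and-filter approach in the second point can be sketched in a few lines of Python. This is a minimal illustration under assumed names, not AlphaCode's actual pipeline: the candidate "programs" here are plain Python callables standing in for sampled code, and `filter_by_example_tests` is a hypothetical helper mimicking the paper's idea of discarding samples whose behavior fails the public example tests.

```python
def filter_by_example_tests(candidates, tests):
    """Keep only candidate programs whose observed behavior matches
    every public example test, in the spirit of AlphaCode's filtering
    stage. `candidates` are callables (stand-ins for generated
    programs); `tests` are (input, expected_output) pairs."""
    surviving = []
    for program in candidates:
        if all(program(x) == expected for x, expected in tests):
            surviving.append(program)
    return surviving

# Hypothetical sampled programs for a toy task: compute x squared.
candidates = [
    lambda x: x + x,   # wrong: doubles (happens to pass the test x=2)
    lambda x: x * x,   # correct
    lambda x: x ** 3,  # wrong: cubes
]
example_tests = [(2, 4), (3, 9)]

passing = filter_by_example_tests(candidates, example_tests)
print(len(passing))  # → 1
```

Note how the first candidate passes the test `(2, 4)` but fails `(3, 9)`, which is why filtering against all example tests, rather than any single one, is what makes the enormous sample pool tractable.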

Cited by 1 page

| Page | Type | Quality |
| --- | --- | --- |
| Autonomous Coding | Capability | 63.0 |

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 98 KB

Corresponding authors: yujiali@deepmind.com, davidhchoi@deepmind.com, vinyals@deepmind.com

\* Joint first authors

# Competition-Level Code Generation with AlphaCode



### 1 Introduction

Computer programming has emerged as a general-purpose problem-solving tool throughout science, industry, and daily life. As part of this growth, there has been continuously increasing demand for tools that can make programmers more productive (Matsakis and Klock, [2014](https://ar5iv.labs.arxiv.org/html/2203.07814#bib.bib49 "")), or make programming and programming education more accessible (Resnick et al., [2009](https://ar5iv.labs.arxiv.org/html/2203.07814#bib.bib61 "")). Developing AI systems that can effectively model and understand code can transform these tools and the way we interact with them. Systems that can generate code are not only useful, but also stepping stones that can lead to greater understanding of AI and how it relates to programming.

... (truncated, 98 KB total)
Resource ID: 2137eaa69f74f139 | Stable ID: ZTg4MmFjOT