Current compute trends
Credibility Rating: 4/5 (High)
High quality: established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: Epoch AI
Epoch AI's compute trends research is widely cited in AI safety discussions for providing empirical data on the pace of AI progress, helping inform timeline estimates and risk assessments around transformative AI development.
Metadata
Importance: 72/100 · Type: blog post, analysis
Summary
Epoch AI's analysis of historical trends in the compute used to train notable AI systems identifies three distinct eras: pre-deep learning, deep learning, and large-scale models. The research documents that training compute has grown by a factor of roughly 10 billion since 2010, with a marked shift toward massive compute investment after 2015.
Key Points
- Training compute has grown by a factor of roughly 10 billion since 2010, far outpacing Moore's law
- Three distinct eras identified: pre-deep learning (before 2010), deep learning (2010-2015), and large-scale models (2015-present)
- The post-2015 era shows dramatically accelerated compute growth, driven by increased investment from large organizations
- The analysis provides empirical grounding for forecasting future AI capabilities based on compute scaling
- The data suggest compute trends are a key driver of AI progress, relevant to both capabilities forecasting and safety timelines
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| AI Risk Activation Timeline Model | Analysis | 66.0 |
Cached Content Preview
HTTP 200 · Fetched Mar 20, 2026 · 15 KB
_Summary: We have collected a dataset and analysed key trends in the training compute of machine learning models since 1950. We identify three major eras of training compute - the pre-Deep Learning Era, the Deep Learning Era, and the Large-Scale Era. Furthermore, we find that the training compute has grown by a factor of 10 billion since 2010, with a doubling rate of around 5-6 months. See our recent paper, [Compute Trends Across Three Eras of Machine Learning](https://arxiv.org/abs/2202.05924), for more details._
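The headline figures above are related by simple exponential arithmetic. As an illustrative sketch (the function names are our own, not from the post), a doubling time can be converted into a total growth factor over a period, and vice versa:

```python
import math

def total_growth(doubling_months: float, span_years: float) -> float:
    """Total growth factor implied by a fixed doubling time."""
    n_doublings = span_years * 12 / doubling_months
    return 2.0 ** n_doublings

def doubling_time_months(growth_factor: float, span_years: float) -> float:
    """Doubling time (in months) implied by a total growth factor over a span."""
    return span_years * 12 / math.log2(growth_factor)

# A 6-month doubling time sustained for a decade is 2^20, about a million-fold:
print(f"{total_growth(6, 10):.3g}")            # ~1.05e+06
# Conversely, million-fold growth over 10 years implies a ~6-month doubling time:
print(f"{doubling_time_months(1e6, 10):.2f}")  # ~6.02
```

This is just arithmetic on the quoted doubling rate, not a re-derivation of the dataset's numbers.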
## Introduction
It is well known that progress in machine learning (ML) is driven by three primary factors - algorithms, data, and compute. This makes intuitive sense - the development of algorithms like backpropagation transformed the way that machine learning models were trained, leading to significantly improved efficiency compared to previous optimisation techniques ( [Goodfellow _et al._, 2016](https://www.deeplearningbook.org/contents/mlp.html#pf25); [Rumelhart _et al._, 1986](http://www.cs.toronto.edu/~hinton/absps/naturebp.pdf)). Data has been becoming increasingly available, particularly with the advent of “ [big data](https://en.wikipedia.org/wiki/Big_data)” in recent years. At the same time, progress in computing hardware has been rapid, with increasingly powerful and specialised AI hardware ( [Heim, 2021](https://forum.effectivealtruism.org/s/4yLbeJ33fYrwnfDev/p/YNB39RyJ7iAQKGJvq); [Khan and Mann, 2020](https://cset.georgetown.edu/wp-content/uploads/AI-Chips%E2%80%94What-They-Are-and-Why-They-Matter.pdf)).
What is less obvious is the _relative_ importance of these factors, and what this implies for the future of AI. [Kaplan _et al._ (2020)](https://arxiv.org/abs/2001.08361) studied these developments through the lens of **scaling laws**, identifying three key variables:
- Number of parameters of a machine learning model
- Training dataset size
- Compute required for the final training run of a machine learning model (henceforth referred to as **training compute**)
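These three variables are not independent: for dense transformer models, a widely used rule of thumb from the scaling-laws literature (not stated in this post) is that training compute is roughly C ≈ 6·N·D FLOP, where N is the parameter count and D is the number of training tokens. A minimal sketch:

```python
def approx_training_flop(n_params: float, n_tokens: float) -> float:
    """Rough transformer training compute: ~6 FLOP per parameter per token
    (about 2 for the forward pass and 4 for the backward pass)."""
    return 6.0 * n_params * n_tokens

# Example with GPT-3-scale numbers (175B parameters, ~300B training tokens):
print(f"{approx_training_flop(175e9, 300e9):.2e}")  # 3.15e+23 FLOP
```

The approximation ignores embedding and attention-map FLOP, which are small for large dense models, so it should be read as an order-of-magnitude estimate.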
Trying to understand the relative importance of these is challenging because our theoretical understanding of them is insufficient - instead, we need to take large quantities of data and analyse the resulting trends. Previously, we looked at trends in parameter counts of ML models - in this paper, we try to understand how training compute has evolved over time.
[Amodei and Hernandez (2018)](https://openai.com/blog/ai-and-compute/) laid the groundwork for this, finding a 300,000-fold increase in training compute from 2012 to 2018, doubling every 3.4 months. However, that investigation covered only a small number of datapoints, and it does not include some of the most impressive recent ML models, such as GPT-3 ( [Brown _et al._, 2020](https://arxiv.org/abs/2005.14165)).
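Trend analyses of this kind typically fit a straight line to log-compute versus calendar time; the slope then gives the doubling time. A hedged sketch on synthetic data (not Epoch's actual dataset or code):

```python
import numpy as np

def fit_doubling_time_months(years: np.ndarray, compute_flop: np.ndarray) -> float:
    """OLS fit of log10(compute) against calendar year; returns the implied
    doubling time in months."""
    slope, _intercept = np.polyfit(years, np.log10(compute_flop), deg=1)
    years_to_double = np.log10(2.0) / slope
    return years_to_double * 12.0

# Synthetic series that doubles every 6 months (i.e. grows 4x per year):
years = np.arange(2012.0, 2020.0, 0.5)
compute = 1e18 * 4.0 ** (years - 2012.0)
print(f"{fit_doubling_time_months(years, compute):.1f}")  # 6.0
```

On real data the fit would be noisy, and the paper's era analysis amounts to fitting separate trends to different time windows rather than one global line.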
Motivated by these problems, we have curated the largest ever dataset containing the training compute of machine learning models, with over 120 datapoints. Using this data, we have drawn several novel insights into the significance of
... (truncated, 15 KB total)
Resource ID: d1c7d3fe408d3988 | Stable ID: NDdkM2Y5M2