[2405.21015] The rising costs of training frontier AI models - arXiv

paper

Credibility Rating

3/5

Good(3)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: arXiv

Metadata

Cited by 1 page

Page	Type	Quality
Dense Transformers	Concept	58.0

Cached Content Preview

HTTP 200Fetched Apr 30, 202679 KB

# The rising costs of training frontier AI models

Ben Cottier1Robi Rahman1,2\\ANDLoredana Fattorini2Nestor Maslej2David Owen1

###### Abstract

The costs of training frontier AI models have grown dramatically in recent years, but there is limited public data on the magnitude and growth of these expenses. This paper develops a detailed cost model to address this gap, estimating training costs using three approaches that account for hardware, energy, cloud rental, and staff expenses. The analysis reveals that the amortized cost to train the most compute-intensive models has grown precipitously at a rate of 2.4×2.4\\times per year since 2016 (95% CI: 2.0×2.0\\times to 3.1×3.1\\times). For key frontier models, such as GPT-4 and Gemini, the most significant expenses are AI accelerator chips and staff costs, each costing tens of millions of dollars. Other notable costs include server components (15-22%), cluster-level interconnect (9-13%), and energy consumption (2-6%). If the trend of growing development costs continues, the largest training runs will cost more than a billion dollars by 2027, meaning that only the most well-funded organizations will be able to finance frontier AI models.00footnotetext: 1Epoch AI. 2Stanford University.

## 1 Introduction

The large and growing cost of training state-of-the-art AI models has become an important issue in the field of AI \[ [1](https://ar5iv.labs.arxiv.org/html/2405.21015#bib.bib1 "")\]. Improving AI capabilities demand exponential increases in computing power, as evidenced by both economic analysis \[ [2](https://ar5iv.labs.arxiv.org/html/2405.21015#bib.bib2 "")\] and the discovery of empirical scaling laws, which show that model performance improves with more parameters and training data \[ [3](https://ar5iv.labs.arxiv.org/html/2405.21015#bib.bib3 ""), [4](https://ar5iv.labs.arxiv.org/html/2405.21015#bib.bib4 "")\]. Dario Amodei, CEO of the AI lab Anthropic, has stated that frontier AI developers are likely to spend close to a billion dollars on a single training run this year, and up to ten billion-dollar training runs in the next two years \[ [5](https://ar5iv.labs.arxiv.org/html/2405.21015#bib.bib5 "")\]. Given this trend, some innovations, particularly those requiring large-scale training, may become inaccessible to all but the most well-funded organizations.

Although it is widely known that training the largest ML models is expensive, until recently there were few concrete estimates of training costs in the public domain. In collaboration with Epoch AI, the 2024 AI Index presented one of the most comprehensive datasets to date, estimating the costs of training runs based on cloud rental prices \[ [6](https://ar5iv.labs.arxiv.org/html/2405.21015#bib.bib6 "")\]. We build on that work with a more in-depth account of hardware, energy and R&D staff costs for both training runs and experiments, as well as a more detailed analysis of how costs are increasing over time. To our knowledge, our study is the

... (truncated, 79 KB total)

Resource ID: 6c8c51dc056b4784 | Stable ID: sid_h2Hh2UeBfw