Longterm Wiki

Epoch AI, "How Much Does It Cost to Train Frontier AI Models?"

web

Credibility Rating

High (4/5)

High quality. Established institution or organization with editorial oversight and accountability.

Rating inherited from publication venue: Epoch AI

Relevant to compute governance discussions and understanding the economic concentration of frontier AI development; useful for policymakers and researchers assessing who can build transformative AI systems.

Metadata

Importance: 72/100 | blog post | analysis

Summary

Epoch AI analyzes the financial costs of training state-of-the-art AI models, estimating training runs for leading frontier models and projecting how these costs are evolving. The analysis examines compute expenditures, hardware costs, and trends suggesting training costs for top models may reach billions of dollars. This provides crucial empirical grounding for policy and governance discussions around AI development economics.

Key Points

  • Training costs for frontier models like GPT-4 and Gemini Ultra are estimated in the tens to hundreds of millions of dollars range.
  • Costs are driven primarily by compute (GPU/TPU hours), with hardware acquisition and energy being major components.
  • Training expenditures have grown roughly 2.4x per year (doubling about every 9–10 months), suggesting billion-dollar training runs may arrive within a few years.
  • High training costs create significant barriers to entry, concentrating frontier AI development among well-resourced labs and nations.
  • Cost estimates help inform compute governance proposals, export controls, and discussions about who can develop frontier AI.

Cited by 3 pages

| Page | Type | Quality |
| --- | --- | --- |
| Large Language Models | Capability | 60.0 |
| Large Language Models | Concept | 62.0 |
| Compute Thresholds | Concept | 91.0 |

Cached Content Preview

HTTP 200 | Fetched Mar 20, 2026 | 10 KB
## Summary of findings

The costs of training frontier AI models have grown dramatically in recent years, but there is limited public data on the magnitude and growth of these expenses. In [our new paper](https://arxiv.org/abs/2405.21015), we develop a detailed cost model to address this gap, estimating training costs for up to 45 frontier models using three different approaches that account for hardware and energy expenditures, cloud rental costs, and R&D staff expenses, respectively. This work builds upon the cost estimates featured in the [2024 AI Index](https://aiindex.stanford.edu/report).

Our analysis reveals that the amortized hardware and energy cost for the final training run of frontier models has grown rapidly, at a rate of 2.4x per year since 2016 (95% CI: 2.0x to 3.1x). We also estimated a cost breakdown to develop key frontier models such as GPT-4 and Gemini Ultra, including R&D staff costs and compute for experiments. We found that most of the development cost is for the hardware at 47–67%, but R&D staff costs are substantial at 29–49%, with the remaining 2–6% going to energy consumption.

If the trend of growing training costs continues, the largest training runs will cost more than a billion dollars by 2027, suggesting that frontier AI model training will be too expensive for all but the most well-funded organizations.
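A quick back-of-envelope extrapolation illustrates this projection. The ~$100M baseline for a 2023 frontier training run is an assumed round number for illustration (the paper's own per-model estimates differ); only the 2.4x/year growth rate comes from the analysis:

```python
# Illustrative extrapolation of frontier training-run costs.
# GROWTH_PER_YEAR is the rate reported by Epoch AI; the baseline
# cost and year are assumed round numbers, not figures from the paper.
GROWTH_PER_YEAR = 2.4
BASELINE_COST_USD = 100e6   # assumed ~$100M frontier run in 2023
BASELINE_YEAR = 2023

def projected_cost(year: int) -> float:
    """Project the cost of the largest training run in a given year."""
    return BASELINE_COST_USD * GROWTH_PER_YEAR ** (year - BASELINE_YEAR)

for year in range(2023, 2028):
    print(f"{year}: ${projected_cost(year):,.0f}")
```

Under these assumptions the largest run crosses $1B between 2025 and 2026, consistent with the paper's "more than a billion dollars by 2027" claim.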

## Key Results

Our primary approach calculates training costs based on hardware depreciation and energy consumption over the duration of model training. Hardware costs include AI accelerator chips (GPUs or TPUs), servers, and interconnection hardware. We use either disclosures from the developer or credible third-party reporting to identify or estimate the hardware type and quantity and training run duration for a given model. We also estimate the energy consumption of the hardware during the final training run of each model.

Using this method, we estimated the training costs for 45 frontier models (models that were in the top 10 in terms of training compute when they were released) and found that the overall growth rate is 2.4x per year.

[Figure: "Amortized hardware and energy cost to train frontier AI models over time" — cost (2023 USD, log scale, $10 to $1B) vs. publication date (2016–2024), showing the regression mean, its 90% CI, and a 2.4x/year trend line. Labeled models include GNMT, AlphaGo Master, AlphaGo Zero, AlphaZero, DALL-E, GPT-3 175B (davinci), PaLM (540B), GPT-4, Gemini 1.0 Ultra, and Inflection-2. Source: Epoch AI, CC-BY.]

Figure 1. Amortized hardware cost plus energy cost for the final training run of frontier models. The selected models are among the top 10 most compute-intensive for their time. Amortized hardware costs are the product of training chip-hours and a depreciated hardware cost, with 23% overhead added for cluster-level networking. Open circles indicate costs which used an estimated production cost of Google TPU hardware.
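The amortized-cost formula described in the caption (chip-hours times depreciated hardware cost, plus 23% cluster-networking overhead, plus energy) can be sketched as follows. The function name and the example inputs are hypothetical, not figures from the paper:

```python
def amortized_training_cost(chip_hours: float,
                            cost_per_chip_hour: float,
                            energy_cost_usd: float,
                            cluster_overhead: float = 0.23) -> float:
    """Amortized cost of a final training run:
    hardware = chip-hours x depreciated cost per chip-hour,
    scaled by 23% overhead for cluster-level networking,
    plus energy consumed during the run."""
    hardware = chip_hours * cost_per_chip_hour * (1 + cluster_overhead)
    return hardware + energy_cost_usd

# Hypothetical inputs: 20M chip-hours at $2/chip-hour plus $3M of energy.
cost = amortized_training_cost(20e6, 2.0, 3e6)
print(f"${cost:,.0f}")
```

The depreciated cost per chip-hour would itself be derived from accelerator purchase price and an assumed depreciation schedule, which this sketch takes as a given input.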

... (truncated, 10 KB total)
Resource ID: af04d2ff381827f5 | Stable ID: Y2JmZTFlMj