Longterm Wiki

Epoch AI, "How Much Does It Cost to Train Frontier AI Models?"

web

Credibility Rating

High (4/5)

High quality. Established institution or organization with editorial oversight and accountability.

Rating inherited from publication venue: Epoch AI

Relevant to compute governance discussions and understanding the economic concentration of frontier AI development; useful for policymakers and researchers assessing who can build transformative AI systems.

Metadata

Importance: 72/100 | blog post | analysis

Summary

Epoch AI analyzes the financial costs of training state-of-the-art AI models, estimating training runs for leading frontier models and projecting how these costs are evolving. The analysis examines compute expenditures, hardware costs, and trends suggesting training costs for top models may reach billions of dollars. This provides crucial empirical grounding for policy and governance discussions around AI development economics.

Key Points

  • Training costs for frontier models like GPT-4 and Gemini Ultra are estimated in the tens to hundreds of millions of dollars range.
  • Costs are driven primarily by compute (GPU/TPU hours), with hardware acquisition and energy being major components.
  • Training expenditures have grown roughly 2.4x per year (doubling about every 9–10 months), suggesting billion-dollar training runs may arrive within a few years.
  • High training costs create significant barriers to entry, concentrating frontier AI development among well-resourced labs and nations.
  • Cost estimates help inform compute governance proposals, export controls, and discussions about who can develop frontier AI.

Cited by 3 pages

| Page | Type | Quality |
| --- | --- | --- |
| Large Language Models | Capability | 60.0 |
| Large Language Models | Concept | 62.0 |
| Compute Thresholds | Concept | 91.0 |

Cached Content Preview

HTTP 200 | Fetched Mar 20, 2026 | 10 KB
## Summary of findings

The costs of training frontier AI models have grown dramatically in recent years, but there is limited public data on the magnitude and growth of these expenses. In [our new paper](https://arxiv.org/abs/2405.21015), we develop a detailed cost model to address this gap, estimating training costs for up to 45 frontier models using three different approaches that account for hardware and energy expenditures, cloud rental costs, and R&D staff expenses, respectively. This work builds upon the cost estimates featured in the [2024 AI Index](https://aiindex.stanford.edu/report).

Our analysis reveals that the amortized hardware and energy cost for the final training run of frontier models has grown rapidly, at a rate of 2.4x per year since 2016 (95% CI: 2.0x to 3.1x). We also estimated a cost breakdown to develop key frontier models such as GPT-4 and Gemini Ultra, including R&D staff costs and compute for experiments. We found that most of the development cost is for the hardware at 47–67%, but R&D staff costs are substantial at 29–49%, with the remaining 2–6% going to energy consumption.

If the trend of growing training costs continues, the largest training runs will cost more than a billion dollars by 2027, suggesting that frontier AI model training will be too expensive for all but the most well-funded organizations.
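A quick back-of-envelope extrapolation illustrates this projection. The ~$100M baseline for a 2023 frontier training run is an assumed round number for illustration (the paper's own per-model estimates differ); only the 2.4x/year growth rate comes from the analysis:

```python
# Illustrative extrapolation of frontier training-run costs.
# GROWTH_PER_YEAR is the rate reported by Epoch AI; the baseline
# cost and year are assumed round numbers, not figures from the paper.
GROWTH_PER_YEAR = 2.4
BASELINE_COST_USD = 100e6   # assumed ~$100M frontier run in 2023
BASELINE_YEAR = 2023

def projected_cost(year: int) -> float:
    """Project the cost of the largest training run in a given year."""
    return BASELINE_COST_USD * GROWTH_PER_YEAR ** (year - BASELINE_YEAR)

for year in range(2023, 2028):
    print(f"{year}: ${projected_cost(year):,.0f}")
```

Under these assumptions the largest run crosses $1B between 2025 and 2026, consistent with the paper's "more than a billion dollars by 2027" claim.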

## Key Results

Our primary approach calculates training costs based on hardware depreciation and energy consumption over the duration of model training. Hardware costs include AI accelerator chips (GPUs or TPUs), servers, and interconnection hardware. We use either disclosures from the developer or credible third-party reporting to identify or estimate the hardware type and quantity and training run duration for a given model. We also estimate the energy consumption of the hardware during the final training run of each model.

Using this method, we estimated the training costs for 45 frontier models (models that were in the top 10 in terms of training compute when they were released) and found that the overall growth rate is 2.4x per year.

[Figure: "Amortized hardware and energy cost to train frontier AI models over time" — cost (2023 USD, log scale, $10 to $1B) vs. publication date (2016–2024), showing the regression mean, its 90% CI, and a 2.4x/year trend line. Labeled models include GNMT, AlphaGo Master, AlphaGo Zero, AlphaZero, DALL-E, GPT-3 175B (davinci), PaLM (540B), GPT-4, Gemini 1.0 Ultra, and Inflection-2. Source: Epoch AI, CC-BY.]

Figure 1. Amortized hardware cost plus energy cost for the final training run of frontier models. The selected models are among the top 10 most compute-intensive for their time. Amortized hardware costs are the product of training chip-hours and a depreciated hardware cost, with 23% overhead added for cluster-level networking. Open circles indicate costs which used an estimated production cost of Google TPU hardware.
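The amortized-cost formula described in the caption (chip-hours times depreciated hardware cost, plus 23% cluster-networking overhead, plus energy) can be sketched as follows. The function name and the example inputs are hypothetical, not figures from the paper:

```python
def amortized_training_cost(chip_hours: float,
                            cost_per_chip_hour: float,
                            energy_cost_usd: float,
                            cluster_overhead: float = 0.23) -> float:
    """Amortized cost of a final training run:
    hardware = chip-hours x depreciated cost per chip-hour,
    scaled by 23% overhead for cluster-level networking,
    plus energy consumed during the run."""
    hardware = chip_hours * cost_per_chip_hour * (1 + cluster_overhead)
    return hardware + energy_cost_usd

# Hypothetical inputs: 20M chip-hours at $2/chip-hour plus $3M of energy.
cost = amortized_training_cost(20e6, 2.0, 3e6)
print(f"${cost:,.0f}")
```

The depreciated cost per chip-hour would itself be derived from accelerator purchase price and an assumed depreciation schedule, which this sketch takes as a given input.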

... (truncated, 10 KB total)
Resource ID: af04d2ff381827f5 | Stable ID: Y2JmZTFlMj