Longterm Wiki

Jack Clark, Anthropic


Written in March 2024 by Anthropic co-founder Jack Clark, this piece provides a concrete technical and economic breakdown of compute thresholds used in major AI regulations, useful for understanding how FLOP-based governance frameworks translate to real-world cost and scope.

Metadata

Importance: 62/100 · blog post · analysis

Summary

Jack Clark (Anthropic co-founder) analyzes the practical meaning of FLOP-based regulatory thresholds in the US Executive Order (10^26) and EU AI Act (10^25), translating them into dollar costs using H100 and A100 GPU assumptions. He estimates these thresholds correspond to roughly $10M and $104M in compute spend respectively, and discusses the challenges governments face in staffing and interpreting evaluations at these scales.

Key Points

  • US Biden EO triggers reporting at 10^26 FLOPs; EU AI Act triggers 'systemic risk' obligations at 10^25 FLOPs — a 10x difference.
  • Using H100 GPUs at FP8 precision and 40% efficiency, 10^25 costs ~$10M and 10^26 costs ~$104M in compute (after a 1.5x failure buffer).
  • FLOP thresholds are imperfect proxies: efficiency gains, new hardware, and algorithmic improvements can shift what is achievable at a given cost.
  • Regulatory regimes assume compute spend correlates with capability and risk, but governments must staff up technical experts to meaningfully interpret model evaluations.
  • The gap between 10^25 and 10^26 is meaningful — the EU catches a broader set of frontier labs than the US threshold currently does.
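The dollar figures in these key points can be reproduced with a minimal sketch, using the assumptions the summary attributes to the post (H100s at FP8 with ~2000e12 dense FLOPS, 40% efficiency, $2 per chip-hour, and a 1.5x failure buffer):

```python
import math

# Assumptions from the post's summary: H100 FP8 peak ~2000 TFLOPS (dense),
# 40% training efficiency, $2 per chip-hour, 1.5x buffer for failed runs.
PEAK_FLOPS = 2000e12
EFFICIENCY = 0.40
PRICE_PER_CHIP_HOUR = 2.0
FAILURE_BUFFER = 1.5

def training_cost_usd(total_flops: float) -> float:
    """Dollar cost of the compute needed to reach a given FLOP threshold."""
    flops_per_chip_hour = PEAK_FLOPS * EFFICIENCY * 3600
    chip_hours = total_flops / flops_per_chip_hour
    return chip_hours * PRICE_PER_CHIP_HOUR * FAILURE_BUFFER

for threshold in (1e25, 1e26):
    exp = int(round(math.log10(threshold)))
    print(f"10^{exp}: ${training_cost_usd(threshold) / 1e6:.1f}M")
# 10^25: $10.4M
# 10^26: $104.2M
```

This recovers the ~$10M (EU) and ~$104M (US) figures; dropping the 1.5x buffer gives the raw ~$69M number worked out in the cached post below.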

Cited by 1 page

| Page | Type | Quality |
| --- | --- | --- |
| Compute Monitoring | Approach | 69.0 |

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 31 KB
# [Import AI](https://jack-clark.net/ "Home")

### What does 10^25 versus 10^26 mean?

#### by Jack Clark

_A brief look at what FLOPs-based regulation nets out to_

Recent AI regulations have defined the trigger points for oversight in terms of the amount of floating point operations dumped into training an AI system. If you’re in America and you’ve trained a model with 10^26 FLOPs, you’re going to spend a lot of time dealing with government agencies. If you’re in Europe and you’ve trained a model with 10^25 FLOPs, you’re going to spend a lot of time dealing with government agencies.

**More details:**

In the United States, the recent Biden Executive Order on AI says that general-purpose systems trained with 10^26 FLOPs (or ones predominantly trained on biological sequence data and using a quantity of computing power greater than 10^23) fall under a new reporting requirement that means companies will let the US government know about these systems and also show work on testing these systems.

In Europe, the recent EU AI Act says that general-purpose systems trained with 10^25 FLOPs have the potential for “systemic risk” and that people who develop these models “are therefore mandated to assess and mitigate risks, report serious incidents, conduct state-of-the-art tests and model evaluations, ensure cybersecurity and provide information on the energy consumption of their models.”

Given how difficult the task of assessing AI systems is, these thresholds matter – governments will need to staff up people who can interpret the results about models which pass these thresholds.

**What is the difference between 10^25 versus 10^26 FLOPs in terms of money?**

Let’s say you wanted to train an AI system – how much money would you spend on the compute for training the system before you hit one of these thresholds? We can work this out:

**NVIDIA H100 – NVIDIA’s latest GPU.**

_Assumptions:_

Using FP8 precision – various frontier labs (e.g., [Inflection](https://inflection.ai/inflection-2)) have trained using FP8

40% efficiency – assuming you’ve worked hard to make your training process efficient. E.g., Google claims [~46% for PaLM 540B](https://arxiv.org/pdf/2204.02311.pdf)

$2 per chip hour – assuming bulk discounts from economies-of-scale.

Training a standard Transformer-based, large generative model.

**10^26**

Flops per chip second = 2000e12\* × 0.4 = 8E14

Flops per chip hour = flops per chip s × 60 (seconds per minute) × 60 (minutes per hour) = 2.88E18

chip h = 1e26 / flops per chip h = 34.722M

chip h × $2 = **$69.444M**

\*3958 TFLOPS (for fp8 with sparsity) on [H100 SXM](https://resources.nvidia.com/en-us-tensor-core/nvidia-tensor-core-gpu-datasheet) divided by 2 (because the 2x sparsity support generally isn’t relevant for training), so the right number is 1979e12. But the datasheet doesn’t have enough information to tell you that; you just have to know!
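The steps above can be checked in a few lines. This sketch uses the post's rounded 2000e12 figure for the headline number, then applies the footnote's exact dense-FP8 value of 1979e12 as a refinement:

```python
# Reproduce the 10^26 arithmetic with the post's rounded 2000e12 figure.
flops_per_chip_s = 2000e12 * 0.4            # 8E14, as above
flops_per_chip_h = flops_per_chip_s * 3600  # 2.88E18
chip_hours = 1e26 / flops_per_chip_h        # ~34.722M chip-hours
cost = chip_hours * 2.0                     # ~$69.444M at $2/chip-hour
print(f"{chip_hours / 1e6:.3f}M chip-hours -> ${cost / 1e6:.3f}M")
# 34.722M chip-hours -> $69.444M

# Footnote's exact figure: 3958 TFLOPS / 2 = 1979e12 dense FP8 FLOPS,
# since the 2x sparsity speedup generally doesn't apply to training.
exact_cost = 1e26 / (1979e12 * 0.4 * 3600) * 2.0  # ~$70.2M
```

The refinement shifts the total by only about $1M, so the rounded figure is a reasonable simplification.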

**10^25**

Flops per chip second = 2000e12 × 0.4 = 8E14

Flops per chip hour = flops per

... (truncated, 31 KB total)