Longterm Wiki

Training Compute Thresholds


This document, published via heim.xyz and tagged with US AISI, is relevant to policymakers and researchers interested in how compute-based metrics define regulatory scope for frontier AI models under frameworks like the Biden Executive Order on AI.

Metadata

Importance: 62/100 · working paper · analysis

Summary

This document examines the use of training compute thresholds as a governance mechanism for regulating advanced AI systems, analyzing how computational resource requirements can serve as proxies for identifying potentially dangerous AI models. It likely addresses methodological considerations for setting appropriate thresholds and their role in AI safety policy frameworks, particularly in the context of US AI Safety Institute initiatives.

Key Points

  • Explores training compute thresholds (e.g., measured in FLOP) as a tractable proxy metric for identifying frontier AI models subject to oversight.
  • Analyzes the technical and policy rationale for using compute as a governance trigger, including its measurability and correlation with model capabilities.
  • Examines limitations and potential workarounds of compute-based thresholds, such as algorithmic efficiency improvements reducing compute needed for dangerous capabilities.
  • Likely connects to regulatory frameworks like executive orders or AISI guidance that use compute thresholds to define reporting or evaluation obligations.
  • Discusses how thresholds should be updated over time as hardware and training techniques evolve.
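The "initial filter" role described in the points above can be sketched in a few lines of Python. The threshold values are real (10^26 FLOP for reporting under US Executive Order 14110; 10^25 FLOP for the EU AI Act's presumption of systemic risk), but the function itself is an illustrative sketch, not from the paper:

```python
# Illustrative sketch: compute thresholds as an initial regulatory filter.
# Threshold values reflect EO 14110 and the EU AI Act; the function is hypothetical.
US_EO_14110_FLOP = 1e26   # reporting threshold for dual-use foundation models
EU_AI_ACT_FLOP = 1e25     # presumption of systemic risk for GPAI models

def crossed_thresholds(training_flop: float) -> list[str]:
    """Return the regulatory thresholds a training run meets or exceeds."""
    thresholds = {
        "US EO 14110 reporting (1e26 FLOP)": US_EO_14110_FLOP,
        "EU AI Act systemic risk (1e25 FLOP)": EU_AI_ACT_FLOP,
    }
    return [name for name, t in thresholds.items() if training_flop >= t]
```

Crossing a threshold here only flags a model for further scrutiny (notification, evaluations, risk assessment); it does not by itself determine mitigation measures, which matches how the paper argues thresholds should be used.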

Cited by 1 page

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 98 KB
# Training Compute Thresholds: Features and Functions in AI Regulation

Lennart Heim∗
Centre for the Governance of AI, Oxford, United Kingdom

Leonie Koessler
Centre for the Governance of AI, Oxford, United Kingdom & European New School of Digital Studies, Frankfurt (Oder), Germany

# Abstract

Regulators in the US and EU are using thresholds based on training compute—the number of computational operations used in training—to identify general-purpose artificial intelligence (GPAI) models that may pose risks of large-scale societal harm. We argue that training compute is currently the most suitable metric to identify GPAI models that deserve regulatory oversight and further scrutiny. Training compute correlates with model capabilities and risks, is quantifiable, can be measured early in the AI lifecycle, and can be verified by external actors, among other advantageous features. These features make compute thresholds considerably more suitable than other proposed metrics to serve as an initial filter to trigger additional regulatory requirements and scrutiny. However, training compute is an imperfect proxy for risk. As such, compute thresholds should not be used in isolation to determine appropriate mitigation measures. Instead, they should be used to detect potentially risky GPAI models that warrant regulatory oversight, such as through notification requirements, and further scrutiny, such as via model evaluations and risk assessments, the results of which may inform which mitigation measures are appropriate. In fact, this appears largely consistent with how compute thresholds are used today. As GPAI technology and market structures evolve, regulators should update compute thresholds and complement them with other metrics in regulatory review processes.

# Executive Summary

The development and deployment of advanced general-purpose artificial intelligence (GPAI) models, also referred to as “frontier AI models” or “dual-use foundation models”, pose increasing risks of large-scale societal harm (Section 1). Currently, these models develop ever higher capabilities through ever larger training runs, fuelled by ever more computational resources (“training compute”). But higher capabilities also mean higher risks to society, because many capabilities are dual-use (e.g., automated hacking capabilities) and because more capable models can be expected to be used more widely and relied upon more heavily, increasing the stakes if they fail or behave in undesired ways (e.g., producing biased outputs). As a result, regulators are increasingly using training compute thresholds to identify models of potential concern.

“Training compute” refers to the total number of operations a computer needs to perform to train an AI model (Section 2). In recent years, the scale of AI training has grown significantly, with increases in the amount of training data, the number of model parameters, and corresponding increases in the amount of compute required for training 

... (truncated, 98 KB total)
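The definition of training compute in the preview lends itself to a standard back-of-the-envelope estimate: a dense transformer requires roughly 6 FLOP per parameter per training token (forward plus backward pass). This rule of thumb is a common approximation, not something stated in the preview, and the model size and token count below are hypothetical:

```python
def estimate_training_compute(n_params: float, n_tokens: float) -> float:
    """Rough training-compute estimate for a dense transformer:
    ~6 FLOP per parameter per token (forward + backward pass).
    A common approximation, not exact for all architectures."""
    return 6 * n_params * n_tokens

# Hypothetical model: 70 billion parameters trained on 15 trillion tokens.
flop = estimate_training_compute(70e9, 15e12)  # ≈ 6.3e24 FLOP
```

Under this estimate the hypothetical run sits below the 10^25 and 10^26 FLOP thresholds used in current regulation, illustrating how early in the lifecycle such an estimate can be made: parameter count and dataset size are typically known before training begins.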