Google DeepMind: Introducing the Frontier Safety Framework
Credibility Rating
High quality. Established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: Google DeepMind
Published in May 2024, this is Google DeepMind's formal responsible scaling policy, comparable to Anthropic's RSP and OpenAI's Preparedness Framework; relevant for comparing industry approaches to frontier model governance and safety commitments.
Metadata
Summary
Google DeepMind's Frontier Safety Framework (FSF) establishes a structured approach to identifying and mitigating potential severe harms from frontier AI models, focusing on 'critical capability levels' that trigger enhanced safety measures. The framework defines evaluation protocols for dangerous capabilities—particularly in CBRN threats and cyberoffense—and outlines mitigation commitments when models approach or exceed these thresholds. It represents DeepMind's operationalization of responsible scaling policies similar to Anthropic's RSP.
Key Points
- Defines 'critical capability levels' (CCLs) as thresholds at which AI capabilities could meaningfully enable catastrophic harm, triggering mandatory safety protocols.
- Focuses on high-risk capability domains including CBRN (chemical, biological, radiological, nuclear) weapons and cyberoffense as primary evaluation targets.
- Commits to enhanced security measures, deployment restrictions, and accelerated safety research when models approach or reach defined capability thresholds (see the illustrative sketch after this list).
- Establishes a continuous evaluation cadence, requiring capability assessments before model deployment and at regular intervals during training.
- Represents Google DeepMind's version of a 'responsible scaling policy,' aligning with broader industry efforts to formalize safety commitments for frontier models.
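To make the threshold-and-trigger structure concrete, here is a minimal sketch in Python of how a CCL-gated check might be modeled. Everything here is a hypothetical illustration for this summary: the class and function names, the numeric thresholds, and the mitigation labels are assumptions, not part of DeepMind's actual framework or tooling, which defines CCLs qualitatively rather than as numeric scores.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CriticalCapabilityLevel:
    """Hypothetical CCL: a capability threshold that triggers mitigations."""
    domain: str                    # e.g. "cyberoffense", "cbrn"
    threshold: float               # illustrative score at which the CCL is reached
    mitigations: tuple[str, ...]   # protocols activated at this level

# Illustrative thresholds only; the real framework does not publish numeric cutoffs.
CCLS = [
    CriticalCapabilityLevel(
        "cyberoffense", 0.7,
        ("enhanced security", "deployment restrictions")),
    CriticalCapabilityLevel(
        "cbrn", 0.5,
        ("enhanced security", "deployment restrictions",
         "accelerated safety research")),
]

def check_model(scores: dict[str, float]) -> list[str]:
    """Return the mitigations triggered by a model's evaluation scores."""
    triggered: list[str] = []
    for ccl in CCLS:
        if scores.get(ccl.domain, 0.0) >= ccl.threshold:
            # Deduplicate while preserving order of activation.
            triggered.extend(m for m in ccl.mitigations if m not in triggered)
    return triggered

# Per the framework's cadence, such evaluations would run before deployment
# and at regular intervals during training.
if __name__ == "__main__":
    eval_scores = {"cyberoffense": 0.72, "cbrn": 0.31}  # hypothetical results
    print(check_model(eval_scores))
    # -> ['enhanced security', 'deployment restrictions']
```

The design point the sketch captures is that mitigations attach to capability thresholds rather than to specific models: any model whose evaluations cross a CCL inherits that level's protocols automatically.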
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Scheming | Risk | 74.0 |
Cached Content Preview
May 17, 2024
Responsibility & Safety
# Introducing the Frontier Safety Framework
Anca Dragan, Helen King and Allan Dafoe

Our approach to analyzing and mitigating future risks posed by advanced AI models
Google DeepMind has consistently pushed the boundaries of AI, developing models that have transformed our understanding of what's possible. We believe that AI technology on the horizon will provide society with invaluable tools to help tackle critical global challenges, such as climate change, drug discovery, and economic productivity. At the same time, we recognize that as we continue to advance the frontier of AI capabilities, these breakthroughs may eventually come with new risks beyond those posed by present-day models.
Today, we are introducing our [Frontier Safety Framework](https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/introducing-the-frontier-safety-framework/fsf-technical-report.pdf) — a set of protocols for proactively identifying future AI capabilities that could cause severe harm and putting in place mechanisms to detect and mitigate them. Our Framework focuses on severe risks resulting from powerful capabilities at the model level, such as exceptional agency or sophisticated cyber capabilities. It is designed to complement our alignment research, which trains models to act in accordance with human values and societal goals, and Google’s existing suite of AI responsibility and safety [practices](https://ai.google/responsibility/principles/).
The Framework is exploratory and we expect it to evolve significantly as we learn from its implementation, deepen our understanding of AI risks and evaluations, and collaborate with industry, academia, and government. Even though these risks are beyond the reach of present-day models, we hope that implementing and improving the Framework will help us prepare to address them. We aim to have this initial framework fully implemented by early 2025.
## The framework
The first version of the Framework announced today builds on our [research](https://deepmind.google/discover/blog/an-early-warning-system-for-novel-ai-risks/) on [evaluating](https://arxiv.org/abs/2403.13793) critical capabilities in frontier models, and follows the emerging approach of [Responsible Capability Scaling.](https://www.gov.uk/government/publications/emerging-processes-for-frontier-ai-safety/emerging-processes-for-frontier-ai-safety) The Framework has three key components:
1. **Identifying capabilities a model may have with potential for severe harm.** To do this, we research the paths through which a model could cause severe harm in high-risk domains, and then determine the minimal level of capabilities a model must have to play a role in causing such harm. We call these thresholds 'Critical Capability Levels' (CCLs).
... (truncated, 8 KB total)