Google DeepMind: Introducing the Frontier Safety Framework
Credibility Rating
High quality. Established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: Google DeepMind
Published in May 2024, this is Google DeepMind's formal responsible scaling policy, comparable to Anthropic's RSP and OpenAI's Preparedness Framework; relevant for comparing industry approaches to frontier model governance and safety commitments.
Metadata
Summary
Google DeepMind's Frontier Safety Framework (FSF) establishes a structured approach to identifying and mitigating potential severe harms from frontier AI models, focusing on 'critical capability levels' that trigger enhanced safety measures. The framework defines evaluation protocols for dangerous capabilities—particularly in CBRN threats and cyberoffense—and outlines mitigation commitments when models approach or exceed these thresholds. It represents DeepMind's operationalization of responsible scaling policies similar to Anthropic's RSP.
Key Points
- Defines 'critical capability levels' (CCLs) as thresholds at which AI capabilities could meaningfully enable catastrophic harm, triggering mandatory safety protocols.
- Focuses on high-risk capability domains including CBRN (chemical, biological, radiological, nuclear) weapons and cyberoffense as primary evaluation targets.
- Commits to enhanced security measures, deployment restrictions, and accelerated safety research when models approach or reach defined capability thresholds (see the illustrative sketch after this list).
- Establishes a continuous evaluation cadence, requiring capability assessments before model deployment and at regular intervals during training.
- Represents Google DeepMind's version of a 'responsible scaling policy,' aligning with broader industry efforts to formalize safety commitments for frontier models.
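To make the threshold-and-trigger structure concrete, here is a minimal sketch in Python of how a CCL-gated check might be modeled. Everything here is a hypothetical illustration for this summary: the class and function names, the numeric thresholds, and the mitigation labels are assumptions, not part of DeepMind's actual framework or tooling, which defines CCLs qualitatively rather than as numeric scores.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CriticalCapabilityLevel:
    """Hypothetical CCL: a capability threshold that triggers mitigations."""
    domain: str                    # e.g. "cyberoffense", "cbrn"
    threshold: float               # illustrative score at which the CCL is reached
    mitigations: tuple[str, ...]   # protocols activated at this level

# Illustrative thresholds only; the real framework does not publish numeric cutoffs.
CCLS = [
    CriticalCapabilityLevel(
        "cyberoffense", 0.7,
        ("enhanced security", "deployment restrictions")),
    CriticalCapabilityLevel(
        "cbrn", 0.5,
        ("enhanced security", "deployment restrictions",
         "accelerated safety research")),
]

def check_model(scores: dict[str, float]) -> list[str]:
    """Return the mitigations triggered by a model's evaluation scores."""
    triggered: list[str] = []
    for ccl in CCLS:
        if scores.get(ccl.domain, 0.0) >= ccl.threshold:
            # Deduplicate while preserving order of activation.
            triggered.extend(m for m in ccl.mitigations if m not in triggered)
    return triggered

# Per the framework's cadence, such evaluations would run before deployment
# and at regular intervals during training.
if __name__ == "__main__":
    eval_scores = {"cyberoffense": 0.72, "cbrn": 0.31}  # hypothetical results
    print(check_model(eval_scores))
    # -> ['enhanced security', 'deployment restrictions']
```

The design point the sketch captures is that mitigations attach to capability thresholds rather than to specific models: any model whose evaluations cross a CCL inherits that level's protocols automatically.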
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Scheming | Risk | 74.0 |
Cached Content Preview
May 17, 2024
Responsibility & Safety
# Introducing the Frontier Safety Framework
Anca Dragan, Helen King and Allan Dafoe

Our approach to analyzing and mitigating future risks posed by advanced AI models
Google DeepMind has consistently pushed the boundaries of AI, developing models that have transformed our understanding of what's possible. We believe that AI technology on the horizon will provide society with invaluable tools to help tackle critical global challenges, such as climate change, drug discovery, and economic productivity. At the same time, we recognize that as we continue to advance the frontier of AI capabilities, these breakthroughs may eventually come with new risks beyond those posed by present-day models.
Today, we are introducing our [Frontier Safety Framework](https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/introducing-the-frontier-safety-framework/fsf-technical-report.pdf) — a set of protocols for proactively identifying future AI capabilities that could cause severe harm and putting in place mechanisms to detect and mitigate them. Our Framework focuses on severe risks resulting from powerful capabilities at the model level, such as exceptional agency or sophisticated cyber capabilities. It is designed to complement our alignment research, which trains models to act in accordance with human values and societal goals, and Google’s existing suite of AI responsibility and safety [practices](https://ai.google/responsibility/principles/).
The Framework is exploratory and we expect it to evolve significantly as we learn from its implementation, deepen our understanding of AI risks and evaluations, and collaborate with industry, academia, and government. Even though these risks are beyond the reach of present-day models, we hope that implementing and improving the Framework will help us prepare to address them. We aim to have this initial framework fully implemented by early 2025.
## The framework
The first version of the Framework announced today builds on our [research](https://deepmind.google/discover/blog/an-early-warning-system-for-novel-ai-risks/) on [evaluating](https://arxiv.org/abs/2403.13793) critical capabilities in frontier models, and follows the emerging approach of [Responsible Capability Scaling.](https://www.gov.uk/government/publications/emerging-processes-for-frontier-ai-safety/emerging-processes-for-frontier-ai-safety) The Framework has three key components:
1. **Identifying capabilities a model may have with potential for severe harm.** To do this, we research the paths through which a model could cause severe harm in high-risk domains, and then determine the minimal level of capabilities a model must have to play a role in causing such harm. We call these thresholds 'Critical Capability Levels' (CCLs).
... (truncated, 8 KB total)