
Updating the Frontier Safety Framework - Google DeepMind


Credibility Rating

4/5 (High)

High quality. Established institution or organization with editorial oversight and accountability.

Rating inherited from publication venue: Google DeepMind

This is an official DeepMind policy document describing their responsible scaling framework; relevant to discussions of industry self-governance, dangerous capability evaluations, and deployment standards for frontier AI models.

Metadata

Importance: 72/100
Tags: blog post, primary source

Summary

Google DeepMind outlines updates to its Frontier Safety Framework (FSF), a structured approach to evaluating and mitigating risks from highly capable AI models. The framework defines critical capability thresholds that trigger mandatory safety evaluations and containment measures before deployment. It reflects DeepMind's evolving methodology for responsible scaling and model risk governance.

Key Points

  • The FSF establishes capability-based thresholds that determine when enhanced safety evaluations are required before a model can be deployed (a minimal code sketch follows this list).
  • Updates reflect lessons learned from applying the framework to frontier models, refining both evaluation criteria and mitigation protocols.
  • The framework covers dangerous capability domains such as CBRN threats, cyberoffense, and autonomous self-replication or self-improvement.
  • Mitigation levels are tiered, with stricter deployment restrictions applied to models that exceed higher capability thresholds.
  • The FSF is intended to be a living document updated as scientific understanding of AI risk and capabilities advances.
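
To make the threshold-gating idea concrete, here is a minimal sketch in Python. It is not DeepMind's implementation: the domain names, scores, threshold values, and function names below are assumptions invented for illustration.

```python
from dataclasses import dataclass

# Illustrative only: the capability domains, scores, and threshold values
# below are invented for this sketch, not taken from the FSF itself.

@dataclass
class EvalResult:
    domain: str    # e.g. "cbrn", "cyberoffense", "autonomy"
    score: float   # score from a dangerous-capability evaluation

# Assumed critical capability levels (CCLs): meeting one triggers
# enhanced safety evaluation and mitigation before deployment.
CCL_THRESHOLDS = {"cbrn": 0.4, "cyberoffense": 0.5, "autonomy": 0.3}

def crossed_ccls(results: list[EvalResult]) -> list[str]:
    """Return the capability domains whose CCL threshold was met or exceeded."""
    return [r.domain for r in results
            if r.score >= CCL_THRESHOLDS.get(r.domain, float("inf"))]

def may_deploy(results: list[EvalResult], mitigations: set[str]) -> bool:
    """Deploy only if every crossed CCL has a corresponding mitigation in place."""
    return all(domain in mitigations for domain in crossed_ccls(results))
```

Under these assumptions, a model scoring 0.6 on the hypothetical "cyberoffense" evaluation would be blocked from deployment until a "cyberoffense" mitigation is recorded.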

Cited by 1 page

| Page | Type | Quality |
| --- | --- | --- |
| Technical AI Safety Research | Crux | 66.0 |

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 10 KB

February 4, 2025
Responsibility & Safety

# Updating the Frontier Safety Framework


Our next iteration of the FSF sets out stronger security protocols on the path to AGI

AI is a powerful tool that is helping to unlock new breakthroughs and make significant progress on some of the biggest challenges of our time, from climate change to drug discovery. But as its development progresses, advanced capabilities may present new risks.

That’s why we [introduced](https://deepmind.google/discover/blog/introducing-the-frontier-safety-framework/) the first iteration of our Frontier Safety Framework last year: a set of protocols to help us stay ahead of possible severe risks from powerful frontier AI models. Since then, we've collaborated with experts in industry, academia, and government to deepen our understanding of the risks, the empirical evaluations to test for them, and the mitigations we can apply. We have also implemented the Framework in our safety and governance processes for evaluating frontier models such as Gemini 2.0. As a result of this work, today we are publishing an updated [Frontier Safety Framework](https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/updating-the-frontier-safety-framework/Frontier%20Safety%20Framework%202.0.pdf).

Key updates to the framework include:

- Security Level recommendations for our Critical Capability Levels (CCLs), helping to identify where the strongest efforts to curb exfiltration risk are needed
- Implementing a more consistent procedure for how we apply deployment mitigations
- Outlining an industry-leading approach to deceptive alignment risk

## Recommendations for Heightened Security

Security mitigations help prevent unauthorized actors from exfiltrating model weights. This is especially important because access to model weights allows removal of most safeguards. Given the stakes involved as we look ahead to increasingly powerful AI, getting this wrong could have serious implications for safety and security. Our initial Framework recognised the need for a tiered approach to security, allowing mitigations of varying strength to be tailored to the risk. This proportionate approach also ensures we get the balance right between mitigating risks and fostering access and innovation.
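
As a rough illustration of this tiered, proportionate approach, here is a minimal sketch assuming invented level names and an invented CCL-to-level mapping; the Framework document defines the actual levels and recommendations.

```python
from enum import IntEnum

# Hypothetical tiers ordered weakest to strongest; the Framework defines
# its own security levels, and these names are placeholders.
class SecurityLevel(IntEnum):
    BASELINE = 0   # standard organizational security controls
    HARDENED = 1   # restricted weight access, stronger monitoring
    MAXIMUM = 2    # strictest controls against weight exfiltration

# Assumed mapping from crossed CCLs to the minimum recommended security
# level; the real per-CCL recommendations appear in the Framework itself.
RECOMMENDED_SECURITY = {
    "cbrn": SecurityLevel.MAXIMUM,
    "cyberoffense": SecurityLevel.HARDENED,
}

def required_level(crossed: list[str]) -> SecurityLevel:
    """Security must match the most demanding CCL the model has crossed."""
    if not crossed:
        return SecurityLevel.BASELINE
    return max(RECOMMENDED_SECURITY.get(c, SecurityLevel.BASELINE)
               for c in crossed)
```

The design choice this captures is proportionality: a model that crosses no CCL carries only baseline controls, while the strongest exfiltration protections are reserved for the highest-stakes capability levels.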

Since then, we have drawn on [wider research](https://www.rand.org/pubs/research_reports/RRA2849-1.html) to evolve these security mitigation levels and recommend a level for each of our CCLs.\* These recommendations reflect our assessment of the minimum appropriate level of security the field of frontier AI should apply to such models at a CCL. This mapping process helps us isolate where the strongest mitigations are 

... (truncated, 10 KB total)
Resource ID: f232f1723d6802e7 | Stable ID: NjIwNmY5ZW