
Updating the Frontier Safety Framework - Google DeepMind


Credibility Rating

4/5 (High)

High quality. Established institution or organization with editorial oversight and accountability.

Rating inherited from publication venue: Google DeepMind

This is an official DeepMind policy document describing their responsible scaling framework; relevant to discussions of industry self-governance, dangerous capability evaluations, and deployment standards for frontier AI models.

Metadata

Importance: 72/100
Tags: blog post, primary source

Summary

Google DeepMind outlines updates to its Frontier Safety Framework (FSF), a structured approach to evaluating and mitigating risks from highly capable AI models. The framework defines critical capability thresholds that trigger mandatory safety evaluations and containment measures before deployment. It reflects DeepMind's evolving methodology for responsible scaling and model risk governance.

Key Points

  • The FSF establishes capability-based thresholds that determine when enhanced safety evaluations are required before a model can be deployed (a minimal code sketch follows this list).
  • Updates reflect lessons learned from applying the framework to frontier models, refining both evaluation criteria and mitigation protocols.
  • The framework covers dangerous capability domains such as CBRN threats, cyberoffense, and autonomous self-replication or self-improvement.
  • Mitigation levels are tiered, with stricter deployment restrictions applied to models that exceed higher capability thresholds.
  • The FSF is intended to be a living document updated as scientific understanding of AI risk and capabilities advances.
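
To make the threshold-gating idea concrete, here is a minimal sketch in Python. It is not DeepMind's implementation: the domain names, scores, threshold values, and function names below are assumptions invented for illustration.

```python
from dataclasses import dataclass

# Illustrative only: the capability domains, scores, and threshold values
# below are invented for this sketch, not taken from the FSF itself.

@dataclass
class EvalResult:
    domain: str    # e.g. "cbrn", "cyberoffense", "autonomy"
    score: float   # score from a dangerous-capability evaluation

# Assumed critical capability levels (CCLs): meeting one triggers
# enhanced safety evaluation and mitigation before deployment.
CCL_THRESHOLDS = {"cbrn": 0.4, "cyberoffense": 0.5, "autonomy": 0.3}

def crossed_ccls(results: list[EvalResult]) -> list[str]:
    """Return the capability domains whose CCL threshold was met or exceeded."""
    return [r.domain for r in results
            if r.score >= CCL_THRESHOLDS.get(r.domain, float("inf"))]

def may_deploy(results: list[EvalResult], mitigations: set[str]) -> bool:
    """Deploy only if every crossed CCL has a corresponding mitigation in place."""
    return all(domain in mitigations for domain in crossed_ccls(results))
```

Under these assumptions, a model scoring 0.6 on the hypothetical "cyberoffense" evaluation would be blocked from deployment until a "cyberoffense" mitigation is recorded.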

Cited by 1 page

| Page | Type | Quality |
| --- | --- | --- |
| Technical AI Safety Research | Crux | 66.0 |

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 10 KB

February 4, 2025
Responsibility & Safety

# Updating the Frontier Safety Framework


Our next iteration of the FSF sets out stronger security protocols on the path to AGI

AI is a powerful tool that is helping to unlock new breakthroughs and make significant progress on some of the biggest challenges of our time, from climate change to drug discovery. But as its development progresses, advanced capabilities may present new risks.

That’s why we [introduced](https://deepmind.google/discover/blog/introducing-the-frontier-safety-framework/) the first iteration of our Frontier Safety Framework last year: a set of protocols to help us stay ahead of possible severe risks from powerful frontier AI models. Since then, we've collaborated with experts in industry, academia, and government to deepen our understanding of the risks, the empirical evaluations to test for them, and the mitigations we can apply. We have also implemented the Framework in our safety and governance processes for evaluating frontier models such as Gemini 2.0. As a result of this work, today we are publishing an updated [Frontier Safety Framework](https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/updating-the-frontier-safety-framework/Frontier%20Safety%20Framework%202.0.pdf).

Key updates to the framework include:

- Security Level recommendations for our Critical Capability Levels (CCLs), helping to identify where the strongest efforts to curb exfiltration risk are needed
- Implementing a more consistent procedure for how we apply deployment mitigations
- Outlining an industry-leading approach to deceptive alignment risk

## Recommendations for Heightened Security

Security mitigations help prevent unauthorized actors from exfiltrating model weights. This is especially important because access to model weights allows removal of most safeguards. Given the stakes involved as we look ahead to increasingly powerful AI, getting this wrong could have serious implications for safety and security. Our initial Framework recognised the need for a tiered approach to security, allowing mitigations of varying strength to be tailored to the risk. This proportionate approach also ensures we get the balance right between mitigating risks and fostering access and innovation.
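
As a rough illustration of this tiered, proportionate approach, here is a minimal sketch assuming invented level names and an invented CCL-to-level mapping; the Framework document defines the actual levels and recommendations.

```python
from enum import IntEnum

# Hypothetical tiers ordered weakest to strongest; the Framework defines
# its own security levels, and these names are placeholders.
class SecurityLevel(IntEnum):
    BASELINE = 0   # standard organizational security controls
    HARDENED = 1   # restricted weight access, stronger monitoring
    MAXIMUM = 2    # strictest controls against weight exfiltration

# Assumed mapping from crossed CCLs to the minimum recommended security
# level; the real per-CCL recommendations appear in the Framework itself.
RECOMMENDED_SECURITY = {
    "cbrn": SecurityLevel.MAXIMUM,
    "cyberoffense": SecurityLevel.HARDENED,
}

def required_level(crossed: list[str]) -> SecurityLevel:
    """Security must match the most demanding CCL the model has crossed."""
    if not crossed:
        return SecurityLevel.BASELINE
    return max(RECOMMENDED_SECURITY.get(c, SecurityLevel.BASELINE)
               for c in crossed)
```

The design choice this captures is proportionality: a model that crosses no CCL carries only baseline controls, while the strongest exfiltration protections are reserved for the highest-stakes capability levels.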

Since then, we have drawn on [wider research](https://www.rand.org/pubs/research_reports/RRA2849-1.html) to evolve these security mitigation levels and recommend a level for each of our CCLs.\* These recommendations reflect our assessment of the minimum appropriate level of security the field of frontier AI should apply to such models at a CCL. This mapping process helps us isolate where the strongest mitigations are 

... (truncated, 10 KB total)
Resource ID: f232f1723d6802e7 | Stable ID: NjIwNmY5ZW