METR: Common Elements of Frontier AI Safety Policies
Credibility Rating
High quality. Established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: METR
Published by METR, the organization that pioneered the concept of frontier safety policies in 2023; this document serves as a comparative reference for understanding how leading AI labs structure their safety commitments and where policy convergence is emerging.
Metadata
Summary
METR analyzes the common structural elements across frontier AI safety policies published by major AI companies, identifying shared frameworks around capability thresholds, model evaluations, weight security, deployment mitigations, and accountability mechanisms. The December 2025 version covers twelve companies including Anthropic, OpenAI, Google DeepMind, Meta, and others, and incorporates references to the EU AI Act's General-Purpose AI Code of Practice and California's Senate Bill 53.
Key Points
- Twelve major AI companies have published frontier AI safety policies, including Anthropic, OpenAI, Google DeepMind, Meta, Microsoft, Amazon, xAI, and NVIDIA.
- All policies share capability thresholds (e.g., bioweapons, cyberattacks, autonomous replication) that trigger escalating safety measures.
- Common commitments include model weight security, deployment safeguards, and conditions for halting development if mitigations are insufficient (a minimal sketch of this decision logic follows the list).
- Policies emphasize eliciting full model capabilities during evaluations conducted before deployment, during training, and post-deployment.
- Emerging regulatory frameworks like the EU AI Act and California SB 53 are increasingly referenced alongside voluntary corporate commitments.
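To make the shared structure in these points concrete, the sketch below shows the kind of if-then logic the policies have in common: evaluations at defined checkpoints, escalating mitigations when a capability threshold is approached, and a halt when mitigations fall short. Everything here (class names, the `required_response` function, scores, and thresholds) is a hypothetical illustration, not drawn from any specific company's policy.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    domain: str       # e.g. "bio", "cyber", "autonomous_replication"
    score: float      # capability elicited by the evaluation
    threshold: float  # capability threshold defined in the policy


def required_response(results: list[EvalResult],
                      weight_security_ok: bool,
                      deployment_safeguards_ok: bool) -> str:
    """Map evaluation results (run before deployment, during training, and
    post-deployment) to the escalating responses the policies describe."""
    approaching = [r for r in results if r.score >= r.threshold]
    if not approaching:
        return "continue: no capability threshold approached"
    if weight_security_ok and deployment_safeguards_ok:
        return "proceed with mitigations: secured weights and deployment safeguards"
    # If mitigations are insufficient for the risk, the policies call for
    # halting further development and/or deployment until they are in place.
    return "halt development and deployment until mitigations are sufficient"


# Example: a hypothetical cyber evaluation run before deployment.
results = [EvalResult(domain="cyber", score=0.72, threshold=0.70)]
print(required_response(results, weight_security_ok=True, deployment_safeguards_ok=False))
```

The only point of the sketch is the ordering of obligations: evaluation results gate mitigations, and insufficient mitigations gate continued development and deployment.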
Cited by 7 pages
| Page | Type | Quality |
|---|---|---|
| Situational Awareness | Capability | 67.0 |
| AI Safety Culture Equilibrium Model | Analysis | 65.0 |
| Corporate AI Safety Responses | Approach | 68.0 |
| Eval Saturation & The Evals Gap | Approach | 65.0 |
| Third-Party Model Auditing | Approach | 64.0 |
| Responsible Scaling Policies | Approach | 62.0 |
| AI Safety Cases | Approach | 91.0 |
Cached Content Preview
# Common Elements of Frontier AI Safety Policies
# Summary
A number of developers of large foundation models have committed to corporate protocols that lay out how they will evaluate their models for severe risks and mitigate these risks with information security measures, deployment safeguards, and accountability practices. Starting in September 2023, several AI companies voluntarily published these protocols. In May 2024, sixteen companies agreed to do so as part of the Frontier AI Safety Commitments at the AI Seoul Summit, with an additional four companies joining since then. Currently, twelve companies [have published](https://metr.org/fsp) frontier AI safety policies: Anthropic, OpenAI, Google DeepMind, Magic, Naver, Meta, G42, Cohere, Microsoft, Amazon, xAI, and NVIDIA.
This December 2025 version contains references to some developers’ updated frontier AI safety policies. It also mentions relevant guidance from the EU AI Act’s [General-Purpose AI Code of Practice](https://ec.europa.eu/newsroom/dae/redirection/document/118119) and California’s [Senate Bill 53](https://leginfo.legislature.ca.gov/faces/billNavClient.xhtml?bill_id=202520260SB53).[1](https://metr.org/common-elements#fn:1)
In our original report from August 2024, we noted how each policy studied made use of _capability thresholds_, such as the potential for AI models to facilitate biological weapons development or cyberattacks, or to engage in autonomous replication or automated AI research and development. The policies also outline commitments to conduct model evaluations assessing whether models are approaching capability thresholds that could enable severe or catastrophic harm.
When these capability thresholds are approached, the policies also prescribe _model weight security_ and _model deployment mitigations_ in response. For models with more concerning capabilities, we noted that each policy commits the developer to securing model weights to prevent theft by increasingly sophisticated adversaries. Additionally, the developers commit to implementing deployment safety measures that would significantly reduce the risk of dangerous AI capabilities being misused and causing serious harm.
The policies also establish _conditions for halting development and deployment_ if the developer’s mitigations are insufficient to manage the risks. To manage these risks effectively, the policies include evaluations designed to _elicit the full capabilities of the model,_ as well as policies defining the _timing and frequency of evaluations_ – which is typically before deployment, during training, and after deployment. Furthermore, all three policies express
... (truncated, 98 KB total)