Frontier Model Forum - Early Best Practices for Frontier AI Safety Evaluations
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: Frontier Model Forum
Published by the Frontier Model Forum (industry consortium of major AI labs) in July 2024, this brief represents an early attempt at cross-industry standardization of AI safety evaluation practices and is relevant to governance and evaluation methodology discussions.
Metadata
Summary
The Frontier Model Forum's issue brief outlines preliminary best practices for designing, implementing, and disclosing frontier AI safety evaluations. It emphasizes domain expertise, evaluating full systems rather than just models, and building toward scientific consensus in a field where evaluation metrology remains immature. This is the first in a planned series drawing on interviews and workshops with safety experts across FMF member firms.
Key Points
- Safety evaluations should draw on domain-specific expertise and detailed threat models, especially for risks outside evaluators' core competencies.
- Evaluations must assess deployed systems holistically, not just underlying models, since safety interventions are often layered on top of base models.
- Where scientific consensus is lacking, evaluations should incorporate diverse expert perspectives and transparently discuss methodology trade-offs.
- The brief is part of a series aimed at standardizing evaluation practices across major frontier AI developers (Anthropic, Google, Microsoft, OpenAI).
- The document acknowledges that AI safety metrology is still immature, framing these as "early" best practices open to community feedback.
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Frontier Model Forum | Organization | 58.0 |
Cached Content Preview
## Issue Brief: Early Best Practices for Frontier AI Safety Evaluations
By: Frontier Model Forum
Posted on: 31st July 2024
Frontier AI holds enormous promise for society. From renewable energy to personalized medicine, the most advanced AI models and systems have the potential to power breakthroughs that benefit everyone. Yet they also have the potential to exacerbate societal harms and to introduce or elevate threats to public safety. Evaluating the safety of frontier AI is thus essential for its responsible development and deployment.
Designing and implementing frontier AI safety evaluations can be challenging. Key questions about what to evaluate, how to evaluate it, and how to analyze the results are rarely straightforward. Further, since the metrology of AI safety is still relatively immature, there is little scientific consensus for researchers to draw on when considering how best to evaluate particular safety concerns. Despite those challenges, AI safety researchers and practitioners have started to align on some early best practices for frontier AI safety evaluations.
This issue brief is the first in a series of publications that will aim to document those best practices across the member firms of the Frontier Model Forum. Based on interviews and workshops with safety experts from across those firms, the series will focus on key practices common to the design, implementation, interpretation, and disclosure of frontier AI safety evaluations, regardless of risk domain. Where possible, the series will also reflect input and feedback from the external AI safety research community.
As a starting point, we outline several high-level best practices below. Drawn from different stages in the evaluation lifecycle, the practices are not meant to be exhaustive, but instead to offer preliminary thinking across the design, implementation, and disclosure of frontier AI safety evaluations. We hope they serve as a useful resource for broader public discussion about frontier AI safety evaluations. Future briefs and reports will go into greater depth and detail on specific practices and issue areas.
**Early best practices**
We recommend the following general practices related to the design and analysis of AI safety evaluations:
- **Draw on domain expertise.** The design and interpretation of a given AI safety evaluation should be grounded in domain-specific expertise. Evaluations that are based on either mis-specified or under-specified understandings of a particular kind of risk will not be as effective as those that are rooted in detailed threat models and/or deep domain knowledge and scientific understanding of the risk domain. AI evaluation practitioners should seek out the advice of subject matter experts for risks that lie outside their areas of expertise throug
... (truncated, 10 KB total)