Frontier Model Forum - Early Best Practices for Frontier AI Safety Evaluations
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: Frontier Model Forum
Published by the Frontier Model Forum (industry consortium of major AI labs) in July 2024, this brief represents an early attempt at cross-industry standardization of AI safety evaluation practices and is relevant to governance and evaluation methodology discussions.
Metadata
Summary
The Frontier Model Forum's issue brief outlines preliminary best practices for designing, implementing, and disclosing frontier AI safety evaluations. It emphasizes domain expertise, evaluating full systems rather than just models, and building toward scientific consensus in a field where evaluation metrology remains immature. This is the first in a planned series drawing on interviews and workshops with safety experts across FMF member firms.
Key Points
- Safety evaluations should draw on domain-specific expertise and detailed threat models, especially for risks outside evaluators' core competencies.
- Evaluations must assess deployed systems holistically, not just underlying models, since safety interventions are often layered on top of base models.
- Where scientific consensus is lacking, evaluations should incorporate diverse expert perspectives and transparently discuss methodology trade-offs.
- The brief is part of a series aimed at standardizing evaluation practices across major frontier AI developers (Anthropic, Google, Microsoft, OpenAI).
- The document acknowledges that AI safety metrology is still immature, framing these as "early" best practices open to community feedback.
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Frontier Model Forum | Organization | 58.0 |
Cached Content Preview
## Issue Brief: Early Best Practices for Frontier AI Safety Evaluations
By: Frontier Model Forum
Posted on: 31st July 2024
Frontier AI holds enormous promise for society. From renewable energy to personalized medicine, the most advanced AI models and systems have the potential to power breakthroughs that benefit everyone. Yet they also have the potential to exacerbate societal harms and to introduce or elevate threats to public safety. Evaluating the safety of frontier AI is thus essential for its responsible development and deployment.
Designing and implementing frontier AI safety evaluations can be challenging. Key questions about what to evaluate, how to evaluate it, and how to analyze the results are rarely straightforward. Further, since the metrology of AI safety is still relatively immature, there is little scientific consensus for researchers to draw on when considering how best to evaluate particular safety concerns. Despite those challenges, AI safety researchers and practitioners have started to align on some early best practices for frontier AI safety evaluations.
This issue brief is the first in a series of publications that will aim to document those best practices across the member firms of the Frontier Model Forum. Based on interviews and workshops with safety experts from across those firms, the series will focus on key practices common to the design, implementation, interpretation, and disclosure of frontier AI safety evaluations, regardless of risk domain. Where possible, the series will also reflect input and feedback from the external AI safety research community.
As a starting point, we outline several high-level best practices below. Drawn from different stages in the evaluation lifecycle, the practices are not meant to be exhaustive, but instead to offer preliminary thinking across the design, implementation, and disclosure of frontier AI safety evaluations. We hope they serve as a useful resource for broader public discussion about frontier AI safety evaluations. Future briefs and reports will go into greater depth and detail on specific practices and issue areas.
**Early best practices**
We recommend the following general practices related to the design and analysis of AI safety evaluations:
- **Draw on domain expertise.** The design and interpretation of a given AI safety evaluation should be grounded in domain-specific expertise. Evaluations that are based on either mis-specified or under-specified understandings of a particular kind of risk will not be as effective as those that are rooted in detailed threat models and/or deep domain knowledge and scientific understanding of the risk domain. AI evaluation practitioners should seek out the advice of subject matter experts for risks that lie outside their areas of expertise throug
... (truncated, 10 KB total)