Frontier Model Forum - Issue Brief: Preliminary Taxonomy of AI-Bio Safety Evaluations
webCredibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: Frontier Model Forum
Published by the Frontier Model Forum (a coalition of major AI labs including Google, Microsoft, OpenAI, and Anthropic), this brief is a practitioner-facing policy document aimed at harmonizing biosecurity evaluation practices across frontier AI developers.
Metadata
Summary
This Frontier Model Forum issue brief proposes a structured taxonomy for evaluating AI systems' potential to assist with biological threats. It categorizes different types of biosecurity-relevant AI evaluations to help developers and policymakers assess and mitigate misuse risks from frontier models in the bio domain.
Key Points
- •Introduces a preliminary framework for classifying AI bio-safety evaluations across different threat vectors and capability levels
- •Aims to standardize how frontier AI labs assess whether models could provide meaningful 'uplift' to bad actors seeking to cause biological harm
- •Distinguishes between different categories of bio-relevant AI capabilities such as synthesis routes, pathogen enhancement, and weaponization knowledge
- •Supports coordination among leading AI labs and policymakers on consistent evaluation methodologies for biosecurity risks
- •Part of broader Frontier Model Forum efforts to develop shared safety standards and red-teaming practices across the industry
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Frontier Model Forum | Organization | 58.0 |
Cached Content Preview
[Skip to content](https://www.frontiermodelforum.org/updates/issue-brief-preliminary-taxonomy-of-ai-bio-safety-evaluations/#wp--skip-link--target)
## Issue Brief: Preliminary Taxonomy of AI-Bio Safety Evaluations
By:
Frontier Model Forum
Posted on:
20th December 2024
Frontier AI-bio safety evaluations aim to test the biological capabilities and, by extension, the potential biosafety implications of frontier AI. As the science of AI safety evaluations is still nascent, the evaluations themselves can vary widely in both purpose and methodology. As such, a key first step in building out an effective safety evaluation ecosystem for the AI-bio space is developing a shared understanding of both the function and type of safety evaluations.
This issue brief offers an initial taxonomy and definitions for frontier AI safety evaluations specific to the biological domain, categorized across two dimensions: methodology and domain. Based on input from FMF member firm experts, in addition to a diverse group of external experts from the advanced AI and biological research fields, this brief aims to document and build a preliminary consensus around the current understanding of frontier AI-bio safety evaluations.
**Evaluation Methods**
The first dimension by which AI-bio safety evaluations are categorized is the **methodology**. Evaluation methodology describes _how_ the frontier AI model or system is being evaluated, or the study design.
While evaluation studies may incorporate more than one of these methods, most existing evaluation tasks include one of three main methods. For evaluations of AI models or systems themselves, two common methods include:
- **Benchmark Evaluations**: Sets of safety-relevant questions or tasks designed to test model capabilities and assess how answers differ across models. These evaluations aim to provide baseline indications of general or domain-specific capabilities that are comparable across models. Benchmarks are designed to be easily repeatable and are typically automated, though grading can also incorporate expert human grading. In the biological domain, benchmarks may include knowledge benchmarks (e.g., multiple choice QA, open ended questions), capability benchmarks (e.g., agentic tests), or safeguard evaluations (e.g., refusals testing for harmful queries).
- **Red-Team Exercises**: dynamic, adversarial, and interactive evaluations meant to elicit specific information about the harmful capabilities of a particular model or system, often by simulating a potential attack or form of deliberate misuse and then measuring for residual risk. Although automated red-teaming exercises are under development, they are generally carried out by human actors, including red-teaming experts, where a key element is the dynamic nature of interaction between the human experts and the model. Red-teaming exercises can further be distinguished from benchmark evaluations by their emphasis on assessing the effectiveness of existing safeg
... (truncated, 14 KB total)ae2092b70b0dd283 | Stable ID: NTBhYzA5OW