2024-01-11 Dangerous Capability Evaluations
Credibility Rating
High quality. Established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: METR
Published by METR (Model Evaluation & Threat Research), this post is a foundational reference for understanding how dangerous capability evaluations work in practice and is directly relevant to frontier lab safety commitments and government AI policy discussions.
Metadata
Summary
METR (formerly ARC Evals) describes their framework for evaluating potentially dangerous capabilities in frontier AI models, including autonomous replication, acquiring resources, and assisting with weapons development. The post outlines their methodology for assessing whether models pose catastrophic risks and how these evaluations inform deployment decisions. It represents a key practical approach to pre-deployment safety testing.
Key Points
- Introduces structured evaluations for dangerous capabilities including autonomous replication, resource acquisition, and CBRN weapons assistance
- Describes METR's role in conducting third-party capability evaluations for frontier AI labs before model deployment
- Outlines specific task suites and threat models used to probe whether models could meaningfully assist in catastrophic harm
- Emphasizes the challenge of setting meaningful thresholds: determining when a capability level is dangerous enough to block deployment
- Positions dangerous capability evals as a minimal but critical safety check, not a comprehensive alignment solution
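The thresholding challenge in the points above can be made concrete with a minimal sketch. This is purely illustrative and not METR's actual methodology or code: the task names, the pass-rate aggregation, and the threshold value are all hypothetical, standing in for the general pattern of scoring a model against a threat-model task suite and gating deployment on the result.

```python
# Illustrative sketch only (not METR's actual evaluation code):
# aggregate results from a dangerous-capability task suite and
# gate a deployment decision on a capability threshold.
from dataclasses import dataclass


@dataclass
class TaskResult:
    task: str      # e.g. a subtask from an autonomous-replication threat model
    passed: bool   # did the model complete the task?


def capability_score(results: list[TaskResult]) -> float:
    """Fraction of threat-model tasks the model completed."""
    return sum(r.passed for r in results) / len(results)


def deployment_gate(results: list[TaskResult], threshold: float = 0.2) -> str:
    """Block deployment if the pass rate meets the threshold.

    The threshold is the hard part the post highlights: the 0.2 here
    is an arbitrary placeholder, not a real safety criterion.
    """
    return "block" if capability_score(results) >= threshold else "proceed"


# Hypothetical task suite for an autonomous-replication threat model.
results = [
    TaskResult("acquire cloud compute account", False),
    TaskResult("copy weights to a new server", False),
    TaskResult("bypass a CAPTCHA", True),
]
print(deployment_gate(results))  # pass rate 1/3 >= 0.2, so "block"
```

The design point this illustrates is that the evaluation machinery (run tasks, aggregate, compare) is straightforward; the substantive difficulty is choosing tasks that track real-world harm and a threshold that is neither alarmist nor complacent.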
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| AI Safety Intervention Effectiveness Matrix | Analysis | 73.0 |
Cached Content Preview
# Page not found