Evaluation methodology
Web Credibility Rating
4/5
High (4): High quality. Established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: METR
This URL points to a now-inaccessible METR blog post on autonomy evaluation methodology. Because METR's evaluation frameworks are widely referenced in AI safety policy and responsible-scaling discussions, archived versions of the post may be more useful than the live URL.
Metadata
Importance: 55/100 · Tags: blog post, reference
Summary
This page from METR (Model Evaluation and Threat Research) appears to be inaccessible (404 not found), but was intended to describe their methodology for evaluating autonomous AI capabilities. METR is known for developing evaluations to assess whether AI models possess dangerous levels of autonomy that could pose safety risks.
Key Points
- Page returned a 404 error, meaning the original content is no longer accessible at this URL.
- METR specializes in evaluating AI systems for autonomous capabilities that could pose existential or catastrophic risks.
- METR's autonomy evaluations are used by major AI labs as part of responsible scaling policies and model cards.
- The methodology likely covered task-based benchmarks assessing AI ability to operate independently over long horizons.
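The task-based, long-horizon evaluation style described above can be sketched in miniature. This is an illustrative assumption, not METR's actual harness or code: tasks are tagged with an estimated human completion time, and model success rates are reported per horizon bucket.

```python
# Hypothetical sketch of a task-based autonomy evaluation harness.
# All names (TaskResult, success_rate_by_horizon, the bucket cutoffs)
# are illustrative assumptions, not METR's actual methodology.
from dataclasses import dataclass


@dataclass
class TaskResult:
    task_id: str
    horizon_minutes: int  # estimated human time to complete the task
    succeeded: bool       # did the model complete the task autonomously?


def success_rate_by_horizon(results, buckets=(15, 60, 240)):
    """Group results into horizon buckets (<=15 min, <=60 min, <=240 min)
    and return the success rate in each bucket (None if a bucket is empty)."""
    grouped = {b: [] for b in buckets}
    for r in results:
        for b in buckets:
            if r.horizon_minutes <= b:
                grouped[b].append(r.succeeded)
                break
    return {b: (sum(g) / len(g) if g else None) for b, g in grouped.items()}


# Toy example: short tasks succeed, the long-horizon task fails.
results = [
    TaskResult("fix-lint", 10, True),
    TaskResult("write-scraper", 45, True),
    TaskResult("train-classifier", 180, False),
]
print(success_rate_by_horizon(results))  # → {15: 1.0, 60: 1.0, 240: 0.0}
```

Reporting by horizon bucket, rather than a single aggregate score, makes it visible that a model can reliably handle short tasks while failing on long-horizon ones.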
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| AI Evaluation | Approach | 72.0 |
Cached Content Preview
HTTP 200 · Fetched Mar 15, 2026 · 0 KB
[metr.org](https://metr.org/) · [Research](https://metr.org/research) · [Notes](https://metr.org/notes) · [Updates](https://metr.org/blog) · [About](https://metr.org/about) · [Donate](https://metr.org/donate) · [Careers](https://metr.org/careers)

# Page not found
Resource ID:
259ff114f8c6586a | Stable ID: Nzg0YWUyY2