METR Time Horizons - Epoch AI
web
Credibility Rating: 4/5 (High)
High quality. Established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: Epoch AI
Metadata
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Eval Saturation & The Evals Gap | Approach | 65.0 |
Cached Content Preview
HTTP 200 | Fetched Apr 30, 2026 | 8 KB
# METR Time Horizons
Durations of the longest task that models can complete correctly more often than not, across a set of software engineering and related tasks.
47 models evaluated
[Software engineering](https://epoch.ai/benchmarks/search?domain=Software+engineering) [Long context](https://epoch.ai/benchmarks/search?domain=Long+context)
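The definition at the top of this page (the longest task a model completes correctly more often than not) can be illustrated with a small sketch. This is a literal reading of that sentence, not the leaderboard's published estimation procedure, which this page does not describe and which may differ (for example, by fitting a curve to success rates). The function name `time_horizon_minutes` and the attempt data are hypothetical and exist only for illustration.

```python
from collections import defaultdict


def time_horizon_minutes(attempts):
    """Return the longest task duration (in minutes) passed more often than not.

    attempts: iterable of (task_minutes, succeeded) pairs.
    Returns None when no duration has a success rate above 50%.
    """
    totals = defaultdict(int)
    passes = defaultdict(int)
    for minutes, succeeded in attempts:
        totals[minutes] += 1
        if succeeded:
            passes[minutes] += 1
    # Keep only durations where the empirical success rate exceeds one half.
    passing = [m for m in totals if passes[m] / totals[m] > 0.5]
    return max(passing) if passing else None


# Hypothetical attempts, made up for illustration (not leaderboard data).
attempts = [
    (1, True), (1, True), (1, True),
    (15, True), (15, True), (15, False),
    (60, True), (60, False), (60, False),
    (240, False), (240, False), (240, False),
]
horizon = time_horizon_minutes(attempts)
print(horizon)
```

With noisy data the success rate need not fall steadily as tasks get longer, so taking the maximum passing duration can overstate the horizon. That is one reason a smoothed estimate may be preferable in practice.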
[Figure: Time horizon on METR tasks, 47 results. X axis: release date, 2019 to 2026. Points grouped by organization: OpenAI, Anthropic, Google, Other.]
[CC-BY](https://creativecommons.org/licenses/by/4.0/)
The data shown for this benchmark does not come from Epoch AI internal runs; it is sourced from the [METR Time Horizons leaderboard](https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/).
Resource ID: 5205868f6f7f3d48 | Stable ID: sid_V0nB4P0CTQ