Longterm Wiki

METR Time Horizons - Epoch AI

web

Credibility Rating

4/5
High(4)

High quality. Established institution or organization with editorial oversight and accountability.

Rating inherited from publication venue: Epoch AI

Metadata

Cited by 1 page

| Page | Type | Quality |
| --- | --- | --- |
| Eval Saturation & The Evals Gap | Approach | 65.0 |

Cached Content Preview

HTTP 200 | Fetched Apr 30, 2026 | 8 KB
# METR Time Horizons

Durations of the longest task that models can complete correctly more often than not, across a set of software engineering and related tasks.
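To make the definition above concrete, here is a minimal sketch of one way a "more often than not" (50%) time horizon could be computed: fit a logistic curve of success against the log of task duration, then read off the duration where predicted success crosses 50%. The function name `fit_time_horizon`, the gradient-descent fit, and the toy attempt data are all illustrative assumptions, not Epoch AI's or METR's actual code or results.

```python
import math

def fit_time_horizon(durations_min, successes, steps=5000, lr=0.1):
    """Hypothetical helper: estimate the task duration (in minutes) at which
    a fitted logistic curve predicts a 50% success rate."""
    xs = [math.log2(d) for d in durations_min]
    mean_x = sum(xs) / len(xs)
    zs = [x - mean_x for x in xs]  # center for a better-conditioned fit
    slope, intercept = 0.0, 0.0
    n = len(zs)
    for _ in range(steps):
        g_slope = g_intercept = 0.0
        for z, y in zip(zs, successes):
            p = 1.0 / (1.0 + math.exp(-(slope * z + intercept)))
            g_slope += (p - y) * z
            g_intercept += p - y
        # Plain gradient descent on the mean logistic loss.
        slope -= lr * g_slope / n
        intercept -= lr * g_intercept / n
    # Predicted success is 0.5 where slope * z + intercept == 0.
    z_half = -intercept / slope
    return 2 ** (z_half + mean_x)

# Toy data (made up): four attempts at each task length, with the number
# of successes falling as tasks get longer.
attempts = []
for minutes, n_success in zip([1, 2, 4, 8, 16, 32, 64, 128],
                              [4, 4, 4, 3, 1, 0, 0, 0]):
    attempts += [(minutes, 1)] * n_success + [(minutes, 0)] * (4 - n_success)

horizon = fit_time_horizon([d for d, _ in attempts], [s for _, s in attempts])
print(f"Estimated 50% time horizon: {horizon:.1f} minutes")
```

On this symmetric toy data the 50% crossing falls midway between the 8-minute and 16-minute tasks on a log scale, so the estimate is roughly 11 minutes. A real analysis would likely use a statistics library and report uncertainty around the estimate.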

47 models evaluated

[Software engineering](https://epoch.ai/benchmarks/search?domain=Software+engineering) [Long context](https://epoch.ai/benchmarks/search?domain=Long+context)


[Graph: Time horizon on METR tasks. X axis: release date (2019 to 2026). 47 results, grouped by organization: OpenAI, Anthropic, Google, Other.]

[CC-BY](https://creativecommons.org/licenses/by/4.0/)


The data shown for this benchmark does not come from Epoch AI internal runs; it is sourced from the [METR Time Horizons leaderboard](https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/).


... (truncated, 8 KB total)
Resource ID: 5205868f6f7f3d48 | Stable ID: sid_V0nB4P0CTQ