Longterm Wiki
RE-Bench
Agentic
Research Engineering Benchmark from METR. It evaluates AI agents on 7 challenging ML research engineering tasks that require multi-step problem solving over extended time horizons.
Models Tested: 0
Scoring: percentage
Introduced: 2024-11
Maintainer: METR
No model scores recorded for this benchmark yet.