Longterm Wiki

ForecastBench: Dynamic LLM Forecasting Benchmark

web
forecastbench.org

ForecastBench is a dynamic benchmark that measures LLM forecasting accuracy against human baselines. It is relevant to AI safety because forecasting ability serves as a proxy for general intelligence and helps track AI capability progress toward, and beyond, human-level performance.

Metadata

Importance: 62/100 · tool page · tool

Summary

ForecastBench is a contamination-free benchmark that evaluates LLM forecasting accuracy against human comparison groups, including superforecasters. It maintains both a baseline leaderboard (no tools) and a tournament leaderboard (with scaffolding/tools), and projects when LLMs will reach superforecaster-level performance.

Key Points

  • Dynamic, contamination-free benchmark preventing LLMs from training on benchmark questions, ensuring valid capability measurement.
  • Compares LLM forecasting performance against human baselines, including superforecasters; forecasting accuracy is treated as a proxy for general intelligence.
  • Dual leaderboards: baseline (raw model performance) and tournament (with tool use, fine-tuning, ensembling).
  • Tracks historical progress in LLM forecasting capabilities and projects the date of LLM-superforecaster parity.
  • Open to public submissions, enabling broad participation in capability evaluation.

Cited by 2 pages

Page | Type | Quality
Forecasting Research Institute (FRI) | Organization | 55.0
ForecastBench | Project | 53.0

1 FactBase fact citing this source

Entity | Property | Value | As Of
ForecastBench | Founded Date | Sep 2024 |

Cached Content Preview

HTTP 200 · Fetched Apr 7, 2026 · 1 KB
ForecastBench

 A dynamic, contamination-free benchmark of LLM forecasting accuracy with human comparison groups, serving as a valuable proxy for general intelligence.

 Featured: Scoring with the Brier Index (Mar 4, 2026)

 Tournament leaderboard

 Tracks frontier accuracy by allowing tool use to improve LLM performance. Models can be scaffolded, fine-tuned, ensembled, and so on. Open to public submissions.
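
 The page says models can be ensembled but does not say how. As one minimal sketch, assuming each model returns a probability per question in the same question order, a simple ensemble averages the probabilities question by question. The function name `ensemble_mean` and the sample numbers are hypothetical and are not taken from ForecastBench.

```python
from statistics import fmean


def ensemble_mean(forecasts_by_model):
    """Average probability forecasts across models, question by question.

    forecasts_by_model: list of lists; each inner list holds one model's
    probabilities (0..1) for the same questions in the same order.
    This is an illustrative assumption, not ForecastBench's own method.
    """
    if not forecasts_by_model:
        raise ValueError("need at least one model's forecasts")
    n_questions = len(forecasts_by_model[0])
    if any(len(f) != n_questions for f in forecasts_by_model):
        raise ValueError("all models must forecast the same questions")
    # zip(*...) groups the i-th forecast from every model together.
    return [fmean(per_question) for per_question in zip(*forecasts_by_model)]


# Made-up forecasts from two hypothetical models on two questions.
combined = ensemble_mean([[0.2, 0.8], [0.4, 0.6]])
```

 Averaging is only one option; a median or a weighted mean would fit the same interface.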

 Baseline leaderboard

 Tracks base model LLM forecasting performance without additional tools, comparing against human baselines and showing consistent progress in capabilities since models were first tested.
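
 Comparing a model against a human baseline comes down to scoring both sets of probability forecasts against the same resolved outcomes. The page mentions a Brier Index but does not define it here, so the sketch below uses the standard Brier score (mean squared error between forecast probability and the 0/1 outcome, lower is better) as a stand-in. All forecasts and outcomes shown are made up for illustration.

```python
def brier_score(probabilities, outcomes):
    """Standard Brier score for binary questions: mean of (p - outcome)^2.

    probabilities: forecast probabilities in [0, 1]
    outcomes: resolved results, 1 if the event happened, else 0
    Lower is better; 0.0 is perfect, 0.25 matches always answering 0.5.
    """
    if not probabilities or len(probabilities) != len(outcomes):
        raise ValueError("need equally sized, non-empty inputs")
    squared_errors = [(p - o) ** 2 for p, o in zip(probabilities, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical resolved questions and forecasts (not ForecastBench data).
outcomes = [1, 0, 1]
model_score = brier_score([0.7, 0.2, 0.6], outcomes)
human_score = brier_score([0.9, 0.1, 0.8], outcomes)
model_beats_human = model_score < human_score
```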

 Projected LLM-superforecaster parity

 
 Explore how LLM forecasting accuracy evolves on ForecastBench. A linear trend projects the date when LLMs reach superforecaster-level performance.
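
 A linear-trend projection of this kind can be sketched as follows. The page does not state the fitting method or the metric, so this assumes an ordinary least-squares line through (date, score) points and a score where lower is better. The function names, the target value, and the data points are hypothetical placeholders, not ForecastBench results.

```python
from datetime import date, timedelta


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def projected_parity_date(observations, target_score):
    """Date at which the fitted line reaches target_score.

    observations: list of (date, score) pairs for the best LLM over time.
    Returns None when the fitted trend is flat and never reaches the target.
    The result can lie in the past if the line already crossed the target.
    """
    start = observations[0][0]
    # Measure time in days since the first observation for stable numerics.
    xs = [(d - start).days for d, _ in observations]
    ys = [score for _, score in observations]
    slope, intercept = fit_line(xs, ys)
    if slope == 0:
        return None
    days_to_target = (target_score - intercept) / slope
    return start + timedelta(days=round(days_to_target))


# Made-up scores (lower is better) and a made-up superforecaster target.
history = [(date(2024, 1, 1), 0.20), (date(2025, 1, 1), 0.18)]
parity = projected_parity_date(history, target_score=0.16)
```

 A straight-line extrapolation is sensitive to which points are included, so the projected date is best read as a rough indicator.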

Resource ID: kb-c808dd961e2e3c1d