ForecastBench: Dynamic LLM Forecasting Benchmark
web · forecastbench.org
ForecastBench is a dynamic benchmark that measures LLM forecasting accuracy against human baselines. It is relevant to AI safety because forecasting ability serves as a proxy for general intelligence and helps track AI capability progress toward, and beyond, human-level performance.
Metadata
Importance: 62/100 · tool page · tool
Summary
ForecastBench is a contamination-free benchmark that evaluates LLM forecasting accuracy against human comparison groups, including superforecasters. It maintains both a baseline leaderboard (no tools) and a tournament leaderboard (with scaffolding/tools), and projects when LLMs will reach superforecaster-level performance.
Key Points
- Dynamic, contamination-free benchmark: LLMs cannot have trained on the benchmark questions, which keeps the capability measurement valid.
- Compares LLM forecasting performance against human baselines, including superforecasters, and treats forecasting accuracy as a proxy for general intelligence.
- Dual leaderboards: baseline (raw model performance) and tournament (with tool use, fine-tuning, ensembling).
- Tracks historical progress in LLM forecasting capabilities and projects the date of LLM-superforecaster parity.
- Open to public submissions, enabling broad participation in capability evaluation.
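The parity projection mentioned above is described only as a linear trend. A minimal sketch of that idea, assuming an ordinary least-squares line fitted to (date, score) pairs and extrapolated to a fixed superforecaster score, might look like the following. The function names, the score scale, and the numbers in the usage example are hypothetical and purely illustrative; they are not ForecastBench data, and the site's actual fitting procedure is not specified here.

```python
from datetime import date, timedelta


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("observations must span more than one date")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def project_parity(dates, scores, target):
    """Date at which the fitted line reaches `target`, or None.

    `dates` are model evaluation dates, `scores` the matching accuracy
    scores, and `target` the (hypothetical) superforecaster score.
    Returns None if the trend is flat or the crossing lies before the
    last observation (parity already reached, or trend moving away).
    """
    origin = dates[0]
    xs = [(d - origin).days for d in dates]
    slope, intercept = fit_line(xs, scores)
    if slope == 0:
        return None
    crossing = (target - intercept) / slope  # days after `origin`
    if crossing < xs[-1]:
        return None
    return origin + timedelta(days=round(crossing))


if __name__ == "__main__":
    # Synthetic, made-up numbers (lower score = better), NOT real results.
    start = date(2024, 1, 1)
    obs_dates = [start + timedelta(days=k) for k in (0, 100, 200)]
    obs_scores = [0.20, 0.18, 0.16]
    print(project_parity(obs_dates, obs_scores, target=0.10))
```

A straight-line extrapolation like this is sensitive to the choice of observation window and assumes progress continues at a constant rate, so the projected date is best read as a rough indicator rather than a prediction.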
Cited by 2 pages
| Page | Type | Quality |
|---|---|---|
| Forecasting Research Institute (FRI) | Organization | 55.0 |
| ForecastBench | Project | 53.0 |
1 FactBase fact citing this source
| Entity | Property | Value | As Of |
|---|---|---|---|
| ForecastBench | Founded Date | Sep 2024 | — |
Cached Content Preview
HTTP 200 · Fetched Apr 7, 2026 · 1 KB
ForecastBench: A dynamic, contamination-free benchmark of LLM forecasting accuracy with human comparison groups, serving as a valuable proxy for general intelligence.
Featured: Scoring with the Brier Index (Mar 4, 2026)
Tournament leaderboard: Tracks frontier accuracy by allowing tool use to improve LLM performance. Models can be scaffolded, fine-tuned, ensembled, and so on. Open to public submissions.
Baseline leaderboard: Tracks base model LLM forecasting performance without additional tools, comparing against human baselines and showing consistent progress in capabilities since models were first tested.
Projected LLM-superforecaster parity: Explore how LLM forecasting accuracy evolves on ForecastBench. A linear trend projects the date when LLMs reach superforecaster-level performance.
Resource ID: kb-c808dd961e2e3c1d