AIA Forecaster: Expert-Level LLM-Based Judgmental Forecasting

paper

2025·arXiv·arxiv.org/html/2511.07678v1

Authors

Rohan Alur·Bradly C. Stadie·Daniel Kang·Ryan Chen·Matt McManus·Michael Rickert·Tyler Lee·Michael Federici·Richard Zhu·Dennis Fogerty·Hayley Williamson·Nina Lozinski·Aaron Linsky·Jasjeet S. Sekhon

Credibility Rating

3/5

Good (3)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: arXiv

Relevant to AI safety discussions around increasingly capable AI systems matching or exceeding human expert judgment in forecasting tasks, with implications for AI-assisted decision-making and risk assessment.

Paper Details

Citations

0 influential

Year

2025

arXiv:2511.07678 · DOI:10.48550/arXiv.2511.07678 · Semantic Scholar

Metadata

Importance: 55/100 · arXiv preprint · primary source

Abstract

This technical report describes the AIA Forecaster, a Large Language Model (LLM)-based system for judgmental forecasting using unstructured data. The AIA Forecaster approach combines three core elements: agentic search over high-quality news sources, a supervisor agent that reconciles disparate forecasts for the same event, and a set of statistical calibration techniques to counter behavioral biases in large language models. On the ForecastBench benchmark (Karger et al., 2024), the AIA Forecaster achieves performance equal to human superforecasters, surpassing prior LLM baselines. In addition to reporting on ForecastBench, we also introduce a more challenging forecasting benchmark sourced from liquid prediction markets. While the AIA Forecaster underperforms market consensus on this benchmark, an ensemble combining AIA Forecaster with market consensus outperforms consensus alone, demonstrating that our forecaster provides additive information. Our work establishes a new state of the art in AI forecasting and provides practical, transferable recommendations for future research. To the best of our knowledge, this is the first work that verifiably achieves expert-level forecasting at scale.

Summary

The AIA Forecaster is an LLM-based forecasting system that combines agentic news search, a supervisor agent for forecast reconciliation, and statistical calibration to reduce LLM biases. It matches human superforecaster performance on the ForecastBench benchmark, which the authors describe as, to the best of their knowledge, the first verifiable expert-level AI forecasting at scale. On a harder benchmark drawn from liquid prediction markets it underperforms market consensus, but an ensemble of the two outperforms consensus alone, indicating that the forecaster contributes additive information.
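The ensemble claim can be made concrete with a small scoring sketch. It assumes the Brier score as the metric (a standard scoring rule for probabilistic forecasts; this page does not name the metric) and a simple convex blend of model and market probabilities. The function names, the blend weight of 0.3, and all numbers are hypothetical and chosen for illustration only; none of them come from the paper.

```python
from typing import List, Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes (lower is better)."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def blend(model_probs: Sequence[float], market_probs: Sequence[float], weight: float) -> List[float]:
    """Convex combination: `weight` on the model, (1 - weight) on the market."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [weight * m + (1.0 - weight) * k for m, k in zip(model_probs, market_probs)]


# Toy numbers invented for illustration only; NOT data from the paper.
outcomes = [1, 0, 1, 0]
market = [0.8, 0.3, 0.6, 0.2]   # hypothetical market consensus
model = [0.6, 0.1, 0.8, 0.4]    # hypothetical model forecasts

market_brier = brier_score(market, outcomes)
model_brier = brier_score(model, outcomes)
# The weight 0.3 is an arbitrary assumption, not a value reported by the authors.
blend_brier = brier_score(blend(model, market, 0.3), outcomes)

print(f"market={market_brier:.4f} model={model_brier:.4f} blend={blend_brier:.4f}")
```

In this toy case the model scores worse than the market on its own, yet the blend scores better than either, which is the pattern the summary describes. A real evaluation would fit the blend weight on held-out questions rather than choosing it by hand.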

Key Points

• Achieves performance equal to human superforecasters on ForecastBench, surpassing prior LLM baselines and establishing a new state of the art in AI forecasting.
• Combines three core elements: agentic search over high-quality news sources, a supervisor agent that reconciles disparate forecasts for the same event, and statistical calibration to counter LLM behavioral biases.
• Introduces a new, more challenging forecasting benchmark sourced from liquid prediction markets, on which it underperforms market consensus.
• An ensemble of the AIA Forecaster and market consensus outperforms consensus alone, demonstrating that the system provides additive predictive signal.
• Claims to be, to the best of the authors' knowledge, the first work that verifiably achieves expert-level forecasting at scale, and offers practical, transferable recommendations for future research.
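This page does not list which statistical calibration techniques the system uses. As a generic illustration only (an assumption, not necessarily the authors' method), one common post-hoc adjustment is extremization in log-odds space: if a forecaster's probabilities are systematically too close to 0.5, scaling the log-odds by a factor greater than 1 pushes them toward 0 or 1. The function name `extremize` and the parameter `alpha` are hypothetical.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-x))


def extremize(p: float, alpha: float, eps: float = 1e-6) -> float:
    """Scale the log-odds of p by alpha.

    alpha > 1 pushes forecasts away from 0.5, alpha < 1 pulls them toward 0.5,
    and alpha == 1 leaves them unchanged. Inputs are clipped to [eps, 1 - eps]
    so that logit stays finite.
    """
    p = min(max(p, eps), 1.0 - eps)
    return sigmoid(alpha * logit(p))


# alpha = 2.0 is an arbitrary illustrative value, not one taken from the paper.
for raw in (0.3, 0.5, 0.7):
    print(raw, round(extremize(raw, alpha=2.0), 4))
```

In practice the scaling factor would be estimated from resolved questions (for example by minimizing a proper scoring rule on a held-out set) rather than fixed in advance.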

Resource ID: fde75aac1421b2b6 | Stable ID: NWEwZGVjY2