AIA Forecaster: Technical Report - arXiv Abstract

paper

2025·arXiv·arxiv.org/abs/2511.07678

Authors

Rohan Alur·Bradly C. Stadie·Daniel Kang·Ryan Chen·Matt McManus·Michael Rickert·Tyler Lee·Michael Federici·Richard Zhu·Dennis Fogerty·Hayley Williamson·Nina Lozinski·Aaron Linsky·Jasjeet S. Sekhon

Credibility Rating

3/5

Good (3)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: arXiv

Describes an LLM-based forecasting system that combines agentic search with statistical calibration techniques to mitigate LLM biases, showing that an AI system can match human superforecasters on the ForecastBench benchmark. Relevant for understanding LLM capabilities and limitations.

Paper Details

Citations

0 influential

Year

2025

arXiv:2511.07678 DOI:10.48550/arXiv.2511.07678 Semantic Scholar

Metadata

arXiv preprint · primary source

Abstract

This technical report describes the AIA Forecaster, a Large Language Model (LLM)-based system for judgmental forecasting using unstructured data. The AIA Forecaster approach combines three core elements: agentic search over high-quality news sources, a supervisor agent that reconciles disparate forecasts for the same event, and a set of statistical calibration techniques to counter behavioral biases in large language models. On the ForecastBench benchmark (Karger et al., 2024), the AIA Forecaster achieves performance equal to human superforecasters, surpassing prior LLM baselines. In addition to reporting on ForecastBench, we also introduce a more challenging forecasting benchmark sourced from liquid prediction markets. While the AIA Forecaster underperforms market consensus on this benchmark, an ensemble combining AIA Forecaster with market consensus outperforms consensus alone, demonstrating that our forecaster provides additive information. Our work establishes a new state of the art in AI forecasting and provides practical, transferable recommendations for future research. To the best of our knowledge, this is the first work that verifiably achieves expert-level forecasting at scale.
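The abstract mentions statistical calibration techniques for countering behavioral biases in LLMs but does not specify them on this page. As a rough illustration only, the sketch below assumes one common approach from the forecasting literature: averaging several independent forecasts in log-odds space and then extremizing the result (scaling the mean log-odds by a factor `alpha` greater than 1) to offset a tendency to hedge toward 0.5. The function names, the choice of log-odds averaging, and the default `alpha` are hypothetical and are not taken from the report.

```python
import math

# Hypothetical sketch: log-odds averaging plus extremization.
# This is NOT presented as the AIA Forecaster's published calibration method.


def logit(p: float) -> float:
    """Convert a probability in (0, 1) to log-odds."""
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def aggregate_and_extremize(probs: list[float], alpha: float = 1.5, eps: float = 1e-6) -> float:
    """Average forecasts in log-odds space, then scale the mean log-odds by alpha.

    alpha > 1 pushes the aggregate away from 0.5; alpha == 1 keeps the plain log-odds mean.
    eps clips inputs so that probabilities of exactly 0 or 1 do not produce infinite log-odds.
    """
    if not probs:
        raise ValueError("probs must be non-empty")
    clipped = [min(max(p, eps), 1.0 - eps) for p in probs]
    mean_log_odds = sum(logit(p) for p in clipped) / len(clipped)
    return sigmoid(alpha * mean_log_odds)
```

With `alpha=1.0` the function reduces to a plain log-odds mean; larger values push the aggregate further from 0.5, so in practice `alpha` would be fit on resolved questions rather than fixed by hand.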

Summary

The AIA Forecaster is an LLM-based system for judgmental forecasting that combines agentic search over news sources, a supervisor agent that reconciles disparate forecasts for the same event, and statistical calibration techniques that mitigate LLM biases. On the ForecastBench benchmark, it matches the performance of human superforecasters and outperforms prior LLM baselines. It underperforms market consensus on a more challenging benchmark sourced from liquid prediction markets, but an ensemble combining the AIA Forecaster with market consensus outperforms consensus alone, indicating that the system contributes additive predictive information. The authors describe this as, to the best of their knowledge, the first work to verifiably achieve expert-level forecasting at scale.
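The ensemble result above can be made concrete with a proper scoring rule. The sketch below assumes Brier scoring (mean squared error between a forecast probability and the 0/1 outcome; lower is better) and a simple linear blend of model and market probabilities with a hypothetical weight `w`. The report's actual ensembling scheme may differ, and both function names are illustrative.

```python
from typing import Sequence

# Hypothetical sketch: Brier scoring and a linear model/market blend.
# Not the report's evaluation code; the weighting scheme is an assumption.


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between forecast probabilities and binary (0/1) outcomes."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("probs and outcomes must be non-empty and equal in length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def linear_ensemble(model: Sequence[float], market: Sequence[float], w: float = 0.5) -> list[float]:
    """Blend model and market probabilities; w is the weight on the model forecast."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    if len(model) != len(market):
        raise ValueError("model and market must be equal in length")
    return [w * m + (1.0 - w) * c for m, c in zip(model, market)]
```

On a set of resolved questions, a lower value of `brier_score(linear_ensemble(model, market, w), outcomes)` than of `brier_score(market, outcomes)` is one simple indication that the model adds information the market has not already priced in; `w` should be chosen on separate data to avoid overfitting.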

Cited by 1 page

Page	Type	Quality
Bridgewater AIA Labs	Organization	66.0

Cached Content Preview

HTTP 200 · Fetched Apr 7, 2026 · 94 KB

 AIA Forecaster: Technical Report

 
 
Rohan Alur∗, Bradly C. Stadie∗, Daniel Kang, Ryan Chen, Matt McManus, Michael Rickert, Tyler Lee, Michael Federici, Richard Zhu, Dennis Fogerty, Hayley Williamson, Nina Lozinski, Aaron Linsky, Jasjeet S. Sekhon
 
Bridgewater AIA Labs
 New York, NY
 
 

 

 
∗ These authors contributed equally to this work. Correspondence to aialabs@bwater.com.
 
 
 1 Introduction

 
 Forecasting is a universal problem. Any system that makes informed judgments about what may happen in the future relies on at least some degree of forecasting. In agriculture, this takes the form of predicting future crop yields and food system resilience (bueechi2023crop; tanaka2023satellite; paudel2021machine). In biotechnology and medicine, clinical trials are carefully designed to help researchers predict the impact of newly developed treatments (qian2025enhancing; curth2024using). Elsewhere in biology, AlphaFold can be viewed as forecasting protein folding structure under uncertainty (jumper2021highly). In academia, grant funding amounts to forecasting which scientific discoveries will be made if a project is funded (tohalino2022predicting). Climate and environmental scientists model local and long-term weather trends, which are of universal importance (soliman2024deepmind; price2025probabilistic; deepmind2024weathernext). In political science, forecasts of election outcomes and of the impact of policy decisions are an essential component of political calculus (jennings2020election; wang2015forecasting; gelman2020information). Militaries make use of forecasting for thre

... (truncated, 94 KB total)

Resource ID: 36ddd2f43e74d56c | Stable ID: sid_DO41AySCnt