Evidence on good forecasting practices from the Good Judgment Project - AI Impacts
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: AI Impacts
Relevant to AI safety researchers interested in forecasting AI timelines or risks, as it provides empirical grounding for best practices in probabilistic prediction used in tools like Metaculus and forecasting-based AI risk assessments.
Metadata
Summary
Summarizes empirical findings from the Good Judgment Project (GJP), the winning team in IARPA's 2011-2015 forecasting tournament, on what factors correlate with accurate probabilistic forecasting. Key predictors include past performance, prediction frequency, deliberation time, team collaboration, and cognitive traits like active open-mindedness. Based on Philip Tetlock's research and the Superforecasting methodology.
Key Points
- Past performance is the strongest predictor of forecasting accuracy, with ~70% of superforecasters maintaining their status year-to-year and a 0.65 year-to-year correlation across all forecasters.
- Behavioral factors like deliberation time, team collaboration, and active open-mindedness correlate meaningfully with Brier score improvements.
- A one-hour training module on forecasting techniques measurably improved accuracy, suggesting forecasting skill is learnable.
- Use of structured approaches like 'the outside view,' Fermi estimation, and Bayesian reasoning is associated with better forecasting outcomes.
- Intelligence and domain expertise matter but are less important than behavioral and process variables like making more predictions and updating frequently.
Cited by 2 pages
| Page | Type | Quality |
|---|---|---|
| Good Judgment (Forecasting) | Organization | 50.0 |
| Philip Tetlock | Person | 73.0 |
Cached Content Preview
According to experience and data from the Good Judgment Project, the following are associated with successful forecasting, in rough decreasing order of combined importance and confidence:
- Past performance in the same broad domain
- Making more predictions on the same question
- Deliberation time
- Collaboration on teams
- Intelligence
- Domain expertise
- Having taken a one-hour training module on these topics
- ‘Cognitive reflection’ test scores
- ‘Active open-mindedness’
- Aggregation of individual judgments
- Use of precise probabilistic predictions
- Use of ‘the outside view’
- ‘Fermi-izing’
- ‘Bayesian reasoning’
- Practice
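Several items near the end of this list ('the outside view', 'Fermi-izing', 'Bayesian reasoning') share a common skeleton: anchor on a base rate for the reference class, then update on case-specific evidence. A minimal sketch of that update step, with made-up numbers purely for illustration:

```python
def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """Posterior probability of an event after one piece of evidence (Bayes' rule).

    prior: base-rate probability of the event (the 'outside view' anchor)
    likelihood_if_true / likelihood_if_false: how probable the observed
    evidence is under each hypothesis.
    """
    numer = prior * likelihood_if_true
    return numer / (numer + (1 - prior) * likelihood_if_false)

# Outside view: a 10% base rate for this class of events, then an update
# on evidence that is 3x likelier if the event is going to occur.
p = bayes_update(0.10, 0.6, 0.2)
print(round(p, 3))  # 0.25
```

The point of the structure is that the evidence moves the forecast away from the base rate only in proportion to how diagnostic it is, which guards against both ignoring base rates and overreacting to vivid details.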
## **Details**
### **1.1. Process**
The Good Judgment Project (GJP) was the winning team in IARPA’s 2011-2015 forecasting tournament. In the tournament, six teams assigned probabilistic answers to hundreds of questions about geopolitical events months to a year in the future. Each competing team used a different method for coming up with their guesses, so the tournament helps us to evaluate different forecasting methods.
The GJP team, led by Philip Tetlock and Barbara Mellers, gathered thousands of online volunteers and had them answer the tournament questions. They then made their official forecasts by aggregating these answers. In the process, the team collected data about the patterns of performance in their volunteers, and experimented with aggregation methods and improvement interventions. For example, they ran an RCT to test the effect of a short training program on forecasting accuracy. They especially focused on identifying and making use of the most successful two percent of forecasters, dubbed ‘superforecasters’.
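The aggregation step can be illustrated with a toy version. GJP's actual algorithm weighted forecasters by performance and 'extremized' the weighted mean; the simple mean and the exponent below are illustrative assumptions, not GJP's fitted procedure:

```python
def aggregate(probs, a=2.5):
    """Toy aggregation of individual probability judgments for one question.

    Takes the unweighted mean of the individual probabilities, then pushes
    the result away from 0.5 ('extremizing'), a transform that improved
    accuracy on GJP tournament data. The exponent `a` is a hypothetical
    choice for illustration.
    """
    p = sum(probs) / len(probs)
    return p ** a / (p ** a + (1 - p) ** a)

# Three forecasters leaning the same way; the mean 0.75 is pushed out to ~0.94.
print(aggregate([0.7, 0.8, 0.75]))
```

The intuition for extremizing is that each volunteer holds only part of the available evidence, so when many partly-independent judgments agree, the aggregate warrants more confidence than their simple average.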
Tetlock’s book _Superforecasting_ describes this process and Tetlock’s resulting understanding of how to forecast well.
### **1.2. Correlates of successful forecasting**
#### 1.2.1. Past performance
Roughly 70% of the superforecasters maintained their status from one year to the next [1](https://aiimpacts.org/evidence-on-good-forecasting-practices-from-the-good-judgment-project/#easy-footnote-bottom-1-1283 ""). Across all the forecasters, the correlation between performance in one year and performance in the next year was 0.65 [2](https://aiimpacts.org/evidence-on-good-forecasting-practices-from-the-good-judgment-project/#easy-footnote-bottom-2-1283 ""). These high correlations are particularly impressive because the forecasters were online volunteers; presumably substantial variance year-to-year came from forecasters throttling down their engagement due to fatigue or changing life circumstances [3](https://aiimpacts.org/evidence-on-good-forecasting-practices-from-the-good-judgment-project/#easy-footnote-bottom-3-1283 "").
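Accuracy in these comparisons was measured with Brier scores. A minimal sketch of the binary form (the tournament itself used a multi-outcome variant; question names and numbers here are made up):

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probabilistic forecasts and what happened.

    forecasts: probabilities assigned to the event occurring (0..1)
    outcomes: 1 if the event occurred, else 0
    Lower is better; always answering 0.5 scores 0.25 on this binary form.
    """
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# A sharp, well-calibrated forecaster beats a hedging one:
confident = brier_score([0.9, 0.1, 0.8], [1, 0, 1])  # ≈ 0.02
hedging = brier_score([0.5, 0.5, 0.5], [1, 0, 1])    # 0.25
```

Because the score is a squared error, it rewards both calibration (probabilities that match observed frequencies) and resolution (confidence when confidence is warranted), which is why the year-to-year correlations above are computed on a metric that cannot be gamed by vague answers.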
#### 1.2.2. Behavioral and dispositional variables
Table 2 depicts the correlations between measured variables amongst GJP’s volunteers in the first two years of the tournament [4].
... (truncated, 34 KB total)