Superforecasters and Good Judgement - Built In
Relevant to AI safety as superforecasting methods are widely used to assess AI risk timelines and governance scenarios; provides accessible background on the Good Judgment Project methodology for readers unfamiliar with calibrated probabilistic forecasting.
Metadata
Importance: 32/100 · blog post · educational
Summary
An overview of Good Judgment's superforecasting methodology, covering Philip Tetlock's research and how trained forecasters who combine probabilistic reasoning with psychological discipline can outperform traditional predictive models, illustrated through COVID-19 predictions. The article explains how superforecasters break complex questions into tractable sub-questions and iteratively update probability estimates as new information emerges.
Key Points
- Superforecasters assign calibrated probabilities to outcomes and continuously update them as conditions change, rather than making one-off binary predictions.
- Good Judgment grew out of forecasting research Philip Tetlock began in 1984 and gained prominence through IARPA's geopolitical forecasting competitions.
- Human superforecasters add particular value when modeling complex systems with behavioral and social constraints that pure quantitative models struggle to capture.
- The methodology emphasizes psychological factors alongside data science, including recognition of cognitive biases and disciplined probabilistic thinking.
- Superforecasters demonstrated early COVID-19 forecasting success by correctly identifying California's death range at a time when expert models widely disagreed.
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Good Judgment (Forecasting) | Organization | 50.0 |
Cached Content Preview
HTTP 200 · Fetched Mar 20, 2026 · 16 KB

The so-called “superforecasters” started contemplating how COVID-19 might unfold across the United States back in early April. It was still the early days of the pandemic in the States, with quite a bit of [disagreement](https://projects.fivethirtyeight.com/covid-forecasts/) among the various models, and data so messy that [some](https://fivethirtyeight.com/features/why-its-so-freaking-hard-to-make-a-good-covid-19-model/) simply chose not to predict. But messy-data, high-interest questions are precisely what superforecasters aim to clarify.
To properly tackle the overall picture, they broke it down into several questions. Taken in isolation, any single one might sound like a morbid parlor game, but the goal was to better understand the situation, not glibly speculate.
“Folks were all over the map on what it was that had hit us hard a month earlier,” said Marc Koehler, vice president of Good Judgment and a superforecaster himself.
One of the questions they considered in that process: How many people in California will have died from COVID-19 by the end of June?
Like all non-yes/no questions that superforecasters consider, the question was framed in a multiple choice format, with each of the five options in this instance listed as a numerical range. They chose the option that ranged between 3,900 and 19,000 deaths.
But superforecasters, the cream of the crop of predictors affiliated with the Good Judgment project, don’t simply vote yes or no; they assign probabilities, then adjust them as time goes on and variables change. Before April was over, the group had assigned their chosen range a 50 percent probability, a sign of early confidence in their selection.
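Superforecasters update judgmentally rather than by formula, but the bookkeeping behind this kind of revision can be sketched as a discrete Bayesian update over the answer bins. Below is a minimal Python sketch; everything except the 3,900-19,000 bin is hypothetical: the other bin boundaries, the prior, and the likelihoods are invented purely for illustration.

```python
# Probability bookkeeping over the five answer bins.
# Only the 3,900-19,000 bin comes from the article; the other
# boundaries, the prior, and the likelihoods are hypothetical.

bins = ["<1,000", "1,000-3,900", "3,900-19,000", "19,000-95,000", ">95,000"]

# Hypothetical prior, matching the roughly 50 percent the group
# had placed on the chosen range by late April.
prior = [0.05, 0.20, 0.50, 0.20, 0.05]

# Hypothetical likelihoods: how consistent each bin is with a
# week of new case and fatality data (higher = more consistent).
likelihood = [0.10, 0.30, 0.90, 0.40, 0.10]

# Discrete Bayes update: posterior is proportional to prior * likelihood.
unnormalized = [p * l for p, l in zip(prior, likelihood)]
total = sum(unnormalized)
posterior = [u / total for u in unnormalized]

for label, p in zip(bins, posterior):
    print(f"{label:>13}: {p:.2f}")
```

On these made-up numbers, the chosen bin climbs from 0.50 to 0.75, the same direction of travel Koehler describes next.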
“And [the probability] just went up from there,” Koehler said. Unfortunately, the estimate was spot-on. There were 6,082 COVID-19 deaths in California by June 30, according to Johns Hopkins University’s Coronavirus Resource Center.
Sure, a range spanning more than 15,000 deaths isn’t exactly hyper-specific, and with only five possible choices the baseline odds are pretty good. And unless you’ve assigned a probability of 100 percent, an outcome in itself doesn’t prove or disprove a forecast’s quality, as post-2016 Nate Silver is surely very tired of explaining.
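That is why forecasting tournaments don’t judge quality by single outcomes; the IARPA competitions that made Good Judgment’s name scored forecasters with Brier scores, which reward calibration across many questions. Here is a minimal sketch of that scoring rule, with hypothetical forecast vectors over the same five ranges:

```python
def brier_score(forecast: list[float], outcome_index: int) -> float:
    """Multi-category Brier score: squared error between the forecast
    vector and a one-hot vector of what actually happened.
    0.0 is a perfect score; 2.0 is the worst possible."""
    outcome = [1.0 if i == outcome_index else 0.0 for i in range(len(forecast))]
    return sum((f - o) ** 2 for f, o in zip(forecast, outcome))

# Hypothetical forecasts; index 2 is the 3,900-19,000 bin where
# the actual death toll landed.
confident = [0.01, 0.10, 0.75, 0.13, 0.01]
hedged = [0.20, 0.20, 0.20, 0.20, 0.20]

print(brier_score(confident, 2))  # ~0.09: justified confidence scores well
print(brier_score(hedged, 2))     # 0.80: spreading probability evenly scores poorly
```

Lower is better, so a forecaster who concentrates probability on what actually happens beats one who hedges everywhere, and repeated scoring over many questions separates skill from luck.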
But especially given the high degree of uncertainty at the time, Koehler cites the prediction as a recent example of superforecaster success, if one we all wish had been an overestimate. The example also gets at Good Judgment’s approach of giving equal, if not greater, footing to psychology alongside the [data science](https://builtin.com/data-science) that traditionally drives predictive analytics.
Here’s Koehler’s working hypothesis: “Modeling is a very good way to explain how a virus will move through an unconstrained herd. But when you begin to pu
... (truncated, 16 KB total)
Resource ID: 480cc530f7acf8c6 | Stable ID: OTI2YTc4Nj