
Society of Actuaries Forecasting & Futurism Newsletter

web

A practitioner-oriented 2015 piece from an actuarial newsletter; relevant to AI safety discussions about forecasting methodology and epistemic calibration, but not directly focused on AI systems or safety research.

Metadata

Importance: 28/100 · organizational report · educational

Summary

This article by Mary Pat Campbell introduces the Good Judgment Project (GJP), explaining how it emerged from U.S. intelligence community failures and the IARPA ACE forecasting tournament. It argues that simple crowd wisdom is insufficient for accurate prediction, and that weighted aggregation methods—identifying and leveraging 'super forecasters' with strong track records—significantly outperform unweighted averages and complex nonlinear algorithms.

Key Points

  • Simple 'wisdom of crowds' unweighted averaging often fails; crowd intelligence is not automatic or magic.
  • GJP identifies high-performing 'super forecasters' and weights their predictions more heavily to improve accuracy.
  • Collaborative teams of super forecasters proved highly effective in GJP's Season 3.
  • A bias-correction statistical model that adjusts for systematic forecaster judgment errors outperforms both unweighted averages and complex nonlinear algorithms (see the sketch after this list).
  • GJP originated from IARPA's ACE tournament, designed to address repeated intelligence community forecasting failures.
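To make the aggregation ideas above concrete, here is a minimal Python sketch. This is not the GJP's published model: the forecaster probabilities, the accuracy weights, and the extremizing exponent `a` are all invented illustrative values, and the `extremize` transform stands in for the general family of simple bias corrections the article alludes to.

```python
# Illustrative sketch only (not the GJP's actual model): contrast an
# unweighted "wisdom of crowds" average of probability forecasts with
# (a) a track-record-weighted average and (b) a simple "extremizing"
# transform, one common way to correct for the systematic
# underconfidence of crowd averages. All numbers below are made up.

def unweighted_mean(probs):
    """Plain crowd average: every forecaster counts equally."""
    return sum(probs) / len(probs)

def weighted_mean(probs, weights):
    """Lean more heavily on forecasters with better track records."""
    total = sum(weights)
    return sum(p * w for p, w in zip(probs, weights)) / total

def extremize(p, a=2.5):
    """Push an aggregate probability away from 0.5.

    Averages of individually hedged forecasts tend to be
    underconfident; raising the odds to a power a > 1 is one simple
    bias correction used in the forecast-aggregation literature.
    """
    return p**a / (p**a + (1 - p)**a)

if __name__ == "__main__":
    probs = [0.55, 0.60, 0.70, 0.65, 0.50]  # five forecasters, one question
    weights = [0.5, 1.0, 3.0, 2.0, 0.5]     # hypothetical accuracy scores

    p_unw = unweighted_mean(probs)
    p_wtd = weighted_mean(probs, weights)
    print(f"unweighted average: {p_unw:.3f}")   # 0.600
    print(f"weighted average:   {p_wtd:.3f}")   # leans toward strong forecasters
    print(f"extremized:         {extremize(p_wtd):.3f}")  # pushed away from 0.5
```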

Cited by 1 page

Page                          Type          Quality
Good Judgment (Forecasting)   Organization  50.0

Cached Content Preview

HTTP 200 · Fetched Mar 15, 2026 · 21 KB
Article from Forecasting and Futurism
Month/Year: July 2015
Issue Number: 11


We have often heard of the supposed wisdom of crowds, and the downfall of experts, but as one person noted last year, not all crowds are all that good at predicting:1

“I read the results of my impromptu experiment as a reminder that crowds are often smart, but they aren’t magic. Retellings of Galton’s experiment sometimes make it seem like even pools of poorly informed guessers will automatically produce an accurate estimate, but, apparently, that’s not true.”

The context of that quote was that the author, Jay Ulfelder, had a cousin who ran an impromptu contest online, asking people how many movies he had watched (in the theater) in the past 13 years. The cousin kept a record of every movie he watched (to remind himself of the perk of being master of his own schedule as a freelance writer).

Forty-five people submitted answers, and the average (the supposed “wisdom of crowds”) was way off from the actual answer. However, some of the an…

…who the most accurate forecasters are and leaning more heavily on them. I couldn’t do this in Steve’s single-shot contest, but GJP gets to see forecasters’ track records on large numbers of questions and has been using them to great effect. In the recently ended Season 3, GJP’s “super forecasters” were grouped into teams and encouraged to collaborate, and this approach has proved very effective. In a paper published this spring, GJP has also shown that they can do well with nonlinear aggregations derived from a simple statistical model that adjusts for systematic bias in forecasters’ judgments. Team GJP’s bias-correction model beats not only the unweighted average but also a number of widely used and more complex nonlinear algorithms.

What is this “Good Judgment Project” and who are their forecasters?

Jay’s post happens to have been written at the end of their third season, and I’ve joined the GJP for the 4th season. While there are details of the current season I can’t share, I can explain the background of the project, some of the basics of participation, and, most importantly, what I’ve learned so far.

HISTORY OF THE GOOD JUDGMENT PROJECT

The Good Judgment Project sprouted out of a number of surprises in the U.S. intelligence community. How could they have been blindsided by so many developments?

Part of the research coming out of those failures was a competition called the IARPA ACE tournament. IARPA stands for Intelligence Advanced Research Projects Activity; it provides funding and runs projects intended to dig into intelligence issues that cross multiple organizations within the U.S. government. According to their own description, IARPA undertakes “high-risk/high-payoff research … [in which] failures are inevitable. Failure is acceptable so long as the failure isn’t due to a lack of technical or programmatic integrity and the results are fully documented.”2

... (truncated, 21 KB total)
Resource ID: 1f0af20503ff6885 | Stable ID: M2JiMjY4NW