Big Three Authors Predictions Analysis (Arb Research)
arbresearch.com/files/big_three.pdf
This Arb Research report evaluates the prediction accuracy of prominent AI figures, offering calibration insights useful for anyone trying to assess expert credibility on AI timelines and safety forecasts. Content is inferred from the URL and title, as the document was not directly accessible.
Metadata
Importance: 45/100 · organizational report · analysis
Summary
Arb Research analyzes predictions made by three prominent AI researchers/authors about AI development timelines and outcomes, evaluating their track records and forecasting accuracy. The analysis likely examines how well these experts' past predictions have held up against actual AI progress, providing calibration data for assessing expert forecasting in AI.
Key Points
- Systematic evaluation of AI prediction track records from three major AI thought leaders
- Provides empirical data on expert forecasting accuracy regarding AI timelines and capabilities
- Useful for calibrating how much weight to assign expert predictions about future AI development
- Highlights patterns of over- or under-estimation in AI progress forecasting
- Contributes to the broader literature on AI forecasting methodology and epistemic humility
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Arb Research | Organization | 50.0 |
Cached Content Preview
HTTP 200 · Fetched Mar 20, 2026 · 19 KB
# Scoring the Big 3’s predictive performance
Asimov, Clarke and Heinlein as forecasters

# Executive Summary
We wanted to know how the old futurists did at forecasting technologies. The client selected the Big Three sci-fi writers as exemplars: they have large nonfiction corpora and are not primarily known for their forecasts, so the sample is not cherry-picked.
We sought a representative sample, so we searched systematically using ISFDB, pattern-matching, and crowdsourcing. We checked $\simeq 1/3$ of all their nonfiction.
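As a rough sketch of what the pattern-matching step might look like: a keyword filter over sentences can surface candidate predictions for human review. The report's actual patterns are not shown in this preview; the forward-looking cues below are assumptions.

```python
import re

# Hypothetical cues for a prediction sentence: forward-looking modals or
# phrases, optionally with an explicit horizon year. Illustrative only;
# these are not the report's actual patterns.
FORWARD_CUE = re.compile(
    r"\b(will|shall|by \d{4}|within \d+ years?"
    r"|in the next (decade|century)|is likely to)\b",
    re.IGNORECASE,
)

def candidate_predictions(sentences):
    """Return sentences worth a human look as possible tech forecasts."""
    return [s for s in sentences if FORWARD_CUE.search(s)]

sample = [
    "By 2000, commercial rockets will carry mail across the Atlantic.",
    "I met Campbell in 1940.",
]
print(candidate_predictions(sample))  # flags only the first sentence
```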
We tagged and labelled the predictions with a subjective system, but decomposed the labels into clean quantities (correctness, ex-post difficulty, closeness to pure tech). We’re offering a bug bounty for errors. Outputs: Asimov file, Heinlein file, Clarke file.
We introduced a simple score: relevance to tech forecasting $\times$ ex-post smartness. Asimov is on top by some margin, but all of them average $<20\%$ of the max score. Each of them beat Kurzweil on long-range accuracy.
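A minimal sketch of how such a composite score could be computed, assuming both factors are graded on a 0–1 scale; the field names and scales here are assumptions, not the report's actual schema:

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    # Hypothetical fields mirroring the decomposition above; the 0-1
    # scales are assumptions.
    correct: bool          # did the prediction resolve true?
    difficulty: float      # ex-post difficulty, 0 (obvious) to 1 (hard)
    tech_closeness: float  # closeness to a pure technology forecast
    relevance: float       # relevance to tech forecasting
    smartness: float       # ex-post smartness

def simple_score(p: Prediction) -> float:
    """Composite score: relevance x ex-post smartness (max = 1.0)."""
    return p.relevance * p.smartness

preds = [
    Prediction(True, 0.7, 0.9, relevance=0.5, smartness=0.3),
    Prediction(False, 0.3, 0.5, relevance=0.4, smartness=0.2),
]
avg = sum(simple_score(p) for p in preds) / len(preds)
print(f"average = {avg:.0%} of max score")  # prints 12% for this toy data
```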
We also looked at their “impressiveness / embarrassment”. Three headline results:
| | Ratio of impressive to embarrassing predictions | Strict tech accuracy | Average score |
| --- | --- | --- | --- |
| Asimov | 1.9 : 1 | 57% | 22% |
| Heinlein | 1 : 1 | 36% | 11% |
| Clarke | 0.8 : 1 | 48% | 14% |
I do not for a moment suggest that more than 1 per cent of science-fiction readers would be reliable prophets; but I do suggest that almost a hundred per cent of reliable prophets will be science-fiction readers – or writers.
– Clarke (1962)
# Intro
When they check old predictions at all, people tend to pick out outstanding wins or amusing misses. We were instead tasked with systematically reviewing predictions made by “the Big Three” sci-fi writers: Arthur C Clarke, Robert Heinlein, and Isaac Asimov.
Systematic, as opposed to exhaustive. (See Karnofsky’s post for the point of all this.)
Target: Relatively precise predictions about the future course of technology, with a resolution date, in their nonfiction work and interviews. This spec rules out $\geq 90\%$ of their writing.
The client asked for relevance-to-EA ("How well does this forecast fit into the reference class of forecasting far-off future technologies?") and smartness ("How smart/dumb does this forecast look in hindsight?"). Operationalisations are in “Methodology”, below.
We also wanted to move fast, delivering within 6 weeks. (In the end, it took 7.5 weeks.)
# Initial exploration
I (Gavin) went through the first dozen pages of Google results for queries like “Heinlein predictions”, getting mostly fun newspaper or magazine spots. This yielded 200 cherry-picked predictions, and taught me the shape of the problem (for instance, what words are most common in a prediction sentence). I scored them manually, including a crude categorisation of their confidence, and thus their calibration.
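A rough sketch of the calibration check this implies: bucket predictions by a crude confidence tag and compare the implied probability with the observed hit rate. The tags and probability mapping below are assumptions, not the categories actually used.

```python
from collections import defaultdict

# Hypothetical mapping from crude confidence tags to implied probabilities.
CONFIDENCE = {"hedged": 0.6, "confident": 0.8, "certain": 0.95}

def calibration(scored):
    """scored: list of (confidence_tag, resolved_true) pairs.
    Returns {tag: (implied probability, observed hit rate, n)}."""
    buckets = defaultdict(list)
    for tag, correct in scored:
        buckets[tag].append(correct)
    return {
        tag: (CONFIDENCE[tag], sum(outcomes) / len(outcomes), len(outcomes))
        for tag, outcomes in buckets.items()
    }

data = [("confident", True), ("confident", False), ("certain", True)]
for tag, (p, hit, n) in calibration(data).items():
    print(f"{tag}: stated ~{p:.0%}, observed {hit:.0%} (n={n})")
```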
# Data Collection
... (truncated, 19 KB total)
Resource ID: cecf8fe7bba30f9b | Stable ID: NDMwMWRkMm