Dreber et al. (2015)
Credibility Rating
Gold standard. Rigorous peer review, high editorial standards, and strong institutional reputation.
Rating inherited from publication venue: PNAS
Relevant to AI safety in that it explores scalable mechanisms for aggregating expert judgment about uncertain outcomes, analogous to forecasting AI risk or evaluating alignment research credibility.
Metadata
Summary
This PNAS paper demonstrates that prediction markets can accurately forecast the outcomes of scientific replications, outperforming a survey of participants' individual forecasts. Applied to 44 psychology studies from the Reproducibility Project: Psychology, the markets implied that hypotheses tested in psychology have a median prior probability of being true of only 9%, and that a statistically significant result must be confirmed in a well-powered replication before a hypothesis can be held with high confidence.
Key Points
- Prediction markets set up for 44 psychology replications predicted replication outcomes well and outperformed participants' individual survey forecasts.
- Hypotheses tested in published psychology studies have a low median prior probability of being true (~9%), suggesting widespread publication bias.
- A "statistically significant" result alone is insufficient; well-powered replications are needed to establish high confidence in a hypothesis.
- Prediction markets offer a scalable, low-cost mechanism for rapidly assessing reproducibility before costly large-scale replications are undertaken (see the market-maker sketch after this list).
- The framework could help prioritize which studies to replicate, allocating limited replication resources where they are most informative.
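The cached preview does not describe the trading mechanism behind these markets, but a common automated market maker for binary-outcome markets of this kind is Hanson's logarithmic market scoring rule (LMSR). The minimal sketch below (in Python; the `LMSRMarket` class, its liquidity parameter, and the trade sizes are illustrative assumptions, not details from the paper) shows how such a market maker turns trades into a price that can be read as the crowd's probability that a study will replicate.

```python
import math

class LMSRMarket:
    """Minimal LMSR market maker for a binary event,
    e.g. "this study will replicate". Illustrative only;
    not necessarily the mechanism used in the paper's markets."""

    def __init__(self, liquidity: float = 100.0):
        self.b = liquidity   # liquidity parameter: larger b = prices move less per trade
        self.q_yes = 0.0     # outstanding YES shares
        self.q_no = 0.0      # outstanding NO shares

    def cost(self, q_yes: float, q_no: float) -> float:
        """LMSR cost function C(q) = b * log(exp(q_yes/b) + exp(q_no/b))."""
        return self.b * math.log(math.exp(q_yes / self.b) + math.exp(q_no / self.b))

    def price_yes(self) -> float:
        """Instantaneous YES price, interpretable as the market's
        current probability that the event occurs."""
        e_yes = math.exp(self.q_yes / self.b)
        e_no = math.exp(self.q_no / self.b)
        return e_yes / (e_yes + e_no)

    def buy_yes(self, shares: float) -> float:
        """Buy YES shares; returns the amount charged to the trader
        (the change in the cost function)."""
        charge = self.cost(self.q_yes + shares, self.q_no) - self.cost(self.q_yes, self.q_no)
        self.q_yes += shares
        return charge

market = LMSRMarket(liquidity=100.0)
print(f"initial P(replicates) = {market.price_yes():.2f}")   # 0.50
paid = market.buy_yes(50.0)  # a trader who believes replication is likely
print(f"paid {paid:.2f} for 50 YES shares; new P = {market.price_yes():.2f}")  # ~0.62
```

The liquidity parameter `b` trades off responsiveness against cost: a larger `b` means each share moves the price less, at the expense of a larger worst-case subsidy from the market operator.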
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Prediction Markets (AI Forecasting) | Approach | 56.0 |
Cached Content Preview
## Significance
There is increasing concern about the reproducibility of scientific research. For example, the costs associated with irreproducible preclinical research alone have recently been estimated at US$28 billion a year in the United States. However, there are currently no mechanisms in place to quickly identify findings that are unlikely to replicate. We show that prediction markets are well suited to bridge this gap. Prediction markets set up to estimate the reproducibility of 44 studies published in prominent psychology journals and replicated in The Reproducibility Project: Psychology predict the outcomes of the replications well and outperform a survey of individual forecasts.
## Abstract
Concerns about a lack of reproducibility of statistically significant results have recently been raised in many fields, and it has been argued that this lack comes at substantial economic costs. We here report the results from prediction markets set up to quantify the reproducibility of 44 studies published in prominent psychology journals and replicated in the Reproducibility Project: Psychology. The prediction markets predict the outcomes of the replications well and outperform a survey of market participants’ individual forecasts. This shows that prediction markets are a promising tool for assessing the reproducibility of published scientific results. The prediction markets also allow us to estimate probabilities for the hypotheses being true at different testing stages, which provides valuable information regarding the temporal dynamics of scientific discovery. We find that the hypotheses being tested in psychology typically have low prior probabilities of being true (median, 9%) and that a “statistically significant” finding needs to be confirmed in a well-powered replication to have a high probability of being true. We argue that prediction markets could be used to obtain speedy information about reproducibility at low cost and could potentially even be used to determine which studies to replicate to optimally allocate limited resources into replications.
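The abstract's claim that a single "statistically significant" finding rarely confers high confidence follows directly from Bayes' rule. A minimal sketch, assuming two independent testing stages: only the 9% median prior comes from the abstract, while the α of 0.05 and the power values (80% for the original study, 90% for the replication) are illustrative assumptions that may differ from the paper's own.

```python
def posterior_given_significant(prior: float, power: float, alpha: float = 0.05) -> float:
    """P(H true | significant result) by Bayes' rule:
    expected true positives over all positives at this testing stage."""
    true_positives = power * prior
    false_positives = alpha * (1.0 - prior)
    return true_positives / (true_positives + false_positives)

prior = 0.09  # median prior from the paper's market-based estimate
after_original = posterior_given_significant(prior, power=0.80)
after_replication = posterior_given_significant(after_original, power=0.90)

print(f"prior probability of H:        {prior:.0%}")              # 9%
print(f"after one significant result:  {after_original:.0%}")     # ~61%
print(f"after significant replication: {after_replication:.0%}")  # ~97%
```

Under these assumptions the posterior rises to roughly 61% after one significant result and only clears 95% once a well-powered replication also comes out significant, which is the temporal dynamic the abstract describes.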
The process of scientific discovery centers on empirical testing of research hypotheses. A standard tool to interpret results in statistical hypothesis testing is the _P_ value. A result associated with a _P_ value below a predefined significance level (typically 0.05) is considered "statistically significant" and interpreted as evidence in favor of a hypothesis. However, concerns about the reproducibility of statistically significant results have recently been raised in many fields including medicine (1–3), neuroscience (4
... (truncated, 98 KB total)