Updated 2026-01-28

Prediction Markets (AI Forecasting)


Prediction markets achieve Brier scores of 0.16-0.24 (15-25% better than polls) by aggregating dispersed information through financial incentives, with platforms handling $1-3B annually. For AI safety, they provide useful near-term forecasting (70% accuracy on 1-year policy questions) but struggle with long-horizon questions due to thin liquidity, high discount rates, and definitional ambiguity.

Maturity: Growing adoption; proven concept
Key Strength: Incentive-aligned information aggregation
Key Limitation: Liquidity, legal barriers, manipulation risk
Key Players: Polymarket, Metaculus, Manifold, Kalshi
Related Risks: AI Flash Dynamics, AI Development Racing Dynamics, AI-Powered Consensus Manufacturing

Overview

Prediction markets are trading platforms where participants buy and sell contracts whose payouts depend on future events. When a contract for "Will X happen?" trades at $0.70, the market is collectively estimating a 70% probability. This mechanism harnesses the "wisdom of crowds" by giving traders a financial incentive to bet according to their true beliefs rather than social pressure or wishful thinking.

The empirical track record is strong. In U.S. presidential elections, prediction markets have outperformed polls, achieving Brier scores (lower is better) of 0.16-0.24 compared to 0.20-0.30 for polling averages, an improvement of roughly 15-25% (Berg et al., 2008). In scientific replication markets, traders correctly predicted which studies would replicate 71% of the time, compared to 58% for a survey of the same participants (Dreber et al., 2015). The theoretical basis for this performance is information aggregation: when dispersed private information is expressed through trading, prices converge toward accurate probabilities (Arrow et al., 2008).
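The Brier score used in these comparisons is the mean squared difference between forecast probabilities and 0/1 outcomes, so a constant 50% forecast scores 0.25 and a perfect forecaster scores 0. The sketch below is a minimal illustration; the forecasts and outcomes are made-up numbers, not data from the studies cited.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical data: four resolved binary questions (1 = happened, 0 = did not).
market_prices = [0.80, 0.30, 0.65, 0.10]
poll_estimates = [0.60, 0.45, 0.55, 0.35]
resolved = [1, 0, 1, 0]

market_brier = brier_score(market_prices, resolved)
poll_brier = brier_score(poll_estimates, resolved)
coin_flip_brier = brier_score([0.5] * len(resolved), resolved)

print(f"market={market_brier:.3f} polls={poll_brier:.3f} coin flip={coin_flip_brier:.3f}")
```

In this toy data the market forecasts are both on the right side of 50% and more confident, which is what pulls their score below the poll score.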

For epistemic infrastructure, prediction markets offer three key advantages over alternatives like expert panels or opinion surveys. First, they create continuous, real-time probability estimates that update within minutes of relevant news. Second, they weight opinions by confidence—traders who believe strongly stake more capital. Third, they're resistant to ideological capture because consistently wrong traders lose money and exit the market. The foundational analysis by Wolfers & Zitzewitz (2004) demonstrates these mechanisms work across political, sports, and economic contexts.

Quick Assessment

DimensionAssessmentEvidence
TractabilityHighPlatforms exist and work; main barriers are regulatory
ScalabilityMediumRequires sufficient liquidity per question; thin markets unreliable
Current MaturityMedium-HighDecades of empirical evidence; mainstream adoption growing
Time HorizonActive nowAlready deployed; question is expansion
Key ProponentsPolymarket, Metaculus, KalshiActive platforms with different regulatory approaches

How It Works

The core mechanism is straightforward: markets convert private beliefs into public prices through trading.

Consider a simple binary contract: "Will the EU pass comprehensive AI regulation by 2026?" Trading opens at $0.50 (50% implied probability). Traders who believe passage is more likely buy contracts; those who think it unlikely sell. Each trade pushes the price toward the buyer's or seller's belief, weighted by how much they're willing to stake. If a trader with good information about EU politics spots the price at $0.50 but believes the true probability is 75%, they profit by buying—and in doing so, move the price closer to accuracy.
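The arithmetic behind that trade can be sketched in a few lines. This is an illustration only: it assumes a contract that pays $1.00 if the event happens and nothing otherwise, and it ignores fees and the time value of money. The function name is hypothetical.

```python
def expected_profit_per_contract(belief, price, payout=1.0):
    """Expected gain from buying one YES contract at `price`,
    for a trader who assigns probability `belief` to the event."""
    if not 0.0 <= belief <= 1.0:
        raise ValueError("belief must be a probability")
    if not 0.0 < price < payout:
        raise ValueError("price must lie strictly between 0 and the payout")
    return belief * payout - price


# The trader from the example: believes 75%, sees the price at $0.50.
edge_at_open = expected_profit_per_contract(belief=0.75, price=0.50)

# As buying pushes the price up, the edge shrinks and vanishes at $0.75.
edge_after_buying = expected_profit_per_contract(belief=0.75, price=0.70)
edge_at_belief = expected_profit_per_contract(belief=0.75, price=0.75)

print(edge_at_open, edge_after_buying, edge_at_belief)
```

The trader has a reason to keep buying only while the price is below their belief, which is why informed trading moves the price toward that belief and then stops.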

Three mechanisms make this work:

Incentive alignment. Unlike polls or surveys, traders face real consequences for being wrong. Hanson (2003) formalized how this creates "truth-seeking" behavior—traders who consistently predict well accumulate capital, while poor forecasters go broke and exit.

Information aggregation. Markets don't require any single trader to know everything. A journalist might have information about political feasibility, a lobbyist about industry positions, an academic about technical constraints. When each trades based on their slice of knowledge, prices aggregate their dispersed information.

Continuous updating. Unlike quarterly polls or annual expert surveys, market prices adjust within minutes to new information. On the night of the 2016 Brexit referendum, Betfair prices tracked local results as they were declared, providing probability updates every few minutes.

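Some platforms use Automated Market Makers (AMMs) based on logarithmic market scoring rules (LMSR), while others match buyers and sellers through an order book. An LMSR market maker always quotes a price, so it provides liquidity even when few traders are active. Each additional share costs more than the last, and the cost of pushing a price toward 0 or 1 grows without bound, which makes sustained manipulation expensive.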
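To make the market-maker mechanics concrete, here is a minimal sketch of a binary logarithmic market scoring rule, showing how the cost of a trade grows as the price is pushed toward an extreme. The cost function C(q) = b * ln(sum_i exp(q_i / b)) is the standard LMSR form. The liquidity parameter b = 100 and all function names are assumptions for illustration, and this is not any particular platform's implementation.

```python
import math


def lmsr_cost(quantities, b):
    """LMSR cost function C(q) = b * ln(sum_i exp(q_i / b)), computed stably."""
    m = max(q / b for q in quantities)
    return b * (m + math.log(sum(math.exp(q / b - m) for q in quantities)))


def lmsr_prices(quantities, b):
    """Instantaneous prices: a softmax of outstanding shares scaled by b."""
    m = max(q / b for q in quantities)
    weights = [math.exp(q / b - m) for q in quantities]
    total = sum(weights)
    return [w / total for w in weights]


def cost_to_reach(target_price, b):
    """Cost of buying YES shares in a fresh 50/50 binary market
    until the YES price reaches `target_price`."""
    if not 0.5 <= target_price < 1.0:
        raise ValueError("target_price must be in [0.5, 1)")
    shares = b * math.log(target_price / (1.0 - target_price))
    cost = lmsr_cost([shares, 0.0], b) - lmsr_cost([0.0, 0.0], b)
    return cost, shares


b = 100.0  # hypothetical liquidity parameter
for target in (0.60, 0.90, 0.99):
    cost, shares = cost_to_reach(target, b)
    print(f"push YES to {target:.2f}: buy {shares:7.1f} shares for ${cost:7.2f}")
```

Under these assumptions the cost scales linearly with b, so a market operator who subsidizes a larger b makes any given price move proportionally more expensive, at the price of a larger worst-case loss for the market maker.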

Diagram
flowchart TD
  subgraph Traders["Diverse Information Sources"]
      T1["Journalist<br/>(political feasibility)"]
      T2["Lobbyist<br/>(industry positions)"]
      T3["Academic<br/>(technical constraints)"]
      T4["Insider<br/>(timing signals)"]
  end

  subgraph Market["Market Mechanism"]
      BUY["Buy contracts<br/>(price rises)"]
      SELL["Sell contracts<br/>(price falls)"]
      AMM["Automated Market Maker<br/>(provides liquidity)"]
  end

  subgraph Output["Information Output"]
      PRICE["Market Price = Probability"]
      UPDATE["Real-time updates<br/>(minutes, not months)"]
  end

  T1 & T2 & T3 & T4 --> BUY
  T1 & T2 & T3 & T4 --> SELL
  BUY & SELL --> AMM
  AMM --> PRICE
  PRICE --> UPDATE

  style PRICE fill:#d4edda
  style UPDATE fill:#d4edda

Current Landscape

The prediction market ecosystem splits along a regulatory fault line.

Crypto-native platforms like Polymarket operate offshore using cryptocurrency, capturing $1-3 billion in annual trading volume as of 2024—a 10x increase from 2023. These platforms offer the widest question variety and deepest liquidity but exist in regulatory grey zones, particularly for U.S. participants. Polymarket achieves Brier scores of 0.16-0.22 on political questions.

Regulated real-money markets face tighter constraints. In the U.S., the CFTC classifies prediction contracts as derivatives, requiring platforms like Kalshi to seek approval for each question category. Kalshi has steadily expanded permitted categories but operates with lower volume ($100-300M annually) and narrower question sets. The UK and EU offer more permissive frameworks, with Betfair handling $50B+ in annual volume across sports and politics.

Play-money platforms sidestep regulations by removing financial stakes. Metaculus leads in AI and science forecasting with 15,000+ active forecasters and verified track records dating to 2015. Superforecasters are reported to achieve Brier scores of 0.15-0.19 on AI timeline questions (Good Judgment research). Manifold Markets allows users to create questions on any topic, trading some accuracy for breadth of coverage.

Applications to AI Safety

Prediction markets offer potentially valuable inputs for AI governance, though with significant limitations for the questions that matter most.

For near-term forecasting, the track record is promising. Markets on AI policy questions (regulation passage, lab announcements, capability milestone dates) show roughly 70% accuracy on 1-year horizons. Metaculus hosts active questions on AGI timeline estimates, capability benchmarks, and safety research progress. These provide continuously updated probability distributions that policymakers and researchers can incorporate into planning.

The harder problem is long-horizon forecasting. Questions like "probability of AI-caused catastrophe by 2100" suffer from multiple issues. First, resolution is decades away, and traders heavily discount long-term payoffs—empirical estimates suggest 15-40% annual discount rates for prediction market positions. Second, the forecaster pool for technical AI safety questions is small, leading to thin liquidity and wide bid-ask spreads. Third, definitional ambiguity compounds over long horizons: what exactly counts as "transformative AI" or "existential catastrophe"?
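The discounting problem can be illustrated with a one-line present-value calculation. The sketch assumes a trader who values a $1 payout T years away at 1 / (1 + r)^T today, and so will not pay more than that fraction of their believed probability. The 10% belief and the function name are hypothetical; the 15% and 40% rates are the ends of the range quoted above.

```python
def max_price_willing_to_pay(probability, annual_discount_rate,
                             years_to_resolution, payout=1.0):
    """Highest price a trader would pay today for a contract paying
    `payout` with `probability`, discounting at a constant annual rate."""
    return probability * payout / (1.0 + annual_discount_rate) ** years_to_resolution


belief = 0.10  # hypothetical: trader thinks the event is 10% likely

ceiling_1y = max_price_willing_to_pay(belief, 0.15, 1)
ceiling_10y_low = max_price_willing_to_pay(belief, 0.15, 10)
ceiling_10y_high = max_price_willing_to_pay(belief, 0.40, 10)

print(f"1 year at 15%:   {ceiling_1y:.4f}")
print(f"10 years at 15%: {ceiling_10y_low:.4f}")
print(f"10 years at 40%: {ceiling_10y_high:.4f}")
```

Under these assumptions the same 10% belief supports a price near $0.087 at one year but only a few cents, or less, at ten years, so a long-horizon price read naively as a probability would understate what traders actually believe.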

Conditional markets offer a partial solution. Rather than betting on absolute outcomes, traders bet on "If policy X passes, probability of outcome Y." This enables comparison of different intervention strategies while allowing resolution on shorter timescales. The infrastructure for sophisticated conditional markets is still developing.
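One way to read conditional estimates out of ordinary contracts is the identity P(Y | X) = P(X and Y) / P(X). The sketch below applies it to three hypothetical prices: one contract on the policy passing, one on the outcome, and one on both. All prices and names are invented for illustration, and the resulting gap measures association in traders' beliefs, not necessarily the causal effect of the policy.

```python
def implied_conditional(joint_price, condition_price):
    """P(Y | X) implied by a contract on 'X and Y' and a contract on 'X'."""
    if not 0.0 < condition_price <= 1.0:
        raise ValueError("condition price must be in (0, 1]")
    if not 0.0 <= joint_price <= condition_price:
        raise ValueError("joint price cannot exceed the condition price")
    return joint_price / condition_price


# Hypothetical market prices, read as probabilities.
p_policy = 0.40              # "Policy X passes"
p_outcome = 0.40             # "Outcome Y occurs"
p_policy_and_outcome = 0.10  # "Policy X passes AND outcome Y occurs"

p_outcome_given_policy = implied_conditional(p_policy_and_outcome, p_policy)
p_outcome_given_no_policy = implied_conditional(
    p_outcome - p_policy_and_outcome,  # P(Y and not X)
    1.0 - p_policy,                    # P(not X)
)
implied_gap = p_outcome_given_policy - p_outcome_given_no_policy

print(f"P(Y | X)     = {p_outcome_given_policy:.2f}")
print(f"P(Y | not X) = {p_outcome_given_no_policy:.2f}")
print(f"gap          = {implied_gap:+.2f}")
```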

Limitations

Several factors constrain prediction market accuracy and applicability.

Liquidity requirements. Small markets are unreliable. Research suggests $10-50K in coordinated trading can temporarily move prices 5%+ in markets with under $100K in total volume. Most AI safety-relevant questions have liquidity well below this threshold, making prices noisy indicators rather than reliable forecasts.

Behavioral biases persist. Despite financial incentives, traders exhibit the favorite-longshot bias (overweighting low-probability events) and herding (following visible trades rather than independent analysis). Extreme probability estimates (above 90% or below 10%) are particularly unreliable.
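Biases of this kind show up in a calibration table, which groups forecasts into probability bins and compares the average forecast in each bin with how often the events actually happened. The following is a minimal sketch with invented data constructed to show a favorite-longshot pattern; it is not drawn from any real market.

```python
from collections import defaultdict


def calibration_table(forecasts, outcomes, n_bins=10):
    """Return rows of (bin_low, bin_high, count, mean_forecast, observed_rate)
    for each non-empty equal-width probability bin."""
    bins = defaultdict(list)
    for p, o in zip(forecasts, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[idx].append((p, o))
    rows = []
    for idx in sorted(bins):
        pairs = bins[idx]
        mean_forecast = sum(p for p, _ in pairs) / len(pairs)
        observed_rate = sum(o for _, o in pairs) / len(pairs)
        rows.append((idx / n_bins, (idx + 1) / n_bins, len(pairs),
                     mean_forecast, observed_rate))
    return rows


# Hypothetical resolved contracts: longshots priced at 8% hit 4% of the time,
# favorites priced at 92% hit 96% of the time.
prices = [0.08] * 25 + [0.92] * 25
results = [1] + [0] * 24 + [1] * 24 + [0]

table = calibration_table(prices, results)
for low, high, n, mean_p, rate in table:
    print(f"[{low:.1f}, {high:.1f}) n={n:2d} mean price={mean_p:.2f} observed={rate:.2f}")
```

A well-calibrated market would show mean price and observed rate roughly equal in every bin; in this constructed example the longshots are overpriced and the favorites underpriced.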

Resolution challenges. Many interesting questions resist clean operationalization. "Will AI alignment research make meaningful progress by 2027?" requires subjective judgment that reasonable people dispute. Platforms handle this through resolution councils (Metaculus) or predefined criteria, but ambiguity creates risk that discourages trading.

Regulatory fragmentation. U.S. restrictions push volume to offshore platforms with weaker oversight, while limiting mainstream institutional participation. Academic researchers, foundations, and government bodies often can't legally trade on the platforms with best liquidity.

Manipulation vulnerability. While sustained manipulation is expensive due to AMM mechanics, temporary price distortion around key decision points is feasible for well-funded actors—precisely when accurate forecasts matter most for policy.

Future Directions

The trajectory of prediction markets depends heavily on regulatory decisions over the next 3-5 years.

If U.S. CFTC restrictions loosen—currently estimated at 30-50% probability—regulated market volume could increase 10x as institutional participants enter legally. Several state-level initiatives may provide workarounds before federal action. The EU appears likely to harmonize regulations across member states, potentially creating a unified European market.

Technological developments may address some current limitations. AI trading algorithms are already participating on some platforms and may tighten spreads through arbitrage. Better AMM designs could reduce liquidity costs for long-horizon questions. Cross-platform arbitrage infrastructure would unify prices across fragmented markets.

For AI safety applications specifically, the key question is whether specialized forecasting platforms can attract sufficient domain expertise. Current play-money platforms like Metaculus demonstrate that scientists and researchers will participate without financial incentives, but scaling this to the precision needed for policy guidance remains uncertain.

Key Uncertainties

Several open questions shape how useful prediction markets can become for AI governance:

  • Regulatory liberalization: Will U.S. barriers drop before crypto platforms capture most institutional attention?
  • Long-horizon viability: Can conditional markets and milestone structures make 5-10 year forecasting reliable?
  • AI integration: Will AI trading algorithms improve accuracy through faster information processing, or degrade it by exploiting human traders?
  • Manipulation costs: At what market size do manipulation attempts become prohibitively expensive for state-level actors?
  • Expert participation: Can platforms attract enough domain experts in AI safety to produce informed prices on technical questions?

Key Questions

  • Can long-horizon markets maintain sufficient liquidity for AI safety-relevant questions with 5-10 year timelines?
  • How will AI trading algorithms affect human forecaster incentives and overall market accuracy?
  • What market size is needed to resist manipulation attempts by well-funded actors during critical policy windows?
  • Will regulatory liberalization occur fast enough to enable institutional participation in AI forecasting?

Further Reading

The foundational theoretical work includes Arrow et al. (2008) on information aggregation and Hanson (2003) on market design. For empirical evidence, Berg et al. (2008) provides the canonical analysis of election forecasting accuracy, while Dreber et al. (2015) extends this to scientific replication. Wolfers & Zitzewitz (2004) offer a comprehensive overview of prediction market theory and applications.

Major platforms include Polymarket (crypto-native, highest volume), Kalshi (U.S. regulated), and Metaculus (play-money with strong AI safety coverage). For developing forecasting skills, Good Judgment offers training programs based on superforecaster research.


References

Chen & Plott (2002) examine information aggregation mechanisms, specifically prediction markets, as tools for combining dispersed private information to produce accurate forecasts. The paper provides theoretical grounding and experimental evidence for how well-designed market mechanisms can elicit and aggregate distributed knowledge. It is a foundational work in the literature on prediction markets and mechanism design for forecasting.

★★★★☆

Robin Hanson proposes a governance framework distinguishing value decisions (made democratically) from empirical belief questions (resolved via prediction markets). The paper argues that futarchy—using decision markets to select policies based on forecasted welfare outcomes—could improve collective decision-making by better aggregating dispersed information.

3. Market Scoring Rules (eecs.harvard.edu)

Robin Hanson's foundational paper introducing Market Scoring Rules (MSRs), a mechanism that combines proper scoring rules with prediction markets to create automated market makers that always accept trades. MSRs provide incentives for truthful probability reporting while maintaining liquidity without requiring matched counterparties, making them practical tools for information aggregation.

4. Dreber et al. (2015), PNAS (peer-reviewed)

This PNAS paper demonstrates that prediction markets can accurately forecast the outcomes of scientific replications, outperforming individual expert surveys. Applied to 44 psychology studies from the Reproducibility Project, the markets estimated that hypotheses tested in psychology have a median prior probability of only 9% of being true, and that statistically significant results require well-powered replications to achieve high confidence.

★★★★★

Augur is a decentralized prediction market protocol built on blockchain technology, now undergoing a reboot with a focus on oracle design and outsourced resolution mechanisms. The project recently published the 'Augur Lituus Whitepaper' proposing a modular oracle system with comparative security analysis. It represents one of crypto's earliest and most prominent experiments in decentralized forecasting infrastructure.

Good Judgment Inc. is the commercial spinoff of Philip Tetlock's landmark forecasting research, which demonstrated that a select group of 'superforecasters' can consistently outperform intelligence analysts and expert predictions using rigorous probabilistic thinking. The platform aggregates expert forecasts on geopolitical, technological, and scientific questions. It is highly relevant to AI safety for evaluating AI capabilities timelines and risk assessments.

★★★☆☆

This URL was intended to reference the EU's Markets in Financial Instruments Directive (MiFID), which regulates investment services and financial markets across the European Union. However, the page no longer exists at this location, returning a 404 error. The directive itself is a key piece of EU financial regulation governing trading venues, investment firms, and market transparency.

★★★★☆

The CFTC is the U.S. federal agency regulating commodity futures, derivatives, and prediction markets. It oversees market integrity, enforces against fraud and manipulation, and coordinates with the SEC on financial market regulation. Its jurisdiction over prediction markets is relevant to AI safety discussions around forecasting and information aggregation mechanisms.

Kalshi is a CFTC-regulated prediction market platform in the United States that allows users to trade on the outcomes of real-world events. It represents a legal, regulated approach to prediction markets, which are relevant to AI governance and forecasting discussions. Users can buy and sell contracts tied to economic, political, and other event outcomes.

The FCA is the UK's financial services regulator, responsible for overseeing financial markets, protecting consumers, and ensuring market integrity. It publishes guidance, rules, and policy statements relevant to AI and algorithmic systems used in financial services. Its regulatory frameworks increasingly address automated decision-making, algorithmic trading, and AI deployment in financial contexts.

Good Judgment Open is a crowd-sourced forecasting platform where participants predict geopolitical, economic, and technological events, with top performers earning the 'Superforecaster' designation. Founded by Philip Tetlock, whose research demonstrated that structured probabilistic thinking can dramatically improve prediction accuracy. The platform serves as both a competitive forecasting community and a research tool for studying human judgment under uncertainty.

Betfair Exchange is a peer-to-peer betting exchange that functions as a real-money prediction market, allowing users to trade on outcomes of events at market-determined odds. It aggregates dispersed information into prices through competitive bidding, making it a practical example of mechanism design for information elicitation and forecasting.

A curated collection of community forecasting questions on Metaculus specifically focused on AGI timelines, capabilities, and related developments. Users submit probabilistic predictions aggregated into community forecasts, providing crowd-sourced estimates on when and how AGI might emerge.

★★★☆☆

Wolfers and Zitzewitz provide a foundational survey of prediction markets, examining how these market mechanisms aggregate dispersed information into accurate probabilistic forecasts. They analyze evidence from political, financial, and sports markets, demonstrating that prediction markets often outperform expert opinion and traditional forecasting methods.

Metaculus is a collaborative online forecasting platform where users make probabilistic predictions on future events across domains including AI development, biosecurity, and global catastrophic risks. It aggregates crowd wisdom and expert forecasts to produce calibrated probability estimates on complex questions relevant to long-term planning and existential risk assessment.

★★★☆☆

Polymarket is a decentralized prediction market platform where users trade on the probabilistic outcomes of real-world events, aggregating crowd wisdom into market-derived probability estimates. It covers topics including politics, science, and current events, providing a real-time signal of collective forecasting belief. The platform is relevant to AI safety discussions around forecasting AI timelines, governance outcomes, and information aggregation mechanisms.

★★★☆☆

This NBER working paper by Justin Wolfers and Eric Zitzewitz examines prediction markets as mechanisms for aggregating dispersed information into accurate forecasts. It analyzes the theoretical foundations and empirical performance of prediction markets, exploring their design and potential applications for improving collective forecasting accuracy.

★★★★☆

This page explains Metaculus's resolution council mechanism, which governs how forecasting questions are resolved when outcomes are ambiguous or disputed. It describes the governance structure for maintaining integrity and fairness in a prediction platform used for aggregating probabilistic forecasts on important future events.

★★★☆☆

Gnosis Conditional Tokens is a framework for creating, trading, and resolving conditional outcome tokens on blockchain infrastructure, enabling the construction of prediction markets and other information aggregation mechanisms. It provides a standard for representing positions contingent on multiple outcomes, supporting complex market structures that can aggregate distributed knowledge about future events.

Related Wiki Pages

Top Related Pages

Analysis

XPT (Existential Risk Persuasion Tournament), Irreversibility Threshold Model, AI Capability Threshold Model

Approaches

AI Alignment, AI System Reliability Tracking, AI for Human Reasoning Fellowship, AI-Era Epistemic Infrastructure

Organizations

Good Judgment (Forecasting), Manifold (Prediction Market), Polymarket, Kalshi (Prediction Market), Samotsvety, QURI (Quantified Uncertainty Research Institute)

Concepts

Epistemic Orgs Overview, Collective Intelligence / Coordination

Other

Nuño Sempere, Robin Hanson

Key Debates

AI Risk Critical Uncertainties Model