Longterm Wiki

Forecasting AGI: Insights from Prediction Markets and Metaculus

blog

Author

Alvin Ånestrand

Credibility Rating

3/5 (Good)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: LessWrong

A useful reference for understanding how current forecasting platforms operationalize AGI and what crowd-sourced timeline estimates look like, though the analysis is descriptive rather than technical.

Forum Post Details

Karma
13
Comments
0
Forum
lesswrong
Forum Tags
AI Timelines · Forecasts (Specific Predictions) · Metaculus · Prediction Markets · AI

Metadata

Importance: 42/100 · blog post · analysis

Summary

This analysis surveys prediction market and Metaculus forecasts for AGI arrival, examining the specific benchmark criteria used in forecasting questions (Turing test, robotics, MMLU, APPS). It finds current AI systems closest to language/reasoning benchmarks but lagging in robotics and coding, with Metaculus's median AGI forecast at mid-2030 (IQR: 2026-2039). The author cautions that benchmark-passing systems may still not be sufficiently agentic to replace human workers.

Key Points

  • Metaculus AGI criteria require Turing test passage, robotic capabilities, and high performance on MMLU and APPS benchmarks simultaneously.
  • Current systems like GPT-4o and o3 are nearest to meeting language/reasoning benchmarks but significantly lag in robotics and coding tasks.
  • Metaculus median AGI forecast is mid-2030, with substantial uncertainty (interquartile range spanning 2026–2039).
  • Benchmark-based AGI definitions are rough proxies; a system passing all criteria may still lack the agentic capability to replace human workers.
  • Prediction markets offer a complementary crowd-sourced perspective on timelines, but their incentive structures and question definitions introduce biases.
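The compound MMLU criterion above (a per-task floor plus a mean threshold) is stricter than a single headline score. As a minimal sketch, assuming per-task accuracies are available as fractions, the check could look like this; `meets_metaculus_mmlu_criterion` is a hypothetical helper, not part of any Metaculus API:

```python
def meets_metaculus_mmlu_criterion(task_accuracies):
    """Check the Metaculus MMLU condition: >=75% accuracy on every
    task AND >=90% mean accuracy across all tasks.

    task_accuracies: per-task MMLU accuracies as fractions in [0, 1].
    """
    if not task_accuracies:
        return False
    per_task_floor = min(task_accuracies) >= 0.75
    mean_threshold = sum(task_accuracies) / len(task_accuracies) >= 0.90
    return per_task_floor and mean_threshold


# A strong average can still fail because of one weak task:
print(meets_metaculus_mmlu_criterion([0.95, 0.93, 0.91, 0.70]))  # False (min < 0.75)
print(meets_metaculus_mmlu_criterion([0.95, 0.93, 0.91, 0.85]))  # True (mean 0.91, min 0.85)
```

This illustrates why a headline figure like GPT-4o's 88.7% average on MMLU does not by itself settle whether the criterion is met: the per-task floor must also hold.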

Cited by 1 page

Page | Type | Quality
AI Timelines | Concept | 95.0

Cached Content Preview

HTTP 200 · Fetched Feb 26, 2026 · 302 KB
Forecasting AGI: Insights from Prediction Markets and Metaculus — LessWrong
by Alvin Ånestrand · 4th Feb 2025 · AI Alignment Forum · Linkpost from forecastingaifutures.substack.com · 5 min read

I have tried to find all prediction market and Metaculus questions related to AGI timelines. Here I examine how they compare to each other and what they actually say about when AGI might arrive. If you know of a market that I have missed, please tell me in the comment section! It would also be helpful if you tell me about questions you think are relevant but missing from this analysis.

This is a linkpost, and I would prefer that you comment on the original post on my new blog, Forecasting AI Futures, but feel free to comment here as well. Subscribe to the blog for updates on my future forecasting posts related to AI safety. Whenever possible, please check the more recent probability estimates on the embedded sites instead of relying on my At The Time Of Writing (ATTOW) numbers.

So, what do prediction markets and Metaculus have to say about AGI? Metaculus has this question for the arrival date of AGI. The AI system needs to be able to:

  • Pass a really hard Turing test.
  • Have general robotic capabilities (being able to assemble a "circa-2021 Ferrari 312 T4 1:8 scale automobile model" or equivalent).
  • Achieve "at least 75% accuracy in every task and 90% mean accuracy across all tasks" on the MMLU benchmark, which measures expertise in a wide range of academic subjects.
  • Achieve at least 90% accuracy with a single attempt for each question on the APPS benchmark, which measures coding skills.

Metaculus thinks this will probably occur around the middle of 2030, though with high uncertainty. The interval between the lower and upper quartiles for the individual predictions on this question is (2026-12-28 - 2039-03-27) ATTOW.

GPT-4o achieves an accuracy of 88.7% on MMLU, as seen in the leaderboard here. GPT-4 was used to get 22% accuracy on APPS. Unfortunately, most of the best models have not been tested on either MMLU or APPS. OpenAI's o3 has been reported to achieve 71.7% on SWE-bench Verified. We can compare that to GPT-4, which managed 22.4% on SWE-bench Verified and 22% accuracy on APPS. Based on this, I think o3 would achieve above 50% accuracy on APPS.

The two criteria that AI currently seems furthest from fulfilling are robotics capabilities and APPS accuracy, though current best performance on the APPS benchmark is uncertain. Coding capabilities are improving very fast, as indicated by the rapid improvements in accuracy on SWE-bench Verified, while robotics capabilities are lagging behind. If there are not too many errors in the APPS benc

... (truncated, 302 KB total)
Resource ID: 90fca29ade44fd7d | Stable ID: ZmFiYjczMz