Longterm Wiki

Forecasting AGI: Insights from Prediction Markets and Metaculus

blog

Author

Alvin Ånestrand

Credibility Rating

3/5 (Good)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: LessWrong

A useful reference for understanding how current forecasting platforms operationalize AGI and what crowd-sourced timeline estimates look like, though the analysis is descriptive rather than technical.

Forum Post Details

Karma
13
Comments
0
Forum
lesswrong
Forum Tags
AI Timelines · Forecasts (Specific Predictions) · Metaculus · Prediction Markets · AI

Metadata

Importance: 42/100 · blog post · analysis

Summary

This analysis surveys prediction market and Metaculus forecasts for AGI arrival, examining the specific benchmark criteria used in forecasting questions (Turing test, robotics, MMLU, APPS). It finds current AI systems closest to language/reasoning benchmarks but lagging in robotics and coding, with Metaculus's median AGI forecast at mid-2030 (IQR: 2026-2039). The author cautions that benchmark-passing systems may still not be sufficiently agentic to replace human workers.

Key Points

  • Metaculus AGI criteria require Turing test passage, robotic capabilities, and high performance on MMLU and APPS benchmarks simultaneously.
  • Current systems like GPT-4o and o3 are nearest to meeting language/reasoning benchmarks but significantly lag in robotics and coding tasks.
  • Metaculus median AGI forecast is mid-2030, with substantial uncertainty (interquartile range spanning 2026–2039).
  • Benchmark-based AGI definitions are rough proxies; a system passing all criteria may still lack the agentic capability to replace human workers.
  • Prediction markets offer a complementary crowd-sourced perspective on timelines, but their incentive structures and question definitions introduce biases.
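The compound MMLU criterion above (a per-task floor plus a mean threshold) is stricter than a single headline score. As a minimal sketch, assuming per-task accuracies are available as fractions, the check could look like this; `meets_metaculus_mmlu_criterion` is a hypothetical helper, not part of any Metaculus API:

```python
def meets_metaculus_mmlu_criterion(task_accuracies):
    """Check the Metaculus MMLU condition: >=75% accuracy on every
    task AND >=90% mean accuracy across all tasks.

    task_accuracies: per-task MMLU accuracies as fractions in [0, 1].
    """
    if not task_accuracies:
        return False
    per_task_floor = min(task_accuracies) >= 0.75
    mean_threshold = sum(task_accuracies) / len(task_accuracies) >= 0.90
    return per_task_floor and mean_threshold


# A strong average can still fail because of one weak task:
print(meets_metaculus_mmlu_criterion([0.95, 0.93, 0.91, 0.70]))  # False (min < 0.75)
print(meets_metaculus_mmlu_criterion([0.95, 0.93, 0.91, 0.85]))  # True (mean 0.91, min 0.85)
```

This illustrates why a headline figure like GPT-4o's 88.7% average on MMLU does not by itself settle whether the criterion is met: the per-task floor must also hold.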

Cited by 1 page

Page | Type | Quality
AI Timelines | Concept | 95.0

Cached Content Preview

HTTP 200 · Fetched Feb 26, 2026 · 302 KB
Forecasting AGI: Insights from Prediction Markets and Metaculus — LessWrong
by Alvin Ånestrand · 4th Feb 2025 · AI Alignment Forum · Linkpost from forecastingaifutures.substack.com · 5 min read

I have tried to find all prediction market and Metaculus questions related to AGI timelines. Here I examine how they compare to each other and what they actually say about when AGI might arrive. If you know of a market that I have missed, please tell me in the comment section! It would also be helpful if you tell me about questions you think are relevant but missing from this analysis.

This is a linkpost, and I would prefer that you comment on the original post on my new blog, Forecasting AI Futures, but feel free to comment here as well. Subscribe to the blog for updates on my future forecasting posts related to AI safety. Whenever possible, please check the more recent probability estimates on the embedded sites instead of relying on my At The Time Of Writing (ATTOW) numbers.

So, what do prediction markets and Metaculus have to say about AGI? Metaculus has this question for the arrival date of AGI. The AI system needs to be able to:

  • Pass a really hard Turing test.
  • Have general robotic capabilities (being able to assemble a "circa-2021 Ferrari 312 T4 1:8 scale automobile model" or equivalent).
  • Achieve "at least 75% accuracy in every task and 90% mean accuracy across all tasks" on the MMLU benchmark, which measures expertise in a wide range of academic subjects.
  • Achieve at least 90% accuracy with a single attempt for each question on the APPS benchmark, which measures coding skills.

Metaculus thinks this will probably occur around the middle of 2030, though with high uncertainty. The interval between the lower and upper quartiles for the individual predictions on this question is (2026-12-28 - 2039-03-27) ATTOW.

GPT-4o achieves an accuracy of 88.7% on MMLU, as seen in the leaderboard here. GPT-4 was used to get 22% accuracy on APPS. Unfortunately, most of the best models have not been tested on either MMLU or APPS. OpenAI's o3 has been reported to achieve 71.7% on SWE-bench Verified. We can compare that to GPT-4, which managed 22.4% on SWE-bench Verified and 22% accuracy on APPS. Based on this, I think o3 would achieve above 50% accuracy on APPS.

The two criteria that AI currently seems furthest from fulfilling are robotics capabilities and APPS accuracy, though current best performance on the APPS benchmark is uncertain. Coding capabilities are improving very fast, as indicated by the rapid improvements in accuracy on SWE-bench Verified, while robotics capabilities are lagging behind. If there are not too many errors in the APPS benc

... (truncated, 302 KB total)
Resource ID: 90fca29ade44fd7d | Stable ID: ZmFiYjczMz