Updated 2026-01-29

Metaculus
Metaculus is a reputation-based forecasting platform with more than one million predictions. Its community puts AGI probability at roughly 25% by 2027 and 50% by 2031, down from a median estimate of 50 years away in 2020. Analyses find good short-term calibration (Brier score 0.107) but poor calibration on horizons beyond one year, and human Pro Forecasters consistently outperform AI bots (p = 0.00001 in Q2 2025).

Type: Startup
Related people: Anthony Aguirre
Related organizations: Future of Life Foundation (FLF)

Quick Assessment

| Dimension | Assessment | Evidence |
|---|---|---|
| Scale | Largest forecasting platform | 1M+ predictions, 15,000+ questions, 50,000+ users |
| AI Focus | Primary AGI timeline source | Dedicated AI categories, multiple timeline questions, AI 2027 tournament |
| Accuracy | Generally good short-term | Brier score 0.107 (2021 questions); Metaculus Prediction outperforms median |
| Funding | USD 8.8M+ total | USD 5.5M Coefficient (2022), USD 3M Coefficient (2023), USD 300K EA Infra Fund (2021) |
| Influence | High | Cited by 80,000 Hours, Coefficient Giving, media, policymakers |
| Track Record | Mixed | Good calibration short-term, weaker on 1+ year questions per EA Forum analysis |
| Partnerships | Strong | Good Judgment Inc collaboration, Bridgewater USD 30K competitions, Vox Future Perfect |
| AI vs Human Benchmark | Humans lead | Pro Forecasters outperform AI bots with p = 0.00001 significance in Q2 2025 |

Organization Details

| Attribute | Details |
|---|---|
| Full Name | Metaculus Inc. |
| Founded | 2015 |
| Founders | Anthony Aguirre (physicist, FLI co-founder), Greg Laughlin (Yale astrophysicist), Max Wainwright (data scientist) |
| CEO | Deger Turan (appointed April 2024) |
| Team Size | ≈28 employees across North America, Europe, and Asia |
| Location | Santa Monica, California (headquarters) |
| Status | Public Benefit Corporation (restructured 2022) |
| Website | metaculus.com |
| Key Features | Prediction aggregation, tournaments, track record scoring, AI benchmarking |
| Total Funding | USD 8.8M+ from Coefficient Giving, EA Infrastructure Fund |

Overview

Metaculus is a reputation-based online forecasting platform that has become one of the most influential sources for AI timeline predictions. Founded in 2015 by physicists and data scientists from UC Santa Cruz and Yale, Metaculus aggregates predictions from thousands of forecasters on questions spanning technology, science, politics, and existential risks, with particular depth on AI capabilities and timelines.

The platform's core innovation is its aggregation algorithm, which combines individual forecasts into a single community estimate that consistently outperforms the simple median of user predictions when evaluated using Brier or Log scoring rules. The Community Prediction uses a recency-weighted median, while the now-deprecated "Metaculus Prediction" applied performance weighting and extremization to further improve accuracy. This aggregation enables Metaculus to extract signal from noise, producing calibrated probability estimates that inform research, policy, and public understanding.

Metaculus has documented a dramatic shift in AI timeline forecasts over recent years. The median estimate for when AGI would be developed fell from 50 years away in 2020 to under a decade by late 2024, with the community median now around 2031. This shift, reflected in questions tracking "weak AGI," "general AI," and "transformative AI," represents one of the most significant updates in the forecasting community's collective judgment about AI development trajectories.

The platform occupies a unique position in the AI safety ecosystem, providing quantitative probability estimates that inform decisions at organizations like Coefficient Giving, 80,000 Hours, and major AI labs. Its tournaments have also become important benchmarks for comparing human forecasters against AI forecasting systems, with human Pro Forecasters maintaining a statistically significant lead over AI bots through 2025.

History and Founding

Origins (2014-2015)

Metaculus emerged from the intersection of cosmological research and technology forecasting. Co-founder Anthony Aguirre, a theoretical cosmologist and Faggin Presidential Chair for the Physics of Information at UC Santa Cruz, had previously co-founded the Foundational Questions Institute (FQXi) in 2006 with Max Tegmark to support unconventional physics research. In 2014, Aguirre and Tegmark co-founded the Future of Life Institute (FLI), an organization focused on the implications of transformational technology, particularly artificial intelligence.

Metaculus was conceived alongside FLI as a complementary tool. According to Aguirre, the forecasting platform was designed to "build an ability to make predictions and identify people who are really good at making predictions and modeling the world." The platform started partly to "be a thing that would be of service to the Future of Life Institute, but also everybody else who's thinking about the future" (AXRP Episode 38.7).

Aguirre partnered with Greg Laughlin, a Yale astrophysicist with expertise in computational methods and orbital dynamics, and Max Wainwright, a data scientist who had been a postdoctoral researcher for both Laughlin and Aguirre. The trio launched Metaculus in November 2015, initially focusing on science and technology predictions where their academic expertise provided domain knowledge.

Platform Development (2016-2019)

The early platform focused on building a community of technically-minded forecasters and developing the mathematical infrastructure for prediction aggregation. In June 2017, Metaculus introduced the Metaculus Prediction—a sophisticated aggregation system that weighted forecasts based on past performance and applied extremization to compensate for systematic human cognitive biases. This innovation helped distinguish Metaculus from simple prediction aggregators.

The platform gradually expanded from its initial science and technology focus to include questions on geopolitics, economics, and global risks. This expansion positioned Metaculus to become a central resource for the effective altruism community, which was increasingly interested in quantitative forecasts for cause prioritization and career decisions.

Growth and Institutionalization (2020-2022)

The COVID-19 pandemic marked a turning point for Metaculus's public profile. In January 2020, Metaculus introduced the Bentham Prize, awarding bi-weekly monetary prizes of USD 300, USD 200, and USD 100 to the most valuable user contributions. In February 2020, they launched the Li Wenliang Prize, named after the Chinese doctor who warned about COVID-19, offering monetary prizes for questions, forecasts, and analyses related to the outbreak.

The pandemic demonstrated the platform's ability to rapidly aggregate expert judgment on developing situations, attracting significant attention from researchers and policymakers. By 2022, Metaculus reached 1,000,000 individual predictions and restructured as a public-benefit corporation, signaling a commitment to forecasting as a public good rather than a purely commercial venture.

Scaling with Major Funding (2022-Present)

Coefficient Giving's USD 5.5 million grant in 2022 transformed Metaculus's capacity, enabling significant hiring and platform development. The funding supported high-impact forecasting programs on AI, biosecurity, climate change, nuclear security, and other topics of concern to the longtermist community. A follow-up USD 3 million grant in 2023 further expanded capabilities.

In April 2024, Deger Turan became CEO, bringing experience from his role heading the AI Objectives Institute. The previous CEO transitioned to Special Advisor while remaining on the board. Under Turan's leadership, Metaculus launched major AI forecasting initiatives including the AI Forecasting Benchmark Tournament, which benchmarks AI forecasting systems against human Pro Forecasters.

In 2024, Metaculus rewrote their website code and released it under the BSD-2-Clause License, making their platform open source. The AI Forecasting Benchmark Series continued into 2025, with Q1 results prompting a renewal announcement in July 2025 for an expanded year-long iteration backed by USD 175,000 in prizes.

Team and Leadership

| Role | Person | Background |
|---|---|---|
| CEO | Deger Turan | Former head of AI Objectives Institute; appointed April 2024 |
| Co-Founder & President | Anthony Aguirre | UC Santa Cruz physics professor; FLI Executive Director; FQXi founder |
| Co-Founder | Greg Laughlin | Yale astrophysicist; computational methods expert |
| Co-Founder | Max Wainwright | Data scientist; former postdoc with Laughlin/Aguirre |
| Chief of Staff | Nate Morrison | Former ED of Teach For America - New Mexico |
| CTO | Dan Schwarz | Technology leadership |

The organization has grown to approximately 28 employees across three continents (North America, Europe, and Asia), reflecting its global forecaster community (Tracxn).

Key AGI Timeline Forecasts

flowchart LR
  2020[2020<br/>50 years to AGI] --> 2022[2022<br/>30 years to AGI]
  2022 --> 2024[Late 2024<br/>~7 years to AGI]
  2024 --> 2025[2025<br/>~5 years to AGI]

  style 2020 fill:#ffcccc
  style 2022 fill:#ffddcc
  style 2024 fill:#ffffcc
  style 2025 fill:#ccffcc

Current AGI Probability Estimates (as of late 2024)

| Timeline | Metaculus Probability | Notes |
|---|---|---|
| By 2027 | ≈25% | Dramatic increase from prior years |
| By 2030 | ≈40-45% | Central estimate range |
| By 2031 | ≈50% (median) | Current community median |
| By 2040 | ≈75% | Upper quartile |

Metaculus AGI Definition

Metaculus uses a multi-criteria definition requiring systems to:

| Criterion | Requirement |
|---|---|
| Turing Test | Pass "really hard" conversational tests |
| Robotic Capabilities | Assemble complex physical objects (e.g., Ferrari 312 T4 1:8 scale model) |
| Academic Performance | 75%+ accuracy on every MMLU task, 90% mean across all tasks |
| General Competence | Demonstrate broad capability across diverse domains |

This definition is more stringent than industry definitions (e.g., OpenAI's "economically valuable work"), leading to somewhat later timeline estimates compared to lab predictions.

Platform Statistics

| Metric | Value | As Of |
|---|---|---|
| Total Predictions | 1,000,000+ | 2022 milestone |
| Total Questions | 15,000+ | 2024 |
| Registered Users | 50,000+ | 2024 |
| AI/Tech Questions | 2,000+ | Active and resolved |
| Average Predictors per Question | 50-200 | Varies by question prominence |

Accuracy and Calibration

Metaculus Prediction vs. Community Median

The Metaculus Prediction aggregation algorithm provides measurable improvements:

| Scoring Method | Metaculus Prediction vs. Median | Finding |
|---|---|---|
| Brier Score | Superior | Consistent outperformance |
| Log Score | Superior | Better at extreme probabilities |
| Calibration | Better | More reliable probability estimates |
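The Brier score cited throughout this page is simple to compute. As a minimal sketch (not Metaculus's production scoring code): it is the squared error between a binary forecast and the 0/1 outcome, averaged over resolved questions.

```python
def brier_score(p, happened):
    """Brier score for one binary forecast: squared error between the
    predicted probability and the 0/1 outcome (0 is perfect, 1 is worst)."""
    outcome = 1.0 if happened else 0.0
    return (p - outcome) ** 2

# A 0.107 average (the figure cited for Metaculus's 2021 questions) sits
# between a confident correct forecast and always guessing 50%:
print(round(brier_score(0.9, True), 2), brier_score(0.5, True))  # -> 0.01 0.25
```

Always forecasting 50% guarantees a Brier score of 0.25, which is why averages well below that, like 0.107, indicate real predictive skill.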

Calibration Analysis

| Forecast Horizon | Calibration Quality | Notes |
|---|---|---|
| Less than 3 months | Good | Well-calibrated on near-term questions |
| 3-12 months | Moderate | Some overconfidence |
| Greater than 1 year | Poor | Analysis found systematic miscalibration |

One EA Forum analysis found Metaculus was "poorly calibrated on resolved questions with a greater than 1 year time horizon," suggesting caution when interpreting long-range AI forecasts.
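The kind of calibration check behind such analyses can be sketched in a few lines: bucket resolved forecasts by predicted probability and compare each bucket's mean forecast to its observed outcome frequency. This is a generic illustration, not the EA Forum authors' actual code.

```python
import numpy as np

def calibration_table(probs, outcomes, bins=5):
    """Bucket resolved binary forecasts by predicted probability and
    compare the mean forecast to the observed outcome frequency in each
    bucket. For well-calibrated forecasts the two values roughly match."""
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    edges = np.linspace(0.0, 1.0, bins + 1)
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # include 1.0 in the final bucket
        in_bin = (probs >= lo) & ((probs < hi) | (hi == 1.0))
        if in_bin.any():
            rows.append((lo, hi, probs[in_bin].mean(), outcomes[in_bin].mean()))
    return rows

# Toy data: low forecasts that resolve "no", high forecasts that
# resolve "yes" -- calibrated by construction.
for lo, hi, mean_p, freq in calibration_table(
        [0.1, 0.12, 0.88, 0.9], [0, 0, 1, 1]):
    print(f"[{lo:.1f}, {hi:.1f}): forecast {mean_p:.2f}, observed {freq:.2f}")
```

A systematic gap between forecast and observed frequency in the high-probability buckets is what "poorly calibrated on 1+ year horizons" means in practice.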

Predictor Quantity Effects

Research on Metaculus data shows diminishing returns to additional forecasters:

| Number of Predictors | Marginal Improvement | Notes |
|---|---|---|
| 1-10 | Large | Each additional forecaster helps significantly |
| 10-50 | Moderate | Continued but slower improvement |
| 50+ | Small | Marginal gains diminish substantially |

Aggregation Methodology

Metaculus employs sophisticated aggregation techniques to combine individual forecasts into community estimates. Understanding these methods is essential for interpreting the platform's predictions.

Community Prediction: Recency-Weighted Median

The Community Prediction uses a recency-weighted median approach:

| Element | Description |
|---|---|
| Base Measure | Median of individual forecaster probabilities |
| Weighting | More recent predictions receive higher weights |
| Weight Formula | Oldest prediction receives weight 1; newest among n predictions receives weight n |
| Update Requirement | Roughly half of forecasters must update to substantially shift the aggregate |
| Rationale | Balances responsiveness to new information against resistance to transient outliers |

For different question types (Metaculus FAQ):

  • Binary Questions: Weighted median of individual probabilities
  • Multiple Choice: Weighted median, renormalized to sum to 1
  • Numeric/Date Questions: Weighted average of individual distributions

Metaculus Prediction: Performance Weighting + Extremization

The "Metaculus Prediction" (deprecated since November 2024) employed a more sophisticated approach (Metaculus Notebooks):

| Component | Function |
|---|---|
| Performance Weighting | Calibrates and weights each user based on track record |
| Extremization | Pushes consensus forecasts toward 0 or 1 to compensate for cognitive biases |
| Goal | Produce a prediction better than even the best individual forecaster |

How Extremization Works (EA Forum):

Extremizing adjusts aggregated forecasts toward extreme probabilities. The rationale: if several independent forecasters conclude something is 90% likely, their agreement provides additional evidence beyond any individual's estimate. Research on geopolitical forecasting found optimal extremizing factors between 1.161 and 3.921.
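A common textbook form of extremization scales the aggregate in log-odds space; this sketch illustrates the idea and is not necessarily the exact transform Metaculus used in production.

```python
import math

def extremize(p, d):
    """Extremize an aggregate probability by scaling its log-odds by d.
    d > 1 pushes p toward 0 or 1; d = 1 leaves it unchanged."""
    log_odds = math.log(p / (1.0 - p))
    return 1.0 / (1.0 + math.exp(-d * log_odds))

# A 90% consensus extremized with d = 1.5 (inside the 1.161-3.921 range
# the cited geopolitical research found optimal):
print(round(extremize(0.90, 1.5), 3))  # -> 0.964
```

Note that extremization leaves 50% forecasts untouched (zero log-odds) and has the largest absolute effect on confident consensus forecasts, which is exactly where independent agreement carries extra evidential weight.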

Scoring System

Metaculus uses a logarithmic scoring rule as the foundation for all scores (Metaculus Scoring Primer):

| Score Type | Description | Introduced |
|---|---|---|
| Log Score | Natural logarithm of predicted probability for actual outcome | Original |
| Baseline Score | Compares prediction to chance; rewards both accuracy and volume | November 2023 |
| Peer Score | Compares to other forecasters; equalizes for question difficulty | November 2023 |

Key properties of the log score:

  • Proper scoring rule: The only way to optimize average score is to predict sincere beliefs
  • Punitive on extreme errors: Going from 99% to 99.9% yields only +0.009 if correct, but -2.3 if wrong
  • Time-averaged: Points are averaged across question lifetime to encourage ongoing updates
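The asymmetry in the second bullet follows directly from the definition and can be checked in a few lines (an illustration of the scoring rule, not Metaculus's implementation):

```python
import math

def log_score(p, happened):
    """Log score: natural log of the probability assigned to the outcome
    that actually occurred (0 is perfect; more negative is worse)."""
    return math.log(p if happened else 1.0 - p)

# Moving from 99% to 99.9% confidence:
gain_if_right = log_score(0.999, True) - log_score(0.99, True)
loss_if_wrong = log_score(0.999, False) - log_score(0.99, False)
print(round(gain_if_right, 3), round(loss_if_wrong, 1))  # -> 0.009 -2.3
```

Because ln(0.001) is so much smaller than ln(0.01), the extra decimal of confidence risks far more than it can gain, which is what makes the log score a proper scoring rule that punishes overconfidence.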

The November 2023 scoring update replaced the legacy Points system with Baseline and Peer scores, making performance comparison fairer for forecasters with different time constraints.

flowchart TD
  subgraph INPUT["Individual Forecasts"]
      F1[Forecaster 1<br/>75%]
      F2[Forecaster 2<br/>80%]
      F3[Forecaster 3<br/>70%]
      FN[Forecaster N<br/>...]
  end

  subgraph WEIGHT["Recency Weighting"]
      RW[Older forecasts<br/>weight = 1]
      RW2[Newer forecasts<br/>weight = n]
  end

  subgraph AGG["Aggregation"]
      MED[Weighted Median]
      CP[Community Prediction]
  end

  subgraph LEGACY["Historical - Deprecated Nov 2024"]
      PW[Performance<br/>Weighting]
      EXT[Extremization]
      MP[Metaculus<br/>Prediction]
  end

  INPUT --> WEIGHT
  WEIGHT --> MED
  MED --> CP
  MED --> PW
  PW --> EXT
  EXT --> MP

  style CP fill:#90ee90
  style MP fill:#ffcccc
  style LEGACY fill:#f5f5f5

Key AI Questions

Primary Timeline Questions

| Question | Current Forecast | Forecasters | Link |
|---|---|---|---|
| When will first general AI be announced? | October 2027 (as of Dec 2025) | 1,700+ | metaculus.com/questions/5121 |
| Transformative AI date | November 2042 | 166 | metaculus.com/questions/19356 |
| Weak AGI arrival | 2028-2030 range | — | metaculus.com/questions/3479 |
| Time from AGI to superintelligence | ≈22 months (range: 5-167 months) | 240 | metaculus.com/questions/9062 |
| AGI transition beneficial for humanity | — | — | metaculus.com/questions/4118 |

Company Attribution Forecasts

Metaculus forecasters have also estimated which organization is most likely to achieve AGI first (Forecastingaifutures.substack.com):

| Company | Probability of First AGI |
|---|---|
| Alphabet/Google DeepMind | 36.3% |
| OpenAI | 21.9% |
| Anthropic | 17.5% |
| Other | 24.3% |

AI-Specific Tournaments

| Tournament | Questions | Start Date | Focus |
|---|---|---|---|
| AI 2027 | 19 | June 2025 | Near-term AI developments |
| AGI Outcomes | — | — | Long-term consequences of AGI for humanity |
| AI Progress Tournament | — | 2023-2024 | Benchmark progress tracking |

Tournaments and Competitions

AI Forecasting Benchmark Tournament

The AI Forecasting Benchmark Tournament represents Metaculus's flagship initiative for comparing human and AI forecasting capabilities. Launched in 2024, the tournament runs in two series: a primary 4-month seasonal tournament and a bi-weekly fast-paced MiniBench. Participants can compete using API credits provided by OpenAI and Anthropic.

| Quarter | Questions | Prize Pool | Bot-Makers | Key Finding |
|---|---|---|---|---|
| Q3 2024 | — | — | — | Best bots scored -11.3 vs Pro Forecasters (0 = equal) |
| Q4 2024 | — | — | — | Best bots improved to -8.6 vs Pro Forecasters |
| Q1 2025 | — | — | — | metac-o1 achieved first place among bots |
| Q2 2025 | 348 | USD 30,000 | 54 | Pros maintain clear lead (p = 0.00001) |

Key findings from the Q2 2025 tournament (EA Forum):

  • Students and hobbyists performed well—the top 3 bot-makers (excluding Metaculus's in-house bots) were hobbyists or students
  • Aggregation had a significant positive effect: taking the median or mean of multiple forecasts rather than single LLM forecasts improved scores
  • Among baseline bots in Q2 2025, OpenAI's o3 led performance rankings
  • The average Peer score for the Metaculus Community Prediction is 12.9, ranking in the top 10 on the global leaderboard over every 2-year period since 2016

Bridgewater x Metaculus Forecasting Competition

Metaculus has partnered with Bridgewater Associates for three consecutive years on forecasting competitions designed to identify talented forecasters for potential recruitment. The competition emphasizes Bridgewater's "idea meritocracy" culture.

| Competition | Questions | Prize Pool | Structure |
|---|---|---|---|
| 2026 Competition | 50 | USD 30,000 (Open) + USD 30,000 (Undergrad) | Two leaderboards: Open and Undergraduate |
| Previous Years | — | — | Multiple offers made to top forecasters |

The January 2026 competition features:

  • 50 forecasting questions on real-world events
  • Separate leaderboards for open competition and undergraduates
  • Top 50 forecasters in each track eligible for prizes
  • Potential employment opportunities at Bridgewater for top performers

Good Judgment Inc. Collaboration

Metaculus and Good Judgment Inc. announced their first formal collaboration on the Our World in Data (OWID) project, comparing methodologies across the two largest human judgment forecasting communities globally.

| Element | Details |
|---|---|
| Questions | 10 identical questions about Our World In Data metrics |
| Topics | Technological advances, global development, social progress |
| Time Horizons | 1 to 100 years |
| Participants | Superforecasters (Good Judgment) vs. Pro Forecasters (Metaculus) |
| Funding | Future Fund grant |

According to Warren Hatch, Good Judgment's CEO: "We're excited to be partnering with Metaculus to combine our approaches to apply probabilistic thinking to an uncertain future."

Vox Future Perfect Collaboration

In January 2025, Metaculus partnered with Vox's Future Perfect team to host forecasts on political, economic, and technological questions for 2025, featuring:

  • Public participation alongside the Future Perfect team's published predictions
  • USD 2,500 prize pool to reward accurate contributions

Tournament Summary

| Tournament | Focus | Partners | Prize Pool |
|---|---|---|---|
| AI Forecasting Benchmark | Human vs AI forecasting | OpenAI, Anthropic | USD 30,000/quarter |
| Bridgewater Competition | Talent identification | Bridgewater Associates | USD 60,000 total |
| OWID Project | Global development metrics | Good Judgment Inc. | Future Fund grant |
| Vox Future Perfect 2025 | Annual predictions | Vox Media | USD 2,500 |
| AI 2027 | Near-term AI | Internal | — |
| AGI Outcomes | Long-term AGI effects | Internal | — |

Comparison with Other Platforms

Platform Characteristics

| Platform | Model | Currency | AI Focus | Community Size |
|---|---|---|---|---|
| Metaculus | Reputation-based | Points/Peer scores | Very High | 50,000+ users |
| Manifold | Prediction market | Play money (Mana) | High | Large |
| Polymarket | Prediction market | Real money (crypto) | Moderate | High liquidity |
| Kalshi | Regulated market | Real money (USD) | Low | Growing |
| Good Judgment | Superforecaster panels | Reputation | Moderate | ≈150 Superforecasters |

Accuracy Comparison Studies

Research comparing forecasting platform accuracy has produced nuanced findings (Manifund research, Metaculus notebooks):

| Finding | Source | Notes |
|---|---|---|
| Real money markets outperform play money on most topics | Brier.fyi analysis | Intuitive: arbitrage opportunities between play/real money |
| Metaculus/Manifold outperform real money on science topics | Brier.fyi | Specialized audiences trade for intellectual engagement |
| 2022 Midterms: Metaculus scored highest | First Sigma analysis | Beat FiveThirtyEight, Manifold, Polymarket, PredictIt |
| Metaculus made most accurate Republican Senate predictions | 2022 Midterms | Lowest (best) scores among platforms |
| ACX contest: Metaculus outperformed Manifold | Scott Alexander | Non-money forecaster community beat play-money market |
| Manifold users rank Metaculus as more accurate than Polymarket | Self-reported poll | Community perception |

AGI Timeline Comparison (2024-2025)

| Platform | AGI by 2027 | AGI by 2030 | Definition Notes |
|---|---|---|---|
| Metaculus | ≈25% | ≈45% | Stringent: requires robotics, broad capability |
| Manifold | ≈47% | ≈60% | More permissive definition |
| Polymarket | ≈9% (OpenAI) | — | Company-specific question |
| Kalshi | 40% (OpenAI) | — | Company-specific question |
| AGI Dashboard | — | — | 2031 combined estimate; aggregates multiple sources |

The AGI Timelines Dashboard aggregates data from Metaculus, Manifold, Kalshi, and other sources, producing a combined forecast of AGI arriving in 2031 (80% CI: 2027-2045) as of January 2026.

Why Estimates Differ

| Factor | Effect on Estimates |
|---|---|
| AGI Definition | Metaculus requires robotics; others use "economically valuable work" |
| Incentive Structure | Real money may attract informed traders; reputation may attract domain experts |
| Community Composition | Metaculus skews toward AI-interested, technically-oriented forecasters |
| Question Framing | Specific operationalization significantly affects forecasts |

Funding History

Major Grants

| Year | Source | Amount | Purpose | Link |
|---|---|---|---|---|
| 2019 | Coefficient Giving | — | Initial support | coefficientgiving.org |
| 2021 | EA Infrastructure Fund | USD 300,000 | Platform development | — |
| 2022 | Coefficient Giving | USD 5,500,000 | Scaling, hiring, high-impact programs | coefficientgiving.org |
| 2022 | FTX Future Fund | USD 20,000 | Grant made 3 weeks before FTX collapse | — |
| 2023 | Coefficient Giving | USD 3,000,000 | Platform development | coefficientgiving.org |
| 2024 | Various | USD 175,000 | AI Forecasting Benchmark prizes | Tournament funding |

Total Funding: USD 8.8M+ confirmed

Coefficient Giving has been Metaculus's primary funder, providing support under its Longtermism program, which focuses on work that "raises the probability of a very long-lasting, positive future" (Metaculus announcement). The USD 5.5M 2022 grant was described as enabling Metaculus to "scale as an organization pursuing its mission to build epistemic infrastructure for navigating complex global challenges."

Funding Focus Areas

Coefficient Giving's grants to Metaculus support high-impact forecasting programs in:

| Area | Relevance to AI Safety |
|---|---|
| Artificial Intelligence | Core AGI timeline forecasts, AI benchmark tournaments |
| Biosecurity | Pandemic preparedness, bioweapons risk |
| Climate Change | Long-term trajectory forecasting |
| Nuclear Security | Existential risk quantification |
| Global Catastrophic Risks | Cross-cutting threat assessment |

Partnerships and Collaborations

Industry and Research Partners

| Partner | Collaboration Type | Focus |
|---|---|---|
| Good Judgment Inc. | Methodology comparison | OWID project; Superforecaster vs. Pro Forecaster benchmarking |
| Bridgewater Associates | Talent identification | Annual forecasting competition with USD 60K prizes |
| OpenAI | AI benchmarking | API credits for AI Forecasting Benchmark Tournament |
| Anthropic | AI benchmarking | API credits for AI Forecasting Benchmark Tournament |
| Vox Future Perfect | Public forecasting | 2025 predictions tournament |
| 80,000 Hours | Career research | AI timeline forecasts cited in career guidance |
| Coefficient Giving | Research & funding | Grant impact forecasting; primary funder |

Academic and Institutional Connections

The organization's founders maintain deep connections to academic research and global risk institutions:

| Institution | Connection |
|---|---|
| UC Santa Cruz | Anthony Aguirre's academic home; Faggin Chair |
| Yale University | Greg Laughlin's position |
| Future of Life Institute | Anthony Aguirre serves as Executive Director |
| Foundational Questions Institute (FQXi) | Anthony Aguirre serves as President |
| Bulletin of the Atomic Scientists | Anthony Aguirre is a contributor |

Strengths and Limitations

Strengths

| Strength | Evidence |
|---|---|
| Scale | Largest dedicated forecasting platform: 1M+ predictions, 50K+ users |
| AI Depth | Most comprehensive coverage of AGI timeline questions; dedicated tournaments |
| Aggregation Quality | Brier score of 0.107 (2021); consistently outperforms simple median |
| Track Record Transparency | Public calibration data, historical accuracy available for analysis |
| Community Engagement | Active forecaster base with ongoing updates; tournaments drive participation |
| Open Source | Platform code released under BSD-2-Clause License (2024) |
| Institutional Integration | Forecasts inform decisions at Coefficient Giving, 80,000 Hours, AI labs |
| Human vs AI Benchmarking | Only major platform systematically comparing human and AI forecasting |

Limitations

| Limitation | Analysis |
|---|---|
| Long-term Calibration | EA Forum analysis found poor calibration on questions with greater than 1 year horizons |
| Selection Bias | Forecasters skew toward AI-interested, technically-oriented demographics |
| Definition Dependence | AGI timeline estimates vary significantly with operationalization |
| No Monetary Incentives | Reputation-only scoring may reduce accuracy vs. real-money markets for some question types |
| Question Framing Effects | Outcomes depend heavily on specific wording and resolution criteria |
| Limited Long-Horizon Data | Few resolved questions with greater than 5 year horizons for validation |
| AI Progress Overestimation | AI Progress Tournament analysis found community overconfident on AI predictions |

AI-Specific Track Record Analysis

The Metaculus AI Progress Tournament analysis found:

| Finding | Implication |
|---|---|
| Progress on benchmarks was underestimated | AI progresses faster on well-defined tasks than expected |
| Progress on other proxies (compute, bibliometrics, economic indicators) was overestimated | Real-world impact lags benchmark performance |
| Community expected more AI developments than occurred (binary questions) | Appropriate underconfidence partially compensates |
| Overconfidence on numeric predictions | Calibration weaker on magnitude estimates |

This pattern suggests: "AI progresses surprisingly rapidly on well-defined benchmarks but the attention it receives and its 'real world' impact fail to keep up."

Relevance to AI Safety

Metaculus plays several important roles in the AI safety ecosystem:

Decision Support

| Use Case | Organizations |
|---|---|
| Cause prioritization | Coefficient Giving uses forecasts to inform grantmaking |
| Career guidance | 80,000 Hours cites AGI timelines in career advice |
| Research prioritization | AI safety researchers track timeline estimates |
| Policy planning | Government bodies and think tanks reference forecasts |

Epistemic Infrastructure

Metaculus provides quantitative probability estimates where previously only qualitative assessments existed. The dramatic shift from 50-year to 5-year AGI timelines between 2020 and 2024 represents one of the most significant and well-documented updates in collective expert judgment about AI development, providing valuable signal for resource allocation and urgency calibration.

AI Capability Benchmarking

The AI Forecasting Benchmark Tournament provides empirical data on the state of AI forecasting capabilities:

| Question | Current Forecast | Significance |
|---|---|---|
| "When will an AI be amongst the best forecasters on Metaculus?" | February 2028 (median) | Tracks AI reasoning progress |
| "Will largest AI forecasting system achieve Brier score less than 0.1 by 2026?" | 17% | Superforecaster median is ≈0.1 |

These benchmarks help calibrate expectations about AI capability trajectories in reasoning-intensive domains.

Sources

  1. Metaculus - Wikipedia - Organization history and overview
  2. Anthony Aguirre - Wikipedia - Founder background
  3. Anthony Aguirre - Future of Life Institute - FLI connection
  4. Metaculus: a prediction website with an eye on science and technology - Yale News - 2016 launch coverage
  5. Announcing Deger Turan as the new CEO of Metaculus - Leadership transition
  6. Metaculus Awarded USD 5.5M Grant to Advance Forecasting as a Public Good - 2022 funding announcement
  7. Metaculus - Platform Development | Coefficient Giving - 2022 grant details
  8. Metaculus - Platform Development (2023) | Coefficient Giving - 2023 grant details
  9. Good Judgment Inc and Metaculus Launch First Collaboration - OWID project announcement
  10. Bridgewater x Metaculus 2026 Competition - Bridgewater partnership
  11. Q2 AI Benchmark Results: Pros Maintain Clear Lead - EA Forum - AI benchmarking results
  12. How does forecast quantity impact forecast quality on Metaculus? - EA Forum - Calibration analysis
  13. Takeaways from the Metaculus AI Progress Tournament - EA Forum - AI tournament analysis
  14. Takeaways from the Metaculus AI Progress Tournament | Coefficient Giving - Coefficient Giving analysis
  15. A Primer on the Metaculus Scoring Rule - Scoring methodology
  16. Exploring Metaculus's AI Track Record - EA Forum - AI accuracy analysis
  17. Predictive Performance on Metaculus vs. Manifold Markets - Platform comparison
  18. Shrinking AGI timelines: a review of expert forecasts | 80,000 Hours - Timeline analysis
  19. Forecasting AGI: Insights from Prediction Markets and Metaculus - AGI forecast aggregation
  20. Data on forecasting accuracy across different time horizons - EA Forum - Long-horizon analysis
  21. AXRP Episode 38.7 - Anthony Aguirre on the Future of Life Institute - Founder interview
  22. Metaculus Year in Review: 2022 - 1M prediction milestone
  23. Metaculus Company Profile - Tracxn - Organization details
  24. Principled extremizing of aggregated forecasts - EA Forum - Extremization methodology
  25. What can we learn from scoring different election forecasts? - First Sigma - 2022 election comparison

References

Metaculus is a collaborative online forecasting platform where users make probabilistic predictions on future events across domains including AI development, biosecurity, and global catastrophic risks. It aggregates crowd wisdom and expert forecasts to produce calibrated probability estimates on complex questions relevant to long-term planning and existential risk assessment.

Future of Life Institute

The Future of Life Institute (FLI) is a nonprofit organization focused on steering transformative technologies, particularly AI, away from catastrophic risks and toward beneficial outcomes. They operate across policy advocacy, research funding, education, and outreach to promote responsible AI development. FLI has been influential in key AI safety milestones including the open letter on AI risks and the Asilomar AI Principles.


This blog post analyzes prediction market data to extract crowd-sourced forecasts about AGI timelines and development trajectories. It examines what aggregated probabilistic forecasts reveal about when transformative AI systems might arrive and the uncertainty surrounding those estimates.


Good Judgment Inc and Metaculus announce their first collaborative project, where Superforecasters and Pro Forecasters make identical predictions on 10 Our World In Data metrics spanning technological advances, global development, and social progress across time horizons from 1 to 100 years. The project, supported by a Future Fund grant, aims to compare forecasting methodologies and advance the science of human judgment forecasting.

AGI Timelines Dashboard (agi.goodheartlabs.com)

An interactive dashboard aggregating and visualizing AGI timeline forecasts from major prediction markets and forecasting platforms including Metaculus, Manifold Markets, and Kalshi. It displays median year predictions and probability distributions for milestones such as 'weakly general AI,' 'general AI,' and passing the Turing Test, allowing users to download underlying data.

Good Judgment Inc. is the commercial spinoff of Philip Tetlock's landmark forecasting research, which demonstrated that a select group of 'superforecasters' can consistently outperform intelligence analysts and expert predictions using rigorous probabilistic thinking. The platform aggregates expert forecasts on geopolitical, technological, and scientific questions. It is highly relevant to AI safety for evaluating AI capabilities timelines and risk assessments.


OpenAI is a leading AI research and deployment company focused on building advanced AI systems, including GPT and o-series models, with a stated mission of ensuring artificial general intelligence (AGI) benefits all of humanity. The homepage serves as a gateway to their research, products, and policy work spanning capabilities and safety.


Anthropic is an AI safety company focused on building reliable, interpretable, and steerable AI systems. The company conducts frontier AI research and develops Claude, its family of AI assistants, with a stated mission of responsible development and maintenance of advanced AI for long-term human benefit.


80,000 Hours is a nonprofit that provides research and advice on how to use your career to have the most positive impact on the world's most pressing problems, with significant focus on AI safety and existential risk. They offer career guides, job boards, and in-depth research on high-priority cause areas and career paths. Their methodology emphasizes earning to give, direct work in high-impact fields, and building career capital.

Coefficient Giving

Coefficient Giving is a philanthropic platform focused on directing funding toward high-impact AI safety and existential risk reduction efforts. It aims to help donors identify and support the most effective organizations working on preventing catastrophic AI outcomes. The platform provides guidance and resources for individuals seeking to contribute financially to the AI safety ecosystem.

80,000 Hours AGI Timelines Review (80,000 Hours, Benjamin Todd, 2025)

A comprehensive synthesis by 80,000 Hours reviewing expert predictions on AGI timelines from multiple groups including AI lab leaders, researchers, and forecasters. The review finds a notable convergence toward shorter timelines, with many estimates suggesting AGI could arrive before 2030. Different expert communities that previously disagreed are now showing increasingly similar estimates.


Related Wiki Pages

Top Related Pages

Approaches

AI-Augmented ForecastingPrediction Markets (AI Forecasting)

Analysis

AI Forecasting Benchmark TournamentAI Risk Activation Timeline ModelCapability-Alignment Race ModelXPT (Existential Risk Persuasion Tournament)

Other

Anthony AguirreMax TegmarkEli Lifland

Concepts

AI TimelinesLong-Timelines Technical WorldviewEpistemic Orgs OverviewNovel / Unknown Approaches

Organizations

Epoch AIForecasting Research Institute (FRI)Bridgewater AIA LabsRethink PrioritiesFutureSearch

Key Debates

The Case For AI Existential RiskIs AI Existential Risk Real?