Tools & Platforms
Software platforms and tools designed to support forecasting, prediction, knowledge coordination, and decision-making under uncertainty in the epistemic infrastructure ecosystem.
Overview
Epistemic tools are software platforms and systems designed to improve individual and collective reasoning, particularly for decision-making under uncertainty. These tools implement various mechanisms—from probabilistic programming languages to crowdsourced verification systems—that aim to make beliefs more explicit, quantifiable, and subject to empirical testing.[1]
The tools documented here span several functional categories: forecasting and prediction platforms that elicit and aggregate probabilistic judgments; knowledge coordination systems that organize information for specific research communities; benchmarking frameworks that evaluate forecasting capabilities; and verification systems that assess claim accuracy at scale. While these tools are used across multiple domains, this overview focuses on platforms with significant adoption or relevance within the AI safety and existential risk research communities.
The effectiveness of epistemic tools varies by context and implementation. Prediction markets and structured forecasting platforms have demonstrated moderate improvements in accuracy over individual judgment in some domains,[2] though challenges remain around liquidity, participation incentives, and the selection of questions amenable to these methods. Knowledge coordination tools address different problems—organizing research literature, tracking expert consensus, and reducing information asymmetries—with effectiveness that is harder to quantify systematically.
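The Brier scores cited in this section are simply the mean squared error between probability forecasts and binary outcomes: 0 is perfect, and always guessing 50% yields 0.25. A minimal sketch, with invented forecasts for demonstration:

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probability forecasts and 0/1 outcomes (lower is better)."""
    assert len(forecasts) == len(outcomes)
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# A forecaster who says 0.8/0.9 on events that happen and 0.2 on one that doesn't:
score = brier_score([0.8, 0.2, 0.9], [1, 0, 1])
print(round(score, 3))  # 0.03
```

A score in the 0.16-0.24 range, as reported for prediction markets, sits between this well-calibrated toy forecaster and pure 50/50 guessing.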
Tool Categories
Forecasting & Prediction
Platforms and languages for generating, expressing, and aggregating probabilistic forecasts:
- Squiggle: Programming language for expressing uncertainty using probability distributions, enabling Monte Carlo simulation and sensitivity analysis for quantitative models
- SquiggleAI: Experimental system that uses large language models to generate Squiggle code from natural language descriptions of uncertain quantities
- Metaforecast: Aggregator that collected forecasts from multiple prediction platforms (Metaculus, Manifold Markets, Polymarket, and others) into a searchable database; development ceased in 2023[3]
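To illustrate the style of model Squiggle is built for, here is a rough Python analogue: uncertain quantities are represented as distributions, combined arithmetically, and summarized by Monte Carlo sampling. The quantities and the lognormal approximation of Squiggle's `low to high` syntax are simplifications chosen for demonstration, not a faithful reimplementation:

```python
import math
import random

random.seed(0)  # for reproducibility of this sketch

def lognormal_90ci(low, high):
    """Sample a lognormal whose 90% CI is roughly [low, high],
    loosely mirroring Squiggle's `low to high` notation."""
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * 1.645)  # 1.645 = z for a 90% CI
    return math.exp(random.gauss(mu, sigma))

# Hypothetical model: annual_cost = users * cost_per_user, both uncertain.
N = 100_000
samples = sorted(lognormal_90ci(1_000, 10_000) * lognormal_90ci(5, 20)
                 for _ in range(N))
mean = sum(samples) / N
p5, p95 = samples[int(0.05 * N)], samples[int(0.95 * N)]
print(f"mean ~ {mean:,.0f}; 90% CI ~ [{p5:,.0f}, {p95:,.0f}]")
```

The point of a dedicated language is that this entire sketch collapses to roughly one line of Squiggle, with sensitivity analysis and visualization built in.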
Additional major forecasting platforms not yet documented in this wiki include Metaculus (a free forecasting tournament platform with 100,000+ registered users), Manifold Markets (a play-money prediction market), and Good Judgment Open (run by the research group that outperformed intelligence analysts in IARPA's forecasting tournaments).[4]
Benchmarking & Evaluation
Systems for measuring forecasting accuracy and comparing human and AI performance:
- ForecastBench: Benchmark dataset designed to evaluate AI systems' forecasting capabilities on questions with verifiable resolutions
- AI Forecasting Benchmark Tournament: Competition format comparing human forecasters against AI systems on the same question set
Research Coordination
Platforms that structure expert collaboration on research questions:
- XPT (Existential Risk Persuasion Tournament): Framework for adversarial collaboration where researchers with differing views jointly investigate questions about existential risk, documenting both agreements and remaining disagreements
Verification & Fact-Checking
Systems for assessing claim accuracy using crowdsourced evaluation:
- X Community Notes: Crowdsourced system that allows users to add context to posts on X (formerly Twitter), with notes displayed based on cross-partisan agreement rather than majority vote[5]
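The cross-partisan requirement can be sketched as follows. This is a deliberately simplified stand-in for the real Community Notes algorithm, which uses matrix factorization over the full rating history rather than explicit group labels; the group names and threshold here are hypothetical:

```python
def note_is_shown(ratings, min_support=0.5):
    """ratings: dict mapping rater group -> list of bools (helpful / not helpful).
    A note is shown only when every group independently finds it helpful,
    rather than when a simple majority across all raters does."""
    return all(sum(votes) / len(votes) >= min_support
               for votes in ratings.values()
               if votes)

# A note rated helpful only by one side is not shown, despite a 3-2 overall majority:
print(note_is_shown({"group_a": [True, True, True], "group_b": [False, False]}))  # False
# A note with support from both sides is shown:
print(note_is_shown({"group_a": [True, True], "group_b": [True, False]}))  # True
```

The contrast with majority vote is the design point: under simple majority, the first note above would be displayed.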
Knowledge Coordination
Wikis, databases, and knowledge management systems focused on specific research domains:
- Longterm Wiki: Wiki focused on AI safety research prioritization and strategic questions about transformative AI development
- Stampy / AISafety.info: Question-and-answer database about AI safety topics, integrated with a large language model chatbot interface
- MIT AI Risk Repository: Structured database of 1,700+ AI risk scenarios and failure modes, with standardized categorization
Tracking & Documentation Infrastructure
Databases and websites that systematically track organizations, people, funding, and events in the AI safety and EA ecosystems:
- AI Watch: Database by Issa Rice tracking AI safety organizations, people, funding, and publications
- Org Watch: Tracking website by Issa Rice monitoring EA and AI safety organizations
- Timelines Wiki: MediaWiki project documenting chronological histories of AI safety and EA organizations
- Donations List Website: Open-source database tracking $72.8B in philanthropic donations (1969-2023) across 75+ donors
- Wikipedia Views: Analytics tools for tracking Wikipedia pageview data, useful for measuring public attention to AI safety topics
AI-Assisted Knowledge & Content Quality
Tools and systems that use large language models to support knowledge base creation, maintenance, and content quality:
- AI-Assisted Knowledge Management: Category covering LLM integrations with note-taking and wiki software (Obsidian plugins, Notion AI, Golden, NotebookLM) and retrieval-augmented generation frameworks
- Grokipedia: AI-generated encyclopedia created by xAI with 6+ million articles; notable as a case study in scaling AI-generated encyclopedic content and the resulting tradeoffs in quality control
- Wikipedia and AI Content: Documentation of how Wikipedia addresses AI-generated content through policies, WikiProject AI Cleanup, and debates about maintaining editorial standards at scale
- RoastMyPost: LLM-powered tool using multiple specialized AI agents for fact-checking, logical fallacy detection, and math verification of written content
Ecosystem Characteristics
These tools reflect different approaches to improving collective reasoning. Forecasting platforms attempt to harness wisdom-of-crowds effects by aggregating many individual predictions. Knowledge coordination systems address information organization and accessibility problems. Verification tools create incentive structures for accurate assessment.
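One widely used pooling rule, the geometric mean of odds, illustrates what such aggregation involves: it tends to produce sharper aggregate forecasts than a simple arithmetic mean of probabilities. A minimal sketch (the input probabilities are invented):

```python
import math

def pool_odds(probs):
    """Aggregate probabilities via the geometric mean of their odds."""
    log_odds = [math.log(p / (1 - p)) for p in probs]
    mean_lo = sum(log_odds) / len(log_odds)
    return 1 / (1 + math.exp(-mean_lo))  # convert pooled log-odds back to probability

forecasts = [0.6, 0.7, 0.8]
print(round(pool_odds(forecasts), 3))             # pooled estimate
print(round(sum(forecasts) / len(forecasts), 3))  # arithmetic mean, for comparison
```

Real platforms differ in their exact rules (weighting by track record, recency decay, extremization), but the underlying move is the same: many noisy individual probabilities are reduced to one aggregate number.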
Integration between these tools remains limited. Forecasts from platforms like Metaculus or Manifold Markets are not systematically fed into knowledge bases. Probabilistic models created in Squiggle are typically not connected to forecasting platforms for validation. The research and development of these tools occurs mostly independently, with different codebases, data formats, and user communities.
Most platforms in this space operate as non-profit projects or venture-backed companies with uncertain sustainability models. User bases remain relatively small and concentrated within specific communities interested in rationality, effective altruism, or AI safety. Mainstream adoption of quantified forecasting tools outside these communities has been limited, with notable exceptions like prediction markets focused on political or financial outcomes.
Footnotes
1. Tetlock, P. E., & Gardner, D. (2015). Superforecasting: The Art and Science of Prediction. Crown Publishers. Documents research on forecasting accuracy and methods for improving judgment under uncertainty.
2. Atanasov, P., et al. (2017). "Distilling the Wisdom of Crowds: Prediction Markets vs. Prediction Polls." Management Science, 63(3), 691-706. Meta-analysis finding prediction markets slightly outperform polls in some contexts, with effect sizes varying by domain.
3. Metaforecast development status from the project's GitHub repository (last updated October 2023) and maintainer statements on LessWrong.
4. User count for Metaculus from the platform's about page (accessed 2024); Good Judgment Open performance documented in Mellers, B., et al. (2014). "Psychological Strategies for Winning a Geopolitical Forecasting Tournament." Psychological Science, 25(5), 1106-1115.
5. Community Notes algorithm description: Birdwatch/Community Notes Technical Whitepaper, X Corp (2024). Describes bridging-based ranking that prioritizes notes rated helpful by users who typically disagree.