Longterm Wiki
ARC Prize - Leaderboard

web

The ARC Prize benchmark, created by François Chollet, is widely cited in AI safety and capabilities discussions as a meaningful test of general reasoning that is difficult to solve via brute-force scaling, making it relevant for tracking genuine AGI progress.

Metadata

Importance: 55/100 · tool page · reference

Summary

The ARC Prize leaderboard tracks AI system performance on the Abstraction and Reasoning Corpus (ARC-AGI) benchmark, a test designed to measure general fluid intelligence and reasoning capabilities that current AI systems struggle with. It provides a public ranking of models and approaches attempting to solve ARC tasks, serving as a key benchmark for measuring progress toward human-level abstract reasoning.

Key Points

  • Tracks competitive performance on ARC-AGI, a benchmark specifically designed to resist memorization and test genuine reasoning/generalization
  • ARC tasks require inferring abstract patterns from only a few examples, making them a proxy for general intelligence rather than narrow skill
  • Leaderboard highlights the gap between current AI capabilities and human-level performance on novel reasoning tasks
  • Serves as a reference point for evaluating whether AI systems are making genuine progress on abstract reasoning vs. benchmark gaming
  • Competition incentivizes novel approaches to program synthesis, inductive reasoning, and general problem-solving
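
To make the "abstract patterns from few examples" point concrete, here is a minimal sketch of how an ARC-style task can be represented and scored. It assumes the layout used by the public ARC dataset (a JSON object with "train" and "test" lists of input/output grids whose cells are integers 0-9). The toy task, the solver functions, and the helper names are hypothetical illustrations, and the single-attempt exact-match scoring is a simplification rather than the official evaluation harness.

```python
from typing import Callable, Dict, List

Grid = List[List[int]]  # each cell is an integer colour code (0-9 in ARC)
Task = Dict[str, List[Dict[str, Grid]]]
Solver = Callable[[Grid], Grid]

# Hypothetical toy task in the ARC JSON layout: a few "train" pairs that
# demonstrate a hidden rule, plus "test" pairs the solver must reproduce.
# The hidden rule here is "mirror the grid left to right".
toy_task: Task = {
    "train": [
        {"input": [[1, 0], [0, 0]], "output": [[0, 1], [0, 0]]},
        {"input": [[2, 2, 0], [0, 3, 0]], "output": [[0, 2, 2], [0, 3, 0]]},
    ],
    "test": [
        {"input": [[0, 5, 0], [4, 0, 0]], "output": [[0, 5, 0], [0, 0, 4]]},
    ],
}


def mirror_left_right(grid: Grid) -> Grid:
    """Candidate rule: reverse every row."""
    return [list(reversed(row)) for row in grid]


def identity(grid: Grid) -> Grid:
    """Baseline rule: return an unchanged copy of the grid."""
    return [row[:] for row in grid]


def fits_training_pairs(solver: Solver, task: Task) -> bool:
    """True if the candidate rule reproduces every demonstration pair."""
    return all(solver(p["input"]) == p["output"] for p in task["train"])


def score_task(solver: Solver, task: Task) -> float:
    """Fraction of test grids reproduced exactly (no partial credit)."""
    tests = task["test"]
    solved = sum(solver(p["input"]) == p["output"] for p in tests)
    return solved / len(tests)


if __name__ == "__main__":
    for name, solver in [("mirror", mirror_left_right), ("identity", identity)]:
        print(name, fits_training_pairs(solver, toy_task), score_task(solver, toy_task))
```

Because a predicted grid must match the expected grid cell for cell, a solver gets no credit for a nearly correct answer. The rule also has to be inferred from only the handful of demonstration pairs, which is why memorized patterns transfer poorly.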

Cited by 1 page

Page | Type | Quality
Reasoning and Planning | Capability | 65.0

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 23 KB
# ARC-AGI-1 Leaderboard

[Chart: scatter plot of SCORE (%), 0% to 100%, against COST PER TASK ($), with logarithmic ticks from $0.001 to $100. One labeled point per evaluated model configuration. Dated 03/19/2026.]

ARC-AGI-1

ARC-AGI-2

Author: All Authors, ARC Prize 2024, ARC Prize 2025, Alibaba, Anthropic, Bespoke, Deepseek, E. Pang, Google, J. Berman, Johan Land, Meta, Minimax, Mistral, Moonshot AI, OpenAI, Poetiq, Z.ai, xAI

Model type: All Types, Base LLM, CoT, Custom, Refinement

Model: All Models, Qwen3-235b-a22b Instruct (25/07), Claude 3.7, Claude 3.7 (16K)

... (truncated, 23 KB total)
Resource ID: a27f2ad202a2b5a7 | Stable ID: OGU5MzBmOD