ARC Prize - Leaderboard
web · arcprize.org · arcprize.org/leaderboard
The ARC-AGI benchmark, created by François Chollet and used in the ARC Prize competition, is widely cited in AI safety and capabilities discussions as a test of general reasoning that is difficult to solve through brute-force scaling, which makes it relevant for tracking progress toward AGI.
Metadata
Importance: 55/100 · tool page · reference
Summary
The ARC Prize leaderboard tracks AI system performance on the Abstraction and Reasoning Corpus (ARC-AGI) benchmark, a test designed to measure general fluid intelligence and reasoning capabilities that current AI systems struggle with. It provides a public ranking of models and approaches attempting to solve ARC tasks, serving as a key benchmark for measuring progress toward human-level abstract reasoning.
Key Points
- Tracks competitive performance on ARC-AGI, a benchmark specifically designed to resist memorization and test genuine reasoning and generalization
- ARC tasks require inferring abstract patterns from a few examples, which makes the benchmark a proxy for general intelligence rather than narrow skill
- Leaderboard highlights the gap between current AI capabilities and human-level performance on novel reasoning tasks
- Serves as a reference point for evaluating whether AI systems are making genuine progress on abstract reasoning or merely gaming the benchmark
- Competition incentivizes novel approaches to program synthesis, inductive reasoning, and general problem-solving
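
To make the "abstract patterns from a few examples" point concrete, here is a minimal sketch of what an ARC-style task and an exact-match check might look like. It assumes the publicly documented ARC JSON layout (`train` and `test` lists of `input`/`output` grids of integers 0-9). The toy task, the `mirror_rows` solver, and the helper names are hypothetical illustrations, and the single-attempt scoring is a simplification that may differ from the official ARC Prize rules.

```python
from typing import Callable, Dict, List

Grid = List[List[int]]  # each cell is an integer color code (0-9)
Task = Dict[str, List[Dict[str, Grid]]]

# Hypothetical toy task, not taken from the real dataset.
# The hidden rule is "mirror every row left to right".
toy_task: Task = {
    "train": [
        {"input": [[1, 0, 0], [0, 2, 0]], "output": [[0, 0, 1], [0, 2, 0]]},
        {"input": [[3, 3, 0], [0, 0, 4]], "output": [[0, 3, 3], [4, 0, 0]]},
    ],
    "test": [
        {"input": [[5, 0, 6], [0, 7, 0]], "output": [[6, 0, 5], [0, 7, 0]]},
    ],
}


def mirror_rows(grid: Grid) -> Grid:
    """Candidate rule: reverse each row of the grid."""
    return [list(reversed(row)) for row in grid]


def fits_training_pairs(solver: Callable[[Grid], Grid], task: Task) -> bool:
    """True if the candidate rule reproduces every demonstration pair."""
    return all(solver(pair["input"]) == pair["output"] for pair in task["train"])


def score_task(solver: Callable[[Grid], Grid], task: Task) -> float:
    """Fraction of test grids reproduced exactly (assumed: one attempt each)."""
    tests = task["test"]
    correct = sum(solver(pair["input"]) == pair["output"] for pair in tests)
    return correct / len(tests)


if __name__ == "__main__":
    print(fits_training_pairs(mirror_rows, toy_task))
    print(score_task(mirror_rows, toy_task))
```

Because a prediction counts only when the whole output grid matches, partially correct grids earn no credit in this sketch, which is one reason memorized surface patterns tend not to help.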
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Reasoning and Planning | Capability | 65.0 |
Cached Content Preview
HTTP 200 · Fetched Mar 20, 2026 · 23 KB
# ARC-AGI-1 Leaderboard
[Chart: SCORE (%), 0% to 100%, versus COST PER TASK ($), $0.001 to $100, with one labeled point per model configuration; dated 03/19/2026.]
ARC-AGI-1
ARC-AGI-2
Author: All Authors, ARC Prize 2024, ARC Prize 2025, Alibaba, Anthropic, Bespoke, Deepseek, E. Pang, Google, J. Berman, Johan Land, Meta, Minimax, Mistral, Moonshot AI, OpenAI, Poetiq, Z.ai, xAI
Model type: All Types, Base LLM, CoT, Custom, Refinement
Model: All Models, Qwen3-235b-a22b Instruct (25/07), Claude 3.7, Claude 3.7 (16K)
... (truncated, 23 KB total)
Resource ID: a27f2ad202a2b5a7 | Stable ID: OGU5MzBmOD