Longterm Wiki

ARC-AGI-2

Reasoning
Second iteration of the ARC benchmark with harder tasks, designed to remain challenging as AI capabilities improve.
Models Tested
2
Best Score
77.1%
Median Score
72.95%
Scoring: accuracy
Introduced: 2025-01
Maintainer: Francois Chollet / ARC Prize Foundation

Leaderboard (2 models)

#    Model            Developer        Score
🥇   Gemini           Google DeepMind  77.1%
🥈   Claude Opus 4.6  Anthropic        68.8%
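
The summary figures at the top of this page (models tested, best score, median score) appear to follow directly from the two leaderboard entries. The sketch below shows one way they could be recomputed. It assumes the scores are accuracy percentages and that the median is the standard statistical median; the variable names are illustrative and not part of the wiki.

```python
from statistics import median

# Scores copied from the leaderboard (assumed to be accuracy in percent).
leaderboard = {
    "Gemini": 77.1,
    "Claude Opus 4.6": 68.8,
}

scores = list(leaderboard.values())

models_tested = len(scores)     # number of leaderboard rows
best_score = max(scores)        # highest accuracy on the board
median_score = median(scores)   # with two entries, the mean of both

print(models_tested, best_score, round(median_score, 2))
```

With only two models, the median is simply the midpoint of the two scores, so it will shift noticeably as soon as a third entry is added.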