Models Tested: 4
Best Score: 52.9%
Median Score: 44.55%
Scoring: accuracy
Introduced: 2024-10
Maintainer: OpenAI
Leaderboard (4 models)
| # | Model | Developer | Score |
|---|---|---|---|
| 🥇 | Gemini 2.5 Pro | Google DeepMind | 52.9% |
| 🥈 | o3 | OpenAI | 47.6% |
| 🥉 | GPT-4.1 | OpenAI | 41.5% |
| 4 | Claude Opus 4.5 | Anthropic | 36.0% |
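The summary figures (models tested, best score, median score) follow directly from the table. The minimal Python sketch below recomputes them. It assumes the Claude Opus 4.5 entry is 36.0% on the same accuracy scale as the other rows. With four models, the median is the mean of the two middle scores: (41.5 + 47.6) / 2 = 44.55.

```python
from statistics import median

# Leaderboard scores in percent (accuracy), copied from the table.
# Assumption: the Claude Opus 4.5 entry is read as 36.0%.
scores = {
    "Gemini 2.5 Pro": 52.9,
    "o3": 47.6,
    "GPT-4.1": 41.5,
    "Claude Opus 4.5": 36.0,
}

# Best score: the model with the highest accuracy.
best_model = max(scores, key=scores.get)
best = scores[best_model]

# With an even number of entries, statistics.median averages the two middle values.
med = median(scores.values())

print(f"Models tested: {len(scores)}")
print(f"Best score: {best}% ({best_model})")
print(f"Median score: {med:.2f}%")
```

Because the model count is even, the lowest score does not affect the median. Any fourth-place value at or below 41.5% would give the same 44.55%.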