Models Tested: 12
Best Score: 79.4%
Median Score: 65.65%
Scoring: pass@1
Introduced: 2024-06
Maintainer: LiveCodeBench Team
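Scores are reported as pass@1, but the page does not say exactly how LiveCodeBench computes it. The following is a minimal sketch that assumes the standard unbiased pass@k estimator from the Codex/HumanEval paper (Chen et al., 2021), averaged over problems. For a problem with n sampled solutions of which c pass all tests, pass@1 reduces to c/n. The function names `pass_at_k` and `benchmark_pass_at_1` and the per-problem results are hypothetical and only illustrate the arithmetic.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: 1 - C(n-c, k) / C(n, k).

    n: number of sampled solutions, c: number that passed, k: budget.
    """
    if n - c < k:
        # Every size-k subset contains at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(results: list[tuple[int, int]]) -> float:
    """Average per-problem pass@1 over (n_samples, n_correct) pairs, as a percentage."""
    per_problem = [pass_at_k(n, c, 1) for n, c in results]
    return 100.0 * sum(per_problem) / len(per_problem)


# Hypothetical results for three problems: (samples drawn, samples that passed).
example = [(10, 10), (10, 3), (10, 0)]
print(round(benchmark_pass_at_1(example), 1))  # 43.3
```

With k = 1 the estimator is simply the fraction of passing samples per problem (1.0, 0.3 and 0.0 here), so the benchmark score is their mean expressed as a percentage.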
Leaderboard (12 models)
| # | Model | Developer | Score (pass@1) |
|---|---|---|---|
| 🥇 | Grok | xAI | 79.4% |
| 🥈 | Grok-3 | xAI | 79.4% |
| 🥉 | o3 | OpenAI | 71.7% |
| 4 | Claude Opus 4.5 | Anthropic | 70.3% |
| 5 | o4-mini | OpenAI | 67.8% |
| 6 | DeepSeek R1 | DeepSeek | 65.9% |
| 7 | Claude 3.7 Sonnet | Anthropic | 65.4% |
| 8 | Gemini 2.5 Pro | Google DeepMind | 63.4% |
| 9 | o3-mini | OpenAI | 57.6% |
| 10 | Llama 4 Maverick | Meta AI (FAIR) | 43.4% |
| 11 | DeepSeek V3 | DeepSeek | 40.5% |
| 12 | Llama 4 Scout | Meta AI (FAIR) | 32.8% |
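The summary figures at the top of the page follow directly from this table. The short Python check below recomputes them from the twelve listed scores. Because the count is even, the median is the mean of the 6th and 7th ranked scores: (65.9 + 65.4) / 2 = 65.65. Both Grok entries are counted, as they are in the page's "Models Tested" total.

```python
from statistics import median

# (model, pass@1 score in %) copied from the leaderboard table.
leaderboard = [
    ("Grok", 79.4),
    ("Grok-3", 79.4),
    ("o3", 71.7),
    ("Claude Opus 4.5", 70.3),
    ("o4-mini", 67.8),
    ("DeepSeek R1", 65.9),
    ("Claude 3.7 Sonnet", 65.4),
    ("Gemini 2.5 Pro", 63.4),
    ("o3-mini", 57.6),
    ("Llama 4 Maverick", 43.4),
    ("DeepSeek V3", 40.5),
    ("Llama 4 Scout", 32.8),
]

scores = [score for _, score in leaderboard]
n_models = len(scores)   # "Models Tested"
best = max(scores)       # "Best Score"
med = median(scores)     # "Median Score": mean of the two middle values

print(n_models, best, round(med, 2))  # 12 79.4 65.65
```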