LiveCodeBench
Coding
A contamination-free coding benchmark using new competitive programming problems from LeetCode, AtCoder, and Codeforces. Problems are refreshed continuously to prevent data leakage.
Models Tested
9
Best Score
79.4%
Median Score
63.4%
Scoring: pass_at_1
Introduced: 2024-06
Maintainer: LiveCodeBench Team
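
As a rough illustration of the `pass_at_1` scoring listed above, the sketch below uses the widely used unbiased pass@k estimator, which for k = 1 reduces to the fraction of sampled solutions that pass all tests, averaged over problems. The function names and the `(n_samples, n_correct)` input shape are assumptions made for this example. This page does not state how many samples per problem the benchmark draws, so treat this as a sketch and not the maintainers' exact procedure.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of sampled solutions, c: number that pass all tests,
    k: sample budget being scored. For k = 1 this equals c / n.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: every size-k draw contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(results: list[tuple[int, int]]) -> float:
    """Average pass@1 over problems.

    results: one (n_samples, n_correct) pair per problem
    (hypothetical input format chosen for this sketch).
    """
    return sum(pass_at_k(n, c, 1) for n, c in results) / len(results)


# Hypothetical run: four problems, one sample each, three solved.
example = [(1, 1), (1, 0), (1, 1), (1, 1)]
print(f"{benchmark_pass_at_1(example):.1%}")
```

With a single sample per problem, the score is simply the share of problems solved on the first attempt.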
Leaderboard (9 models)
| # | Model | Developer | Score |
|---|---|---|---|
| 🥇 | Grok-3 | xAI | 79.4% |
| 🥈 | o3 | OpenAI | 71.7% |
| 🥉 | o4-mini | OpenAI | 67.8% |
| 4 | DeepSeek R1 | DeepSeek | 65.9% |
| 5 | Gemini 2.5 Pro | Google DeepMind | 63.4% |
| 6 | o3-mini | OpenAI | 57.6% |
| 7 | Llama 4 Maverick | Meta AI (FAIR) | 43.4% |
| 8 | DeepSeek V3 | DeepSeek | 40.5% |
| 9 | Llama 4 Scout | Meta AI (FAIR) | 32.8% |
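
The summary figures at the top of the page (models tested, best score, median score) can be recomputed from the leaderboard rows. The snippet below is a minimal sketch that copies the scores from the table into a dictionary; the variable names are chosen for this example only.

```python
from statistics import median

# Scores (percent) copied from the leaderboard table above.
scores = {
    "Grok-3": 79.4,
    "o3": 71.7,
    "o4-mini": 67.8,
    "DeepSeek R1": 65.9,
    "Gemini 2.5 Pro": 63.4,
    "o3-mini": 57.6,
    "Llama 4 Maverick": 43.4,
    "DeepSeek V3": 40.5,
    "Llama 4 Scout": 32.8,
}

models_tested = len(scores)
best_model = max(scores, key=scores.get)   # model with the highest score
best_score = scores[best_model]
median_score = median(scores.values())     # middle value of the nine scores

print(f"Models tested: {models_tested}")
print(f"Best score: {best_score}% ({best_model})")
print(f"Median score: {median_score}%")
```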