MGSM
Category: Math
Multilingual Grade School Math: a benchmark of 250 grade-school math problems translated into 10 typologically diverse languages. It tests multilingual mathematical reasoning.
Models Tested: 7
Best Score: 92.4%
Median Score: 91.6%
Scoring: accuracy
Introduced: 2022-10
Maintainer: Google Research
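
As a rough illustration of the accuracy metric named above, the sketch below counts the share of problems whose predicted final answer matches the reference. The function name `accuracy` and the sample answers are hypothetical, and exact-match comparison of final answers is an assumption here, not a description of the maintainer's evaluation harness.

```python
def accuracy(predictions, references):
    """Return the fraction of predictions that exactly match their reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one problem is required")
    # Assumption: an answer counts as correct only on exact equality.
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


# Hypothetical final answers for four problems (not real MGSM data).
preds = [18, 3, 70000, 540]
refs = [18, 3, 70000, 20]
print(f"{accuracy(preds, refs):.1%}")  # 75.0%
```

A score reported for a multilingual benchmark may also be averaged across languages; that aggregation step is not shown here.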
Leaderboard (7 models)
| # | Model | Developer | Score |
|---|---|---|---|
| 🥇 | Claude 3.7 Sonnet | Anthropic | 92.4% |
| 🥈 | Gemini 2.5 Pro | Google DeepMind | 92.2% |
| 🥉 | Claude 3.5 Sonnet | Anthropic | 91.6% |
| 4 | Llama 3.1 | Meta AI (FAIR) | 91.6% |
| 5 | GPT-4o | OpenAI | 90.5% |
| 6 | Claude 3.5 Haiku | Anthropic | 85.6% |
| 7 | Gemini 1.5 Flash | Google DeepMind | 82.6% |
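
The summary figures for this leaderboard (best and median score) can be rechecked from the table rows. The following is a minimal sketch that copies the seven scores from the table by hand; the variable names are illustrative only.

```python
from statistics import median

# Scores (percent accuracy) transcribed from the leaderboard table.
scores = {
    "Claude 3.7 Sonnet": 92.4,
    "Gemini 2.5 Pro": 92.2,
    "Claude 3.5 Sonnet": 91.6,
    "Llama 3.1": 91.6,
    "GPT-4o": 90.5,
    "Claude 3.5 Haiku": 85.6,
    "Gemini 1.5 Flash": 82.6,
}

best_model = max(scores, key=scores.get)   # model with the highest score
best_score = scores[best_model]
median_score = median(scores.values())     # middle value of the 7 scores

print(f"Models tested: {len(scores)}")                # Models tested: 7
print(f"Best score: {best_score}% ({best_model})")    # Best score: 92.4% (Claude 3.7 Sonnet)
print(f"Median score: {median_score}%")               # Median score: 91.6%
```

With an odd number of entries the median is simply the fourth-ranked score, which is why it coincides with the tied 91.6% results.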