Longterm Wiki

GSM8K

Math

Grade School Math 8K — a dataset of 8,500 linguistically diverse grade-school math word problems requiring multi-step reasoning with basic arithmetic operations.

Models Tested: 2
Best Score: 96.8%
Median Score: 93.05%
Scoring: accuracy
Introduced: 2021-10
Maintainer: OpenAI

Leaderboard (2 models)

#    Model         Developer         Score
🥇   Llama 3.1     Meta AI (FAIR)    96.8%
🥈   DeepSeek V3   DeepSeek          89.3%
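
The summary figures follow from the two leaderboard entries. This is a minimal Python sketch, assuming the best score is the maximum and the median is the usual statistical median of the listed scores. The `scores` dictionary and the variable names are illustrative, not part of the wiki.

```python
from statistics import median

# Accuracy scores (percent) copied from the leaderboard entries.
scores = {"Llama 3.1": 96.8, "DeepSeek V3": 89.3}

# Best score: the highest accuracy among the tested models.
best_score = max(scores.values())

# Median score: with two entries this is the mean of the two values.
median_score = round(median(scores.values()), 2)

print(f"Models tested: {len(scores)}")
print(f"Best score: {best_score}%")
print(f"Median score: {median_score}%")
```

With only two models, the median is the midpoint of the two scores, so it will shift as more results are added.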