SimpleQA
Knowledge
A factual question-answering benchmark from OpenAI that tests models on short, fact-seeking questions with verifiable answers. It evaluates factual accuracy and calibration.
Models Tested: 3
Best Score: 52.9%
Median Score: 47.6%
Scoring: accuracy
Introduced: 2024-10
Maintainer: OpenAI
Leaderboard (3 models)
| # | Model | Developer | Score |
|---|---|---|---|
| 🥇 | Gemini 2.5 Pro | Google DeepMind | 52.9% |
| 🥈 | o3 | OpenAI | 47.6% |
| 🥉 | GPT-4.1 | OpenAI | 41.5% |
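
The summary figures for this benchmark (models tested, best score, median score) can be recomputed from the leaderboard rows. The snippet below is a minimal sketch of that arithmetic, not the site's own code. The `scores` dictionary and the `summarize` helper are hypothetical names introduced for illustration, and the values are copied by hand from the table.

```python
from statistics import median

# Scores (percent accuracy) copied by hand from the leaderboard table.
# The dictionary name and layout are assumptions made for this example.
scores = {
    "Gemini 2.5 Pro": 52.9,
    "o3": 47.6,
    "GPT-4.1": 41.5,
}


def summarize(scores):
    """Return the headline statistics for a {model: score} mapping."""
    best_model = max(scores, key=scores.get)
    return {
        "models_tested": len(scores),
        "best_model": best_model,
        "best_score": scores[best_model],
        # With an odd number of entries, the median is the middle value.
        "median_score": median(scores.values()),
    }


summary = summarize(scores)
print(summary)
```

With three entries the median is simply the middle score, which is why it coincides with the second-place result.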