Longterm Wiki

xAI — Benchmark Score: 93.3

Verdict: confirmed (95%)
1 check · 4/30/2026

1 → confirmed

Our claim

Subject: xAI
Property: Benchmark Score
Value: 93.3
As Of: 2025
Notes: Grok 3 (Think) scored 93.3% on the AIME 2025 benchmark at cons@64.

Source evidence

1 src · 1 check
confirmed (95%) · primary · Haiku 4.5 · 4/30/2026

Note: The source directly confirms the claim. The xAI announcement states: 'We tested these models on the 2025 American Invitational Mathematics Examination (AIME), which was released just 7 days ago on Feb 12th. With our highest level of test-time compute (cons@64), Grok 3 (Think) achieved 93.3% on this competition.' The quoted 93.3% for Grok 3 (Think) on AIME 2025 matches the claimed value exactly.
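
The quoted announcement reports the score at cons@64 but does not define the term. As a rough illustration only, assuming cons@k means sampling k answers per problem and grading just the most common one (majority vote), the scoring could be sketched as follows. The function names and the toy data are hypothetical; this is not xAI's evaluation code and the numbers are not its results.

```python
from collections import Counter


def consensus_answer(samples):
    """Return the most frequent answer among the sampled answers.

    Assumption: ties are broken in favour of the answer seen first,
    which is how Counter.most_common orders equal counts.
    """
    return Counter(samples).most_common(1)[0][0]


def cons_at_k_score(samples_per_problem, reference_answers):
    """Fraction of problems whose consensus answer equals the reference."""
    correct = sum(
        consensus_answer(samples) == reference
        for samples, reference in zip(samples_per_problem, reference_answers)
    )
    return correct / len(reference_answers)


# Hypothetical toy data: 3 problems, 4 sampled answers each (k = 4, not 64).
samples_per_problem = [
    [70, 70, 12, 70],     # consensus 70, matches the reference
    [588, 16, 588, 588],  # consensus 588, matches the reference
    [5, 7, 7, 9],         # consensus 7, does not match the reference
]
reference_answers = [70, 588, 5]

score = cons_at_k_score(samples_per_problem, reference_answers)
print(f"{score:.1%}")  # prints 66.7%
```

Under this reading, one correct sample is not enough: the third toy problem contains the right answer (5) among its samples but still counts as wrong because the majority settled on 7.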

Case № f_tHAA1W30dw · Filed 4/30/2026 · Confidence 95%