All Source Checks
Automated source checking of wiki data against original sources. Each record is checked against one or more external sources to confirm accuracy.

| Metric | Value | Note |
|---|---|---|
| Verified Correct | 65 | 97% of checked |
| Has Issues | 0 | 0% of checked |
| Can't Verify | 2 | 3% of checked |
| Not Yet Checked | 0 | of 67 total |
| Contradicted | 0 | None found |
| Outdated | 0 | All current |
| Accuracy Rate | 100% | confirmed / (confirmed + wrong + outdated) |
| Needs Recheck | 0 | All up to date |

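The percentages and the accuracy rate in the summary follow directly from the counts. The following is a minimal Python sketch of that arithmetic, not the dashboard's actual code. It assumes that "wrong" in the formula corresponds to the Has Issues count, that "checked" means the total minus Not Yet Checked, and that percentages are rounded to whole numbers. The function and variable names are illustrative.

```python
def share(count: int, total: int) -> int:
    """Whole-number percentage of `count` in `total` (0 when total is 0)."""
    return round(100 * count / total) if total else 0


def accuracy_rate(confirmed: int, wrong: int, outdated: int) -> float:
    """confirmed / (confirmed + wrong + outdated), as a percentage."""
    decided = confirmed + wrong + outdated
    return 100 * confirmed / decided if decided else 0.0


# Counts taken from the summary above.
confirmed, wrong, outdated, unverifiable, unchecked = 65, 0, 0, 2, 0
total = 67
checked = total - unchecked  # assumption: "checked" excludes unchecked records

print(share(confirmed, checked))                   # share verified correct
print(share(unverifiable, checked))                # share that can't be verified
print(accuracy_rate(confirmed, wrong, outdated))   # accuracy rate
```

Under these assumptions, records that can't be verified are excluded from the accuracy denominator, which would explain how the accuracy rate can be 100% while only 97% of checked records are verified correct.
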
| Type | Entity | Claim | Verdict | Confidence | Sources | Last Checked | |
|---|---|---|---|---|---|---|---|
| Benchmark Result | - | sid_bFjrDfX8rQ / GSM8K: 57.1 | confirmed | 95% | 1 | Apr 29, 2026 | |
| Benchmark Result | - | sid_bFjrDfX8rQ / DROP: 61.4 | confirmed | 95% | 1 | Apr 29, 2026 | |
| Benchmark Result | - | sid_bFjrDfX8rQ / HellaSwag: 85.5 | confirmed | 95% | 1 | Apr 29, 2026 | |
| Benchmark Result | - | sid_bFjrDfX8rQ / TruthfulQA: 47 | confirmed | 95% | 1 | Apr 29, 2026 | |
| Benchmark Result | - | sid_bFjrDfX8rQ / WinoGrande: 81.6 | confirmed | 95% | 1 | Apr 29, 2026 | |
| Benchmark Result | - | sid_oSG59ppF7g / MMLU: 80.1 | confirmed | 98% | 1 | Apr 29, 2026 | |
| Benchmark Result | - | sid_oSG59ppF7g / Aider Polyglot: 9.8 | confirmed | 95% | 1 | Apr 29, 2026 | |
| Benchmark Result | - | sid_kWPQCvjKSg / MMLU: 87.3 | confirmed | 99% | 1 | Apr 29, 2026 | |
| Benchmark Result | - | sid_kWPQCvjKSg / HumanEval: 89 | confirmed | 95% | 1 | Apr 29, 2026 | |
| Benchmark Result | - | sid_kWPQCvjKSg / MATH: 73.8 | confirmed | 99% | 1 | Apr 29, 2026 | |
| Benchmark Result | - | sid_tppPAkJqjQ / GSM8K: 95 | confirmed | 99% | 1 | Apr 29, 2026 | |
| Benchmark Result | - | sid_tppPAkJqjQ / HumanEval: 92 | confirmed | 99% | 1 | Apr 29, 2026 | |
| Benchmark Result | - | sid_tppPAkJqjQ / MMLU-Pro: 89.5 | confirmed | 99% | 1 | Apr 29, 2026 | |
| Benchmark Result | - | sid_tppPAkJqjQ / SimpleQA: 36 | confirmed | 99% | 1 | Apr 29, 2026 | |
| Benchmark Result | - | sid_tppPAkJqjQ / LiveCodeBench: 70.3 | confirmed | 99% | 1 | Apr 29, 2026 | |
| Benchmark Result | - | sid_tppPAkJqjQ / MGSM: 92.5 | confirmed | 99% | 1 | Apr 29, 2026 | |
| Benchmark Result | - | sid_tppPAkJqjQ / Humanity's Last Exam: 43.2 | confirmed | 98% | 1 | Apr 29, 2026 | |
| Benchmark Result | - | sid_Ac7c55KtVw / MMLU: 92.1 | confirmed | 99% | 1 | Apr 29, 2026 | |
| Benchmark Result | - | sid_Ac7c55KtVw / HumanEval: 95.4 | confirmed | 99% | 1 | Apr 29, 2026 | |
| Benchmark Result | - | sid_Ac7c55KtVw / BrowseComp: 84 | confirmed | 95% | 1 | Apr 29, 2026 | |
| Benchmark Result | - | sid_Ac7c55KtVw / MMMU: 76.5 | confirmed | 99% | 1 | Apr 29, 2026 | |
| Benchmark Result | - | sid_Ac7c55KtVw / GSM8K: 98.4 | confirmed | 99% | 1 | Apr 29, 2026 | |
| Benchmark Result | - | sid_Ac7c55KtVw / IFEval: 91.2 | confirmed | 99% | 1 | Apr 29, 2026 | |
| Benchmark Result | Claude 3.7 Sonnet | sid_ePVee3jidQ / MMMU: 69.1 | confirmed | 99% | 1 | Apr 24, 2026 | |
| Benchmark Result | Claude 3.7 Sonnet | sid_ePVee3jidQ / LiveCodeBench: 65.4 | confirmed | 99% | 1 | Apr 24, 2026 | |
| Benchmark Result | Claude 3.7 Sonnet | sid_ePVee3jidQ / GSM8K: 96.4 | confirmed | 99% | 1 | Apr 24, 2026 | |
| Benchmark Result | Claude 3.7 Sonnet | sid_ePVee3jidQ / MMLU-Pro: 78.4 | confirmed | 99% | 1 | Apr 24, 2026 | |
| Benchmark Result | Claude 3.7 Sonnet | sid_ePVee3jidQ / HumanEval: 94 | confirmed | 99% | 1 | Apr 24, 2026 | |
| Benchmark Result | Claude 3.5 Sonnet | sid_ISfAiImMYg / SWE-bench Verified: 49 | confirmed | 98% | 1 | Apr 24, 2026 | |
| Benchmark Result | Claude 3.5 Sonnet | sid_ISfAiImMYg / GSM8K: 96.4 | confirmed | 95% | 1 | Apr 24, 2026 | |
| Benchmark Result | Mistral | sid_v1e1ZwDwoA / HumanEval: 30.5 | confirmed | 98% | 1 | Apr 24, 2026 | |
| Benchmark Result | Mistral | sid_v1e1ZwDwoA / GSM8K: 40.3 | confirmed | 98% | 1 | Apr 24, 2026 | |
| Benchmark Result | Mistral | sid_v1e1ZwDwoA / HellaSwag: 84 | confirmed | 95% | 1 | Apr 24, 2026 | |
| Benchmark Result | Mistral | sid_v1e1ZwDwoA / MMLU: 60.1 | confirmed | 98% | 1 | Apr 24, 2026 | |
| Benchmark Result | Grok | sid_nnv09Wl5OQ / LiveCodeBench: 79.4 | confirmed | 95% | 1 | Apr 24, 2026 | |
| Benchmark Result | Grok | sid_nnv09Wl5OQ / Chatbot Arena Elo: 1402 | confirmed | 95% | 1 | Apr 24, 2026 | |
| Benchmark Result | Grok | sid_nnv09Wl5OQ / HumanEval: 86.5 | confirmed | 95% | 1 | Apr 24, 2026 | |
| Benchmark Result | Grok | sid_nnv09Wl5OQ / GSM8K: 89.3 | confirmed | 95% | 1 | Apr 24, 2026 | |
| Benchmark Result | Grok | sid_nnv09Wl5OQ / MMLU-Pro: 79.9 | confirmed | 95% | 1 | Apr 24, 2026 | |
| Benchmark Result | GPT-4.1 mini | sid_nywmt9QdsA / MMLU: 80.1 | confirmed | 98% | 1 | Apr 24, 2026 | |
| Benchmark Result | GPT | sid_Gqv7h9oEwA / HellaSwag: 95 | confirmed | 95% | 1 | Apr 24, 2026 | |
| Benchmark Result | GPT | sid_Gqv7h9oEwA / GSM8K: 92 | confirmed | 95% | 1 | Apr 24, 2026 | |
| Benchmark Result | GPT | sid_Gqv7h9oEwA / MATH: 76.6 | confirmed | 95% | 1 | Apr 24, 2026 | |
| Benchmark Result | GPT | sid_Gqv7h9oEwA / MGSM: 90.5 | confirmed | 95% | 1 | Apr 24, 2026 | |
| Benchmark Result | GPT | sid_Gqv7h9oEwA / HumanEval: 90.2 | confirmed | 99% | 1 | Apr 24, 2026 | |
| Benchmark Result | GPT | sid_Gqv7h9oEwA / MMLU: 88.7 | confirmed | 98% | 1 | Apr 24, 2026 | |
| Benchmark Result | Gemini | sid_PaKhQQNPkg / MATH: 78.3 | confirmed | 95% | 1 | Apr 24, 2026 | |
| Benchmark Result | Gemini | sid_PaKhQQNPkg / HumanEval: 89.7 | confirmed | 95% | 1 | Apr 24, 2026 | |
| Benchmark Result | Gemini | sid_PaKhQQNPkg / MMLU: 92.4 | confirmed | 95% | 1 | Apr 24, 2026 | |
| Benchmark Result | Gemini | sid_PaKhQQNPkg / Humanity's Last Exam: 44.4 | confirmed | 99% | 1 | Apr 24, 2026 | |
Data from the source_check_verdicts table.
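
Each Claim cell in the table follows the pattern `<id> / <benchmark>: <score>`. If the rows need to be processed outside the dashboard, a small parser along these lines may help. This is only a sketch: the `Claim` type and the `parse_claim` function are hypothetical names and are not part of the source_check_verdicts schema.

```python
from typing import NamedTuple


class Claim(NamedTuple):
    entity_id: str
    benchmark: str
    score: float


def parse_claim(text: str) -> Claim:
    """Split '<id> / <benchmark>: <score>' into its three parts."""
    entity_id, rest = text.split(" / ", 1)
    # Split on the last ': ' so benchmark names keep any internal punctuation.
    benchmark, score = rest.rsplit(": ", 1)
    return Claim(entity_id.strip(), benchmark.strip(), float(score))


print(parse_claim("sid_tppPAkJqjQ / Humanity's Last Exam: 43.2"))
```

Splitting from the right keeps multi-word benchmark names such as "Chatbot Arena Elo" or "SWE-bench Verified" intact.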