Longterm Wiki

Grok on HumanEval: 86.5%

benchmark-result (Verified)

Child of HumanEval

Metadata

Source Table: benchmark_results
Source ID: BDgqDpG3vh
Parent: HumanEval
Children:
Created: Apr 24, 2026, 7:13 PM
Updated: Apr 24, 2026, 7:13 PM
Synced: Apr 24, 2026, 7:13 PM

Record Data

id: BDgqDpG3vh
benchmarkId: vxX2rorgxU
modelId: Grok (ai-model)
score: 86.5
unit: percent
date: 2025-02-19
sourceUrl:
notes: Grok 3 - Code generation from Python function docstrings with unit tests
testedBy: unknown
testedByOrgId:
evaluationDate:
methodologyNotes:
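
The notes field describes the benchmark as code generation from Python function docstrings checked with unit tests, and the score is recorded in percent. A minimal sketch of how such a percentage could be computed, assuming one pass/fail outcome per problem. The toy problems and the names `passes` and `pass_rate_percent` are hypothetical illustrations; this is not the harness or methodology behind this record, which is not stated here.

```python
# Hypothetical illustration only: none of these names come from the record.

def passes(candidate_src: str, test_src: str) -> bool:
    """Return True if the candidate code runs and passes its unit tests."""
    namespace = {}
    try:
        exec(candidate_src, namespace)  # define the generated function
        exec(test_src, namespace)       # run the assertions against it
        return True
    except Exception:
        # Any error (syntax error, failed assertion, ...) counts as a fail.
        return False


def pass_rate_percent(problems) -> float:
    """Percentage of problems whose candidate passes all of its tests."""
    if not problems:
        return 0.0
    solved = sum(passes(candidate, tests) for candidate, tests in problems)
    return 100.0 * solved / len(problems)


# Two toy problems: one correct candidate and one buggy candidate.
toy_problems = [
    ("def add(a, b):\n    return a + b\n", "assert add(2, 3) == 5\n"),
    ("def is_even(n):\n    return n % 2 == 1\n", "assert is_even(4)\n"),
]

print(pass_rate_percent(toy_problems))  # prints 50.0
```

This sketch runs candidate code directly with `exec`, which is unsafe for untrusted model output; it also omits timeouts and multiple samples per problem.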

Source Check Verdicts

confirmed (95% confidence)

Last checked: 4/24/2026

Inline sourcing: confirmed
