Longterm Wiki

HumanEval: Hand-Written Evaluation Set for Code Generation

web

Credibility Rating

3/5 (Good)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: GitHub

HumanEval is widely used to benchmark the code generation capabilities of LLMs; it is relevant to AI safety discussions around capability measurement, evaluation robustness, and tracking AI progress on software engineering tasks.

Metadata

Importance: 62/100 · Type: dataset

Summary

HumanEval is OpenAI's open-source benchmark dataset for evaluating the functional correctness of code generated by language models. It consists of 164 hand-crafted Python programming problems with unit tests, used to measure how well AI systems can synthesize code from docstrings. It was introduced alongside the Codex paper and has become a standard benchmark in the field.
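The evaluation workflow is simple: a model is prompted with each problem's function signature and docstring, its completions are written to a JSONL file, and the bundled harness executes the hidden unit tests to score them. The sketch below follows the interface documented in the repository's README (`read_problems`, `write_jsonl`, and the `evaluate_functional_correctness` command); `generate_one_completion` is a placeholder for whatever model is under test, and exact details may vary across versions.

```python
from human_eval.data import write_jsonl, read_problems

# Load the 164 HumanEval problems; each entry provides a "prompt"
# (function signature + docstring), keyed by "task_id".
problems = read_problems()

def generate_one_completion(prompt: str) -> str:
    # Placeholder: call the model under test and return the
    # function body that completes the prompt.
    raise NotImplementedError

num_samples_per_task = 1  # use a larger n (e.g. 200) to estimate pass@k for k > 1
samples = [
    dict(task_id=task_id,
         completion=generate_one_completion(problems[task_id]["prompt"]))
    for task_id in problems
    for _ in range(num_samples_per_task)
]
write_jsonl("samples.jsonl", samples)

# Then score with the bundled CLI, which runs each completion
# against the problem's unit tests:
#   $ evaluate_functional_correctness samples.jsonl
```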

Key Points

  • Contains 164 original Python programming problems with function signatures, docstrings, and unit tests for automated evaluation
  • Measures functional correctness of generated code using a pass@k metric rather than syntactic similarity (see the estimator sketch after this list)
  • Introduced with OpenAI's Codex model and has become an industry-standard benchmark for code generation capability
  • Open-source release enables reproducible comparisons across different code-generating AI models
  • Represents a capability evaluation tool relevant to tracking AI progress in code synthesis tasks
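The pass@k metric is the probability that at least one of k sampled completions for a problem passes all of its unit tests. Because sampling exactly k completions gives a high-variance estimate, the Codex paper computes an unbiased estimator from n >= k samples per problem, of which c pass: pass@k = 1 - C(n - c, k) / C(n, k). Below is a minimal sketch of that estimator, written as a numerically stable product as in the paper; the function name `pass_at_k` here is illustrative.

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n - c, k) / C(n, k),
    where n = total samples per problem and c = samples that
    passed all unit tests. Computed as a stable product."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a passing sample
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# Example: 200 samples per problem, 30 passing -> pass@10 is roughly 0.81
print(pass_at_k(200, 30, 10))
```

The reported benchmark score is this quantity averaged over all 164 problems.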

Cited by 1 page

| Page | Type | Quality |
| --- | --- | --- |
| Minimal Scaffolding | Capability | 52.0 |

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 12 KB

[openai](https://github.com/openai) / **[human-eval](https://github.com/openai/human-eval)** (Public) · 3.2k stars · 443 forks


Default branch: master · [2 branches](https://github.com/openai/human-eval/branches) · [0 tags](https://github.com/openai/human-eval/tags)


## Folders and files

Latest commit: [mpokrass](https://github.com/mpokrass) · [Merge pull request #54 from openai/michelle/fix-running](https://github.com/openai/human-eval/commit/6d43fb980f9fee3c892a914eda09951f772ad10d) ([6d43fb9](https://github.com/openai/human-eval/commit/6d43fb980f9fee3c892a914eda09951f772ad10d)) · Jan 17, 2025 · [7 commits](https://github.com/openai/human-eval/commits/master/)

| Name | Last commit message | Last commit date |
| --- | --- | --- |
| [data](https://github.com/openai/human-eval/tree/master/data) | [squash commits](https://github.com/openai/human-eval/commit/463c980b59e818ace59f6f9803cd92c749ceae61) | Jul 7, 2021 |
| [human\_eval](https://github.com/openai/human-eval/tree/master/human_eval) | [fix broken eval](https://github.com/openai/human-eval/commit/37c4dd63798c3c9ba32fa69a2fb49c5e2c43a181) | Jan 16, 2025 |
| [LICENSE](https://github.com/openai/human-eval/blob/master/LICENSE) | [Add license file.](https://github.com/openai/human-eval/commit/d321ec0b6c23dec317337be99f6d0c45ca73f3d5) | |

... (truncated, 12 KB total)
Resource ID: 9edbbd4ae30cd1f8 | Stable ID: NzcxMzMzMz