OpenAI, "Why language models hallucinate" (https://openai.com/index/why-language-models-hallucinate/)
Credibility Rating
High quality. Established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: OpenAI
A September 2025 OpenAI blog post and associated arXiv paper (2509.04664) presenting a mechanistic account of why hallucinations persist and proposing evaluation and training reforms; relevant to calibration, honesty, and AI reliability research.
Metadata
Summary
OpenAI researchers argue that LLM hallucinations persist because standard training and evaluation procedures reward confident guessing over honest uncertainty, using multiple-choice test analogies to illustrate the misaligned incentives. They propose that evaluation methods should penalize errors more than abstentions, and that models should be trained to express calibrated uncertainty. The accompanying paper formalizes these ideas and demonstrates improvements on benchmarks like SimpleQA.
Key Points
- Hallucinations arise partly because accuracy-only metrics reward guessing over abstention, creating incentives for overconfidence rather than honesty.
- The paper distinguishes three response types—accurate, erroneous, and abstentions—arguing errors are strictly worse than admitting uncertainty.
- Current evaluation scoreboards systematically favor models that guess, making them appear better than more epistemically honest but lower-scoring models.
- OpenAI proposes reforming both evaluation metrics and training objectives to reward calibrated uncertainty and appropriate abstention.
- Even GPT-5, which shows fewer hallucinations, still hallucinates, illustrating that this remains a fundamental unsolved challenge across all large language models.
Cited by 2 pages
| Page | Type | Quality |
|---|---|---|
| Large Language Models | Capability | 60.0 |
| Reducing Hallucinations in AI-Generated Wiki Content | Approach | 68.0 |
Cached Content Preview
September 5, 2025
[Research](https://openai.com/news/research/) [Publication](https://openai.com/research/index/publication/)
# Why language models hallucinate
[Read the paper(opens in a new window)](https://arxiv.org/abs/2509.04664)

At OpenAI, we’re working hard to make AI systems more useful and reliable. Even as language models become more capable, one challenge remains stubbornly hard to fully solve: hallucinations. By this we mean instances where a model confidently generates an answer that isn’t true. Our [new research paper(opens in a new window)](https://arxiv.org/abs/2509.04664) argues that language models hallucinate because standard training and evaluation procedures reward guessing over acknowledging uncertainty.
ChatGPT also hallucinates. GPT‑5 has significantly fewer hallucinations [especially when reasoning](https://openai.com/index/introducing-gpt-5/#:~:text=Building%20a%20more%20robust%2C%20reliable%2C%20and%20helpful%20model), but they still occur. Hallucinations remain a fundamental challenge for all large language models, but we are working hard to further reduce them.
## What are hallucinations?
Hallucinations are plausible but false statements generated by language models. They can show up in surprising ways, even for seemingly straightforward questions. For example, when we asked a widely used chatbot for the title of the PhD dissertation by Adam Tauman Kalai (an author of this paper), it confidently produced three different answers—none of them correct. When we asked for his birthday, it gave three different dates, likewise all wrong.
## Teaching to the test
Hallucinations persist partly because current evaluation methods set the wrong incentives. While evaluations themselves do not directly cause hallucinations, most evaluations measure model performance in a way that encourages guessing rather than honesty about uncertainty.
Think about it like a multiple-choice test. If you do not know the answer but take a wild guess, you might get lucky and be right. Leaving it blank guarantees a zero. In the same way, when models are graded only on accuracy (the percentage of questions they get exactly right), they are encouraged to guess rather than say “I don’t know.”
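The incentive described above can be made concrete with a small expected-value sketch. This is our illustration, not code from the paper; the penalty formula `t / (1 - t)` for a confidence threshold `t` is one simple penalized-grading scheme of the kind the post advocates:

```python
# Expected score of answering vs. abstaining under two grading schemes.
# Abstaining ("I don't know") always scores 0.

def expected_score(p_correct: float, wrong_penalty: float) -> float:
    """Expected score when answering with probability p_correct of
    being right: correct answers score +1, wrong answers score
    -wrong_penalty."""
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty

# Accuracy-only grading: wrong answers cost nothing (penalty = 0).
# Even a wild guess on a 4-option question beats abstaining.
guess = expected_score(p_correct=0.25, wrong_penalty=0.0)  # 0.25 > 0

# Penalized grading with confidence threshold t: set the penalty to
# t / (1 - t), so guessing pays off only when confidence exceeds t.
t = 0.75
penalty = t / (1 - t)                          # 3.0
low_conf = expected_score(0.25, penalty)       # negative: abstain instead
high_conf = expected_score(0.90, penalty)      # positive: worth answering
```

Under accuracy-only grading the model's best strategy is always to guess; once wrong answers are penalized relative to abstentions, honest "I don't know" responses become the rational choice below the confidence threshold.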
As another example, suppose a language model is asked for someone’s birthday but doesn’t know. If it guesses “September 10,” it has a 1-in-365 chance of being right.
... (truncated, 11 KB total)