Longterm Wiki

OpenAI, "Why language models hallucinate" (https://openai.com/index/why-language-models-hallucinate/)

web

Credibility Rating

4/5 (High)

High quality. Established institution or organization with editorial oversight and accountability.

Rating inherited from publication venue: OpenAI

A September 2025 OpenAI blog post and associated arXiv paper (2509.04664) presenting a mechanistic account of why hallucinations persist and proposing evaluation and training reforms; relevant to calibration, honesty, and AI reliability research.

Metadata

Importance: 62/100 · blog post · primary source

Summary

OpenAI researchers argue that LLM hallucinations persist because standard training and evaluation procedures reward confident guessing over honest uncertainty, using multiple-choice test analogies to illustrate the misaligned incentives. They propose that evaluation methods should penalize errors more than abstentions, and that models should be trained to express calibrated uncertainty. The accompanying paper formalizes these ideas and demonstrates improvements on benchmarks like SimpleQA.

Key Points

  • Hallucinations arise partly because accuracy-only metrics reward guessing over abstention, creating incentives for overconfidence rather than honesty.
  • The paper distinguishes three response types—accurate, erroneous, and abstentions—arguing errors are strictly worse than admitting uncertainty.
  • Current evaluation scoreboards systematically favor models that guess, making them appear better than more epistemically honest but lower-scoring models.
  • OpenAI proposes reforming both evaluation metrics and training objectives to reward calibrated uncertainty and appropriate abstention.
  • Even GPT-5, which shows fewer hallucinations, still hallucinates, illustrating this as a fundamental unsolved challenge across all large language models.
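The distinction between accurate answers, errors, and abstentions can be made concrete with a small scoring sketch. This is an illustration of the incentive argument, not OpenAI's exact scheme: the response labels, penalty value, and the two toy models are assumptions chosen to show how a leaderboard ranking can flip once errors are penalized more than abstentions.

```python
# Illustrative sketch (not OpenAI's exact scoring scheme): compare an
# accuracy-only grader with one that penalizes errors more than abstentions.

def accuracy_only(response: str) -> float:
    # Standard benchmark scoring: only a correct answer earns credit;
    # an error and an abstention both score zero.
    return 1.0 if response == "correct" else 0.0

def penalized(response: str, wrong_penalty: float = 1.0) -> float:
    # Abstentions score 0; errors score strictly worse than abstaining.
    if response == "correct":
        return 1.0
    if response == "abstain":
        return 0.0
    return -wrong_penalty  # confident error

def mean(scores) -> float:
    scores = list(scores)
    return sum(scores) / len(scores)

# A model that always guesses (80% right, 20% confidently wrong)
# versus one that abstains on the questions it is unsure about.
guesser = ["correct"] * 8 + ["error"] * 2
abstainer = ["correct"] * 7 + ["abstain"] * 3

print(mean(accuracy_only(r) for r in guesser))    # 0.8 — guesser looks better
print(mean(accuracy_only(r) for r in abstainer))  # 0.7
print(mean(penalized(r) for r in guesser))        # 0.6 — the ranking flips
print(mean(penalized(r) for r in abstainer))      # 0.7
```

Under accuracy-only grading the guesser tops the scoreboard; once errors carry a penalty, the epistemically honest model ranks higher, which is the reform the post argues for.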

Cited by 2 pages

Cached Content Preview

HTTP 200 · Fetched Mar 15, 2026 · 11 KB


September 5, 2025

[Research](https://openai.com/news/research/) [Publication](https://openai.com/research/index/publication/)

# Why language models hallucinate

[Read the paper(opens in a new window)](https://arxiv.org/abs/2509.04664)

![Abstract image with sweeping gradients of teal, blue, and lavender, blending diagonally across the frame in soft, flowing streaks.](https://images.ctfassets.net/kftzwdyauwt9/5q3iK91iYCslMpYW0fmPNc/50776ce6fc897eacb94d2c05533dba96/oai_GA_Stories_16.9.png?w=3840&q=90&fm=webp)


At OpenAI, we’re working hard to make AI systems more useful and reliable. Even as language models become more capable, one challenge remains stubbornly hard to fully solve: hallucinations. By this we mean instances where a model confidently generates an answer that isn’t true. Our [new research paper⁠(opens in a new window)](https://arxiv.org/abs/2509.04664) argues that language models hallucinate because standard training and evaluation procedures reward guessing over acknowledging uncertainty.

ChatGPT also hallucinates. GPT‑5 has significantly fewer hallucinations, [especially when reasoning⁠](https://openai.com/index/introducing-gpt-5/#:~:text=Building%20a%20more%20robust%2C%20reliable%2C%20and%20helpful%20model), but they still occur. Hallucinations remain a fundamental challenge for all large language models, but we are working hard to further reduce them.

## What are hallucinations?

Hallucinations are plausible but false statements generated by language models. They can show up in surprising ways, even for seemingly straightforward questions. For example, when we asked a widely used chatbot for the title of the PhD dissertation by Adam Tauman Kalai (an author of this paper), it confidently produced three different answers—none of them correct. When we asked for his birthday, it gave three different dates, likewise all wrong.

## Teaching to the test

Hallucinations persist partly because current evaluation methods set the wrong incentives. While evaluations themselves do not directly cause hallucinations, most evaluations measure model performance in a way that encourages guessing rather than honesty about uncertainty.

Think about it like a multiple-choice test. If you do not know the answer but take a wild guess, you might get lucky and be right. Leaving it blank guarantees a zero. In the same way, when models are graded only on accuracy (the percentage of questions they get exactly right), they are encouraged to guess rather than say “I don’t know.”
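The test analogy reduces to a quick expected-value calculation. A minimal sketch: the confidence threshold `t` and the `t/(1-t)` penalty below are one confidence-target scheme of the kind discussed in this line of work, used here as an illustration rather than as the paper's definitive proposal.

```python
# Expected score of guessing, when the model is right with probability
# p_correct, gets 1 point for a correct answer, -wrong_penalty for an
# error, and 0 for abstaining.
def expected_score(p_correct: float, wrong_penalty: float) -> float:
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty

# Accuracy-only grading (no penalty): any guess beats abstaining (score 0),
# so even a 25%-confident model is incentivized to answer.
print(expected_score(0.25, wrong_penalty=0.0))  # 0.25 > 0

# With an error penalty of t / (1 - t), guessing breaks even exactly when
# the model's confidence equals t; e.g. t = 0.75 gives a penalty of 3.
t = 0.75
penalty = t / (1 - t)  # 3.0
print(expected_score(0.50, penalty))  # -1.0: abstaining is better
print(expected_score(0.75, penalty))  #  0.0: break-even exactly at t
print(expected_score(0.90, penalty))  # positive: worth answering
```

This makes the incentive claim precise: under accuracy-only grading, guessing always dominates abstaining; under a calibrated penalty, it only pays when confidence exceeds the stated threshold.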

As another example, suppose a language model is asked for someone’s birthday but doesn’t know. If it guesses “September 10,” it has a 1-in-365 chance of b

... (truncated, 11 KB total)
Resource ID: 1fa908f4a1b1af47 | Stable ID: MjZkMzY4MD