Longterm Wiki

OpenAI: Why Language Models Hallucinate PDF

web

Credibility Rating

High (4/5)

High quality. Established institution or organization with editorial oversight and accountability.

Rating inherited from publication venue: OpenAI

A September 2025 technical paper from OpenAI/Georgia Tech offering a theoretical framework for understanding hallucinations; directly relevant to AI reliability, benchmark design, and the challenge of building trustworthy AI systems.

Metadata

Importance: 72/100 · organizational report · primary source

Summary

This paper by OpenAI and Georgia Tech researchers provides a formal computational learning theory analysis of why language models hallucinate, arguing hallucinations arise from statistical pressures in training (even with error-free data) and persist because evaluation benchmarks reward guessing over expressing uncertainty. The authors propose that fixing benchmark scoring—rather than adding more hallucination evaluations—is the key socio-technical intervention to steer toward more trustworthy AI.

Key Points

  • Hallucinations originate as binary classification errors: when incorrect statements cannot be distinguished from facts, statistical training pressures produce plausible falsehoods.
  • Even with perfectly clean training data, the objectives optimized during LLM training would still lead to hallucinations; realistic noisy data worsens this.
  • Hallucinations persist because benchmarks reward guessing—models are optimized to be good 'test-takers,' where guessing improves scores over admitting uncertainty.
  • The proposed fix is modifying scoring of existing misaligned leaderboard benchmarks to penalize overconfident wrong answers, rather than adding new hallucination-specific evaluations.
  • The analysis is grounded in computational learning theory (PAC learning framework), providing formal lower bounds on error rates under standard training objectives.
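The benchmark-scoring argument in the points above can be made concrete with a small numeric sketch. The grading rules below are illustrative assumptions in the spirit of the paper's proposal (a confidence threshold `t` with wrong answers penalized `t/(1-t)` points), not its exact scheme:

```python
# Toy comparison of expected benchmark scores for "guess" vs. "abstain".
# Hypothetical grading schemes (assumptions for illustration):
#   binary grading:    correct = 1, wrong = 0, abstain = 0
#   penalized grading: correct = 1, wrong = -t/(1-t), abstain = 0
# Under binary grading, guessing always weakly dominates abstaining,
# so models optimized against such benchmarks learn to guess.

def expected_score_binary(p_correct: float, guess: bool) -> float:
    """Expected score when right answers earn 1 and wrong/abstain earn 0."""
    return p_correct if guess else 0.0

def expected_score_penalized(p_correct: float, guess: bool, t: float = 0.75) -> float:
    """Expected score when wrong answers cost t/(1-t) points and abstaining
    scores 0; guessing only pays off when confidence exceeds t."""
    if not guess:
        return 0.0
    return p_correct * 1.0 + (1.0 - p_correct) * (-t / (1.0 - t))

p = 0.2  # model is only 20% sure of its answer
print(expected_score_binary(p, guess=True))     # 0.2 > 0.0: guessing pays
print(expected_score_penalized(p, guess=True))  # 0.2 - 0.8*3 = -2.2 < 0: abstaining pays
```

With binary grading, a model that is 20% confident still gains by guessing; with the penalized grading it loses in expectation, so expressing uncertainty becomes the score-maximizing behavior.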

Cited by 1 page

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 98 KB
# Why Language Models Hallucinate

Adam Tauman Kalai∗ (OpenAI), Ofir Nachum (OpenAI), Santosh S. Vempala† (Georgia Tech), Edwin Zhang (OpenAI)

September 4, 2025

# Abstract

Like students facing hard exam questions, large language models sometimes guess when uncertain, producing plausible yet incorrect statements instead of admitting uncertainty. Such “hallucinations” persist even in state-of-the-art systems and undermine trust. We argue that language models hallucinate because the training and evaluation procedures reward guessing over acknowledging uncertainty, and we analyze the statistical causes of hallucinations in the modern training pipeline. Hallucinations need not be mysterious—they originate simply as errors in binary classification. If incorrect statements cannot be distinguished from facts, then hallucinations in pretrained language models will arise through natural statistical pressures. We then argue that hallucinations persist due to the way most evaluations are graded—language models are optimized to be good test-takers, and guessing when uncertain improves test performance. This “epidemic” of penalizing uncertain responses can only be addressed through a socio-technical mitigation: modifying the scoring of existing benchmarks that are misaligned but dominate leaderboards, rather than introducing additional hallucination evaluations. This change may steer the field toward more trustworthy AI systems.

# 1 Introduction

Language models are known to produce overconfident, plausible falsehoods, which diminish their utility. This error mode is known as “hallucination,” though it differs fundamentally from the human perceptual experience. Despite significant progress, hallucinations continue to plague the field, and are still present in the latest models (OpenAI, 2025a). Consider the prompt:

What is Adam Tauman Kalai’s birthday? If you know, just respond with DD-MM.

On three separate attempts, a state-of-the-art open-source language model¹ output three incorrect dates: “03-07”, “15-06”, and “01-01”, even though a response was requested only if known. The correct date is in autumn. Table 1 provides an example of more elaborate hallucinations.

Hallucinations are an important special case of errors produced by language models, which we analyze more generally using computational learning theory (e.g., Kearns and Vazirani, 1994). We consider general sets of errors $\mathcal{E}$, an arbitrary subset of the plausible strings $\mathcal{X} = \mathcal{E} \cup \mathcal{V}$, with the other plausible strings $\mathcal{V}$ being called valid. We then analyze the statistical nature of these errors, and

ChatGPT (GPT-4o): Adam Tauman Kalai’s Ph.D. dissertation (completed in 2002 at CMU) is entitled: “Boosting, Online Algorithms, and Other Topics in Machine Learning.”

DeepSeek: “Algebraic Methods in Interactive Machine Learning”… at Harvard University in 2005.

Llama: “Efficient Algorithms for Learning and Playing Games”… in 2007 at MIT.
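The setup described above can be sketched as a toy calculation. The strings and probabilities below are invented for illustration, not from the paper: the plausible strings $\mathcal{X}$ are partitioned into errors $\mathcal{E}$ and valid strings $\mathcal{V}$, and a model's error rate is the probability mass its output distribution places on $\mathcal{E}$:

```python
# Toy illustration (invented data) of the paper's setup: plausible strings X
# are partitioned into errors E and valid strings V; the error rate of a
# model is the probability it generates a string in E.
plausible = {            # hypothetical model distribution over outputs
    "born 03-07": 0.3,       # in E (plausible but false)
    "born 15-06": 0.2,       # in E
    "born 01-01": 0.1,       # in E
    "I don't know": 0.4,     # in V (valid: acknowledges uncertainty)
}
errors = {"born 03-07", "born 15-06", "born 01-01"}  # E; the rest of X is V

# Error rate = probability mass assigned to E.
error_rate = sum(p for s, p in plausible.items() if s in errors)
print(round(error_rate, 2))  # 0.6
```

The paper's lower bounds concern exactly this quantity: when strings in $\mathcal{E}$ are statistically indistinguishable from those in $\mathcal{V}$, standard training objectives cannot drive this mass to zero.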


... (truncated, 98 KB total)
Resource ID: 35a1956016db2d64 | Stable ID: NDk3YTBkNz