
Reducing Hallucinations in AI-Generated Wiki Content - Footnote 26

Partial · 85% confidence

1 evidence check

Last checked: 4/3/2026

The AI Index Report is mentioned, but the year is not specified in the source. The source does not mention RLAIF (Reinforcement Learning from AI Feedback) or DPO (Direct Preference Optimization).

Evidence — 1 source, 1 check

Partial · 85% · Haiku 4.5 · 4/3/2026
Found: **Reinforcement Learning from Human Feedback (RLHF)** trains models to prefer outputs that human reviewers label as correct. Research shows RLHF can reduce factual errors by 40% (GPT-4) and harmful ha…


Debug info

Record type: citation

Record ID: page:reducing-hallucinations:fn26