Controversial Claims Assessment
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: arXiv
Research paper addressing scalable oversight challenges when evaluating AI systems on contested claims, examining how human biases affect judgment and proposing methods to ensure truthfulness despite evaluator limitations.
Paper Details
Metadata
Abstract
As AI grows more powerful, it will increasingly shape how we understand the world. But with this influence comes the risk of amplifying misinformation and deepening social divides, especially on consequential topics where factual accuracy directly impacts well-being. Scalable Oversight aims to ensure AI systems remain truthful even when their capabilities exceed those of their evaluators. Yet when humans serve as evaluators, their own beliefs and biases can impair judgment. We study whether AI debate can guide biased judges toward the truth by having two AI systems debate opposing sides of controversial factuality claims on COVID-19 and climate change, where people hold strong prior beliefs. We conduct two studies. Study I recruits human judges with either mainstream or skeptical beliefs who evaluate claims through two protocols: debate (interaction with two AI advisors arguing opposing sides) or consultancy (interaction with a single AI advisor). Study II uses AI judges with and without human-like personas to evaluate the same protocols. In Study I, debate consistently improves human judgment accuracy and confidence calibration, outperforming consultancy by 4-10% across COVID-19 and climate change claims. The improvement is most significant for judges with mainstream beliefs (up to +15.2% accuracy on COVID-19 claims), though debate also helps skeptical judges who initially misjudge claims move toward accurate views (+4.7% accuracy). In Study II, AI judges with human-like personas achieve even higher accuracy (78.5%) than human judges (70.1%) and default AI judges without personas (69.8%), suggesting their potential for supervising frontier AI models. These findings highlight AI debate as a promising path toward scalable, bias-resilient oversight in contested domains.
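The abstract reports gains in confidence calibration as well as accuracy, but this preview does not say which calibration measure the authors use. One standard choice is the Brier score, sketched below; the judge data in the example is illustrative, not taken from the paper.

```python
def brier_score(confidences, outcomes):
    """Mean squared error between a judge's stated confidence that a claim
    is true and the 0/1 ground truth. Lower is better; a judge who always
    answers 0.5 scores 0.25."""
    assert len(confidences) == len(outcomes)
    return sum((c - o) ** 2 for c, o in zip(confidences, outcomes)) / len(outcomes)

# Illustrative judgments: (stated confidence the claim is true, actual truth value).
before = brier_score([0.9, 0.8, 0.3], [0, 1, 1])  # overconfident on a false claim
after = brier_score([0.4, 0.8, 0.7], [0, 1, 1])   # better calibrated after debate
```

In this toy example the score drops after debate, which is the direction of improvement the abstract describes.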
Summary
This paper investigates AI debate as a scalable oversight mechanism for improving human judgment on controversial factual claims, particularly in domains like COVID-19 and climate change where strong prior beliefs can bias evaluation. The researchers conducted two studies: one with human judges holding mainstream or skeptical beliefs, and another with AI judges with and without human-like personas. Results show that AI debate—where two AI systems argue opposing sides of a claim—consistently improves judgment accuracy by 4-10% compared to single-advisor consultancy, with particularly strong gains for judges with mainstream beliefs (+15.2% on COVID-19 claims). AI judges with human-like personas achieved even higher accuracy (78.5%) than human judges (70.1%), suggesting potential for supervising frontier AI models.
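The two protocols compared in the studies can be sketched as follows. The advisor and judge below are placeholder stubs, and all names and signatures are illustrative rather than the authors' code: the studies used LLM advisors with human judges (Study I) or persona-prompted AI judges (Study II).

```python
import random

def stub_advisor(claim, stance, transcript):
    """Placeholder advisor: argues that `claim` is `stance` ('true'/'false')."""
    return f"Round {len(transcript)}: argument that the claim is {stance}."

def stub_judge(claim, transcript):
    """Placeholder judge: returns (verdict, confidence). A real judge would
    weigh the transcript; this stub always answers 'true' at confidence 0.5."""
    return "true", 0.5

def debate(claim, advisor, judge, rounds=2):
    """Debate protocol: two advisors argue opposing sides before one judge."""
    transcript = []
    for _ in range(rounds):
        transcript.append(("pro", advisor(claim, "true", transcript)))
        transcript.append(("con", advisor(claim, "false", transcript)))
    return judge(claim, transcript)

def consultancy(claim, advisor, judge, rounds=2):
    """Consultancy protocol: one advisor argues a single assigned side."""
    stance = random.choice(["true", "false"])
    transcript = []
    for _ in range(rounds):
        transcript.append((stance, advisor(claim, stance, transcript)))
    return judge(claim, transcript)

verdict, confidence = debate("example claim", stub_advisor, stub_judge)
```

The key structural difference is that the debate judge always sees both sides argued, while the consultancy judge sees only one advisor's assigned stance.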
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Scalable Oversight | Research Area | 68.0 |
Cached Content Preview
[License: CC BY 4.0](https://info.arxiv.org/help/license/index.html#licenses-available)
arXiv:2506.02175v2 \[cs.CL\] 29 Oct 2025
# AI Debate Aids Assessment of Controversial Claims
Salman Rahman1 Sheriff Issaka1 Ashima Suvarna1 Genglin Liu1
James Shiffer1 Jaeyoung Lee2 Md Rizwan Parvez3 Hamid Palangi4 Shi Feng5
Nanyun Peng1 Yejin Choi6 Julian Michael7 Liwei Jiang8 Saadia Gabriel1
1University of California, Los Angeles 2Seoul National University
3Qatar Computing Research Institute 4Google
5George Washington University 6Stanford University 7Scale AI 8University of Washington
[salman@cs.ucla.edu](mailto:salman@cs.ucla.edu "")
Code & Data: [https://github.com/salman-lui/ai-debate](https://github.com/salman-lui/ai-debate "")
Figure 1: Human judge accuracy before and after debate versus consultancy interventions across COVID-19 and climate change domains. Each panel shows results for both domains side-by-side. Debate consistently ou
... (truncated, 98 KB total)