controversial claims assessment
paper
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: arXiv
Research paper addressing scalable oversight challenges when evaluating AI systems on contested claims, examining how human biases affect judgment and proposing methods to ensure truthfulness despite evaluator limitations.
Paper Details
Metadata
Abstract
As AI grows more powerful, it will increasingly shape how we understand the world. But with this influence comes the risk of amplifying misinformation and deepening social divides, especially on consequential topics where factual accuracy directly impacts well-being. Scalable Oversight aims to ensure AI systems remain truthful even when their capabilities exceed those of their evaluators. Yet when humans serve as evaluators, their own beliefs and biases can impair judgment. We study whether AI debate can guide biased judges toward the truth by having two AI systems debate opposing sides of controversial factuality claims on COVID-19 and climate change where people hold strong prior beliefs. We conduct two studies. Study I recruits human judges with either mainstream or skeptical beliefs who evaluate claims through two protocols: debate (interaction with two AI advisors arguing opposing sides) or consultancy (interaction with a single AI advisor). Study II uses AI judges with and without human-like personas to evaluate the same protocols. In Study I, debate consistently improves human judgment accuracy and confidence calibration, outperforming consultancy by 4-10% across COVID-19 and climate change claims. The improvement is most significant for judges with mainstream beliefs (up to +15.2% accuracy on COVID-19 claims), though debate also helps skeptical judges who initially misjudge claims move toward accurate views (+4.7% accuracy). In Study II, AI judges with human-like personas achieve even higher accuracy (78.5%) than human judges (70.1%) and default AI judges without personas (69.8%), suggesting their potential for supervising frontier AI models. These findings highlight AI debate as a promising path toward scalable, bias-resilient oversight in contested domains.
Summary
This paper investigates AI debate as a scalable oversight mechanism for improving human judgment on controversial factual claims, particularly in domains such as COVID-19 and climate change where strong prior beliefs can bias evaluation. The researchers conducted two studies: one with human judges holding mainstream or skeptical beliefs, and another with AI judges with and without human-like personas. In the debate protocol, two AI systems argue opposing sides of a claim; in the consultancy baseline, the judge interacts with a single AI advisor. Debate outperformed consultancy by 4-10% in human judgment accuracy and also improved confidence calibration, with the strongest gains for judges with mainstream beliefs (up to +15.2% on COVID-19 claims) and a smaller gain for skeptical judges (+4.7%). AI judges with human-like personas achieved higher accuracy (78.5%) than both human judges (70.1%) and default AI judges without personas (69.8%), suggesting potential for supervising frontier AI models.
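As a quick check on the Study II figures quoted in the abstract, the gaps between judge types can be worked out directly. The sketch below is illustrative only: the dictionary keys and the helper name `gap_in_points` are invented here, and it assumes the reported accuracies are plain percentages, so that differences are read as percentage points.

```python
# Accuracy figures (in percent) as reported in the abstract for Study II.
# The key names are hypothetical labels chosen for this sketch.
reported_accuracy = {
    "ai_judge_with_persona": 78.5,
    "human_judge": 70.1,
    "ai_judge_default": 69.8,
}


def gap_in_points(a: str, b: str) -> float:
    """Accuracy of judge type `a` minus judge type `b`, in percentage points."""
    # Round to one decimal to match the precision of the reported figures.
    return round(reported_accuracy[a] - reported_accuracy[b], 1)


persona_vs_human = gap_in_points("ai_judge_with_persona", "human_judge")
persona_vs_default = gap_in_points("ai_judge_with_persona", "ai_judge_default")

print(f"Persona AI judges vs. human judges: +{persona_vs_human} points")
print(f"Persona AI judges vs. default AI judges: +{persona_vs_default} points")
```

Under that assumption, persona-equipped AI judges lead human judges by 8.4 points and default AI judges by 8.7 points, while human and default AI judges are nearly level.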
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Scalable Oversight | Research Area | 68.0 |
Cached Content Preview
AI Debate Aids Assessment of Controversial Claims
Salman Rahman¹, Sheriff Issaka¹, Ashima Suvarna¹, Genglin Liu¹, James Shiffer¹, Jaeyoung Lee², Md Rizwan Parvez³, Hamid Palangi⁴, Shi Feng⁵, Nanyun Peng¹, Yejin Choi⁶, Julian Michael⁷, Liwei Jiang⁸, Saadia Gabriel¹
¹ University of California, Los Angeles; ² Seoul National University; ³ Qatar Computing Research Institute; ⁴ Google; ⁵ George Washington University; ⁶ Stanford University; ⁷ Scale AI; ⁸ University of Washington
salman@cs.ucla.edu
Code & Data: https://github.com/salman-lui/ai-debate
Figure 1: Human judge accuracy before and after debate versus consultancy interventions across COVID-19 and climate change domains. Each panel shows results for both domains side-by-side. Debate consistently outperforms consultancy: COVID-19 shows +10.0% overall advantage ($p < 0.01$), with largest gains for mainstream believers (+15.2%, $p < 0.01$) versus skeptical believers (+4.7%, $p \nless 0.01$); Climate shows +3.8% overall advantage even when consultants argue for their prefe
... (truncated, 98 KB total)
876ff73c8dabecf8 | Stable ID: sid_kiPSnDBVBJ