controversial claims assessment

paper

2025·arXiv·arxiv.org/html/2506.02175v2

Authors

Salman Rahman·Sheriff Issaka·Ashima Suvarna·Genglin Liu·James Shiffer·Jaeyoung Lee·Md Rizwan Parvez·Hamid Palangi·Shi Feng·Nanyun Peng·Yejin Choi·Julian Michael·Liwei Jiang·Saadia Gabriel

Credibility Rating

3/5

Good (3)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: arXiv

Research paper addressing scalable oversight challenges when evaluating AI systems on contested claims, examining how human biases affect judgment and proposing methods to ensure truthfulness despite evaluator limitations.

Paper Details

Citations

0 influential

Year

2025

arXiv:2506.02175 DOI:10.48550/arXiv.2506.02175 Semantic Scholar

Metadata

arxiv preprint · primary source

Abstract

As AI grows more powerful, it will increasingly shape how we understand the world. But with this influence comes the risk of amplifying misinformation and deepening social divides, especially on consequential topics where factual accuracy directly impacts well-being. Scalable Oversight aims to ensure AI systems remain truthful even when their capabilities exceed those of their evaluators. Yet when humans serve as evaluators, their own beliefs and biases can impair judgment. We study whether AI debate can guide biased judges toward the truth by having two AI systems debate opposing sides of controversial factuality claims on COVID-19 and climate change where people hold strong prior beliefs. We conduct two studies. Study I recruits human judges with either mainstream or skeptical beliefs who evaluate claims through two protocols: debate (interaction with two AI advisors arguing opposing sides) or consultancy (interaction with a single AI advisor). Study II uses AI judges with and without human-like personas to evaluate the same protocols. In Study I, debate consistently improves human judgment accuracy and confidence calibration, outperforming consultancy by 4-10% across COVID-19 and climate change claims. The improvement is most significant for judges with mainstream beliefs (up to +15.2% accuracy on COVID-19 claims), though debate also helps skeptical judges who initially misjudge claims move toward accurate views (+4.7% accuracy). In Study II, AI judges with human-like personas achieve even higher accuracy (78.5%) than human judges (70.1%) and default AI judges without personas (69.8%), suggesting their potential for supervising frontier AI models. These findings highlight AI debate as a promising path toward scalable, bias-resilient oversight in contested domains.

Summary

This paper investigates AI debate as a scalable oversight mechanism for improving human judgment on controversial factual claims, particularly in domains like COVID-19 and climate change where strong prior beliefs can bias evaluation. The researchers conducted two studies: one with human judges holding mainstream or skeptical beliefs, and another with AI judges with and without human-like personas. Results show that AI debate—where two AI systems argue opposing sides of a claim—consistently improves judgment accuracy by 4-10% compared to single-advisor consultancy, with particularly strong gains for judges with mainstream beliefs (+15.2% on COVID-19 claims). AI judges with human-like personas achieved even higher accuracy (78.5%) than human judges (70.1%), suggesting potential for supervising frontier AI models.
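The comparison summarized above amounts to measuring how much judge accuracy changes from before to after each intervention, then contrasting debate with consultancy. The following is a minimal sketch of that bookkeeping, not the authors' analysis code: the `Judgment` record, the `accuracy` and `protocol_gain` helpers, and the `TOY` records are all hypothetical, and the toy values are invented for illustration only and are not results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Judgment:
    """One judge's verdict on one claim (hypothetical record layout)."""
    protocol: str        # "debate" or "consultancy"
    claim_is_true: bool  # ground-truth label of the claim
    before: bool         # judge's verdict before the intervention
    after: bool          # judge's verdict after the intervention


def accuracy(records, when):
    """Fraction of verdicts at `when` ("before" or "after") matching ground truth."""
    if not records:
        raise ValueError("no records to score")
    hits = sum(getattr(r, when) == r.claim_is_true for r in records)
    return hits / len(records)


def protocol_gain(records, protocol):
    """Accuracy after the intervention minus accuracy before it, for one protocol."""
    subset = [r for r in records if r.protocol == protocol]
    return accuracy(subset, "after") - accuracy(subset, "before")


# Invented toy records, NOT data from the paper.
TOY = [
    Judgment("debate", True, False, True),
    Judgment("debate", False, False, False),
    Judgment("debate", True, True, True),
    Judgment("debate", False, True, True),
    Judgment("consultancy", True, False, False),
    Judgment("consultancy", False, False, False),
    Judgment("consultancy", True, True, True),
    Judgment("consultancy", False, True, True),
]

if __name__ == "__main__":
    debate_gain = protocol_gain(TOY, "debate")
    consultancy_gain = protocol_gain(TOY, "consultancy")
    print(f"debate gain:      {debate_gain:+.2f}")
    print(f"consultancy gain: {consultancy_gain:+.2f}")
    print(f"debate advantage: {debate_gain - consultancy_gain:+.2f}")
```

The sketch reports differences in accuracy as fractions (percentage points when multiplied by 100). Whether the paper's "4-10%" figures are percentage-point or relative differences is not stated in the text above, so that reading is an assumption of this example.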

Cited by 1 page

Page	Type	Quality
Scalable Oversight	Research Area	68.0

Cached Content Preview

HTTP 200 · Fetched Apr 9, 2026 · 98 KB

AI Debate Aids Assessment of Controversial Claims
 
 Salman Rahman 1  Sheriff Issaka 1  Ashima Suvarna 1  Genglin Liu 1 
 James Shiffer 1   Jaeyoung Lee 2   Md Rizwan Parvez 3   Hamid Palangi 4   Shi Feng 5 
 Nanyun Peng 1   Yejin Choi 6   Julian Michael 7   Liwei Jiang 8   Saadia Gabriel 1 
 
 1 University of California, Los Angeles   2 Seoul National University 
 3 Qatar Computing Research Institute   4 Google 
 5 George Washington University   6 Stanford University   7 Scale AI   8 University of Washington 
 
 salman@cs.ucla.edu 
 Code & Data:  https://github.com/salman-lui/ai-debate 
 
 
 

 
 
Figure 1: Human judge accuracy before and after debate versus consultancy interventions across COVID-19 and climate change domains. Each panel shows results for both domains side-by-side. Debate consistently outperforms consultancy: COVID-19 shows +10.0% overall advantage (p < 0.01), with largest gains for mainstream believers (+15.2%, p < 0.01) versus skeptical believers (+4.7%, p ≮ 0.01); Climate shows +3.8% overall advantage even when consultants argue for their prefe

... (truncated, 98 KB total)

Resource ID: 876ff73c8dabecf8 | Stable ID: sid_kiPSnDBVBJ