Longterm Wiki

controversial claims assessment

paper

Authors

Salman Rahman·Sheriff Issaka·Ashima Suvarna·Genglin Liu·James Shiffer·Jaeyoung Lee·Md Rizwan Parvez·Hamid Palangi·Shi Feng·Nanyun Peng·Yejin Choi·Julian Michael·Liwei Jiang·Saadia Gabriel

Credibility Rating

3/5 (Good)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: arXiv

Research paper addressing scalable oversight challenges when evaluating AI systems on contested claims, examining how human biases affect judgment and proposing methods to ensure truthfulness despite evaluator limitations.

Paper Details

Citations: 2 (0 influential)
Year: 2025

Metadata

arXiv preprint · primary source

Abstract

As AI grows more powerful, it will increasingly shape how we understand the world. But with this influence comes the risk of amplifying misinformation and deepening social divides, especially on consequential topics where factual accuracy directly impacts well-being. Scalable Oversight aims to ensure AI systems remain truthful even when their capabilities exceed those of their evaluators. Yet when humans serve as evaluators, their own beliefs and biases can impair judgment. We study whether AI debate can guide biased judges toward the truth by having two AI systems debate opposing sides of controversial factuality claims on COVID-19 and climate change where people hold strong prior beliefs. We conduct two studies. Study I recruits human judges with either mainstream or skeptical beliefs who evaluate claims through two protocols: debate (interaction with two AI advisors arguing opposing sides) or consultancy (interaction with a single AI advisor). Study II uses AI judges with and without human-like personas to evaluate the same protocols. In Study I, debate consistently improves human judgment accuracy and confidence calibration, outperforming consultancy by 4-10% across COVID-19 and climate change claims. The improvement is most significant for judges with mainstream beliefs (up to +15.2% accuracy on COVID-19 claims), though debate also helps skeptical judges who initially misjudge claims move toward accurate views (+4.7% accuracy). In Study II, AI judges with human-like personas achieve even higher accuracy (78.5%) than human judges (70.1%) and default AI judges without personas (69.8%), suggesting their potential for supervising frontier AI models. These findings highlight AI debate as a promising path toward scalable, bias-resilient oversight in contested domains.

Summary

This paper investigates AI debate as a scalable oversight mechanism for improving human judgment on controversial factual claims, particularly in domains like COVID-19 and climate change where strong prior beliefs can bias evaluation. The researchers conducted two studies: one with human judges holding mainstream or skeptical beliefs, and another with AI judges with and without human-like personas. Results show that AI debate—where two AI systems argue opposing sides of a claim—consistently improves judgment accuracy by 4-10% compared to single-advisor consultancy, with particularly strong gains for judges with mainstream beliefs (+15.2% on COVID-19 claims). AI judges with human-like personas achieved even higher accuracy (78.5%) than human judges (70.1%), suggesting potential for supervising frontier AI models.
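The two evaluation protocols compared in the paper can be sketched as a simple control loop: in debate, two advisors argue opposing sides of a claim over a fixed number of rounds before the judge issues a verdict; in consultancy, a single advisor fills the transcript alone. The sketch below is illustrative only; the function names, toy advisors, and toy judge are assumptions for this example, not the authors' implementation (which uses LLMs for both advisor and judge roles).

```python
from dataclasses import dataclass, field
from typing import Callable, List

# An advisor maps (claim, transcript so far) -> an argument string.
# A judge maps (claim, full transcript) -> a verdict ("true"/"false").
Advisor = Callable[[str, List[str]], str]
Judge = Callable[[str, List[str]], str]

@dataclass
class Transcript:
    claim: str
    turns: List[str] = field(default_factory=list)

def run_debate(claim: str, pro: Advisor, con: Advisor,
               judge: Judge, rounds: int = 2) -> str:
    """Debate protocol: two advisors argue opposing sides for a fixed
    number of rounds, then the judge issues a verdict."""
    t = Transcript(claim)
    for _ in range(rounds):
        t.turns.append("PRO: " + pro(claim, t.turns))
        t.turns.append("CON: " + con(claim, t.turns))
    return judge(claim, t.turns)

def run_consultancy(claim: str, advisor: Advisor,
                    judge: Judge, rounds: int = 2) -> str:
    """Consultancy protocol: a single advisor (which may be assigned
    either side of the claim) advises the judge."""
    t = Transcript(claim)
    for _ in range(rounds):
        t.turns.append("ADVISOR: " + advisor(claim, t.turns))
    return judge(claim, t.turns)

# Toy stand-ins, just to show the control flow end to end.
pro = lambda claim, turns: f"Evidence supports: {claim}"
con = lambda claim, turns: f"Evidence contradicts: {claim}"
naive_judge = lambda claim, turns: (
    "true" if any(u.startswith("PRO:") for u in turns) else "false"
)

verdict = run_debate("Example claim", pro, con, naive_judge)
```

The structural difference the paper measures is exactly the one visible here: the debate judge always sees both sides of the transcript, while the consultancy judge sees only whichever side its single advisor happens to argue.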

Cited by 1 page

| Page | Type | Quality |
| --- | --- | --- |
| Scalable Oversight | Research Area | 68.0 |

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 98 KB
[License: CC BY 4.0](https://info.arxiv.org/help/license/index.html#licenses-available)

arXiv:2506.02175v2 [cs.CL] 29 Oct 2025

# AI Debate Aids Assessment of Controversial Claims


Salman Rahman^1, Sheriff Issaka^1, Ashima Suvarna^1, Genglin Liu^1, James Shiffer^1, Jaeyoung Lee^2, Md Rizwan Parvez^3, Hamid Palangi^4, Shi Feng^5, Nanyun Peng^1, Yejin Choi^6, Julian Michael^7, Liwei Jiang^8, Saadia Gabriel^1

^1 University of California, Los Angeles  ^2 Seoul National University  ^3 Qatar Computing Research Institute  ^4 Google  ^5 George Washington University  ^6 Stanford University  ^7 Scale AI  ^8 University of Washington

[salman@cs.ucla.edu](mailto:salman@cs.ucla.edu "")

Code & Data: [https://github.com/salman-lui/ai-debate](https://github.com/salman-lui/ai-debate "")



![Figure 1](https://arxiv.org/html/2506.02175v2/x1.png)

Figure 1: Human judge accuracy before and after debate versus consultancy interventions across COVID-19 and climate change domains. Each panel shows results for both domains side-by-side. Debate consistently ou…

... (truncated, 98 KB total)