Scalable AI Safety via Doubly Efficient Debate (ICML 2024)

web

Semantic Scholar · semanticscholar.org/paper/Scalable-AI-Safety-via-Doubly-E...

Credibility Rating

4/5

High (4)

High quality. Established institution or organization with editorial oversight and accountability.

Rating inherited from publication venue: Semantic Scholar

This ICML 2024 paper by Brown-Cohen, Irving, and Piliouras advances the theoretical foundations of debate as a scalable oversight method. It bears directly on the problem of supervising AI systems whose outputs humans cannot judge unaided.

Metadata

Importance: 72/100 · conference paper · primary source

Summary

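This paper presents 'doubly efficient debate', a framework for scalable AI safety that extends AI Safety via Debate (Irving et al., 2018). The original framework assumed an honest debater able to simulate deterministic AI systems for an exponential number of steps. The new protocols let the honest strategy succeed with a polynomial number of simulation steps, and they also cover stochastic AI systems. The aim is to make debate-based oversight practical for increasingly capable AI systems, backed by theoretical guarantees on the protocol's effectiveness.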

Key Points

• Extends Irving et al.'s AI Safety via Debate by removing its reliance on an honest debater with exponential simulation power, a key practical limitation of the original framework.
• Introduces 'doubly efficient' debate protocols in which both the judge's verification and the honest debater's argument generation are computationally tractable.
• Provides theoretical guarantees that the honest debater can win even against a dishonest debater with far more compute, supporting debate as a viable scalable oversight mechanism.
• Addresses the core scalability challenge: how humans can evaluate AI outputs on tasks too complex for them to judge directly.
• Published at ICML 2024, so the technical contributions have been peer reviewed.
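
The combination of a cheap judge and a tractable honest debater can be illustrated with a toy sketch. This is not the paper's protocol: it is a minimal deterministic example, and the names `step`, `run_trace`, `challenge`, and `judge` are hypothetical. One debater claims the full trace of a long computation, the other points at the first transition it disputes, and the judge re-executes only that single step. The stochastic systems and human-judgement queries treated in the paper are left out.

```python
from typing import List, Optional

State = int


def step(x: State) -> State:
    """One cheap step of a long computation (hypothetical toy transition)."""
    return (x * x + 1) % 1_000_003


def run_trace(x0: State, num_steps: int) -> List[State]:
    """Honest debater: simulate every step once, so work is linear in num_steps."""
    trace = [x0]
    for _ in range(num_steps):
        trace.append(step(trace[-1]))
    return trace


def challenge(claimed: List[State]) -> Optional[int]:
    """Opposing debater: index of the first transition it disputes, or None."""
    for i in range(len(claimed) - 1):
        if step(claimed[i]) != claimed[i + 1]:
            return i
    return None


def judge(x0: State, claimed: List[State], disputed: Optional[int]) -> bool:
    """Judge: accept the claimed trace unless the one disputed step is wrong.

    The judge calls `step` at most once, however long the trace is.
    """
    if claimed[0] != x0:
        return False
    if disputed is None:
        return True
    return step(claimed[disputed]) == claimed[disputed + 1]


honest = run_trace(3, 50)
print(judge(3, honest, challenge(honest)))      # honest claim is accepted

tampered = list(honest)
tampered[20] += 1                               # corrupt one intermediate state
print(judge(3, tampered, challenge(tampered)))  # corrupted claim is rejected
```

In this sketch a spurious challenge against a correct trace also fails, because the judge finds the disputed step to be valid. The judge's cost stays constant while the honest debater's cost grows only linearly with the length of the computation.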

Cited by 1 page

Page	Type	Quality
Why Alignment Might Be Easy	Argument	53.0

Cached Content Preview

HTTP 202 · Fetched Apr 7, 2026 · 0 KB


Resource ID: 1a40fc0c4426e641 | Stable ID: sid_zLleUyn22V