Scalable Oversight
Status: active
Research on supervising AI systems that approach or exceed human-level capabilities in specific domains.
| Metric | Value |
|---|---|
| Organizations | 3 |
| Key Papers | 2 |
| Grants | 7 |
| Total Funding | $554K |
| Risks Addressed | 2 |
Tags: scalable-oversight, supervision, safety-research
Organizations (3)
| Organization | Role |
|---|---|
| Anthropic | pioneer |
| Alignment Research Center | pioneer |
| Google DeepMind | active |
Grants (7)
| Name | Recipient | Amount | Funder | Date |
|---|---|---|---|---|
| Princeton University — Scalable Oversight Research | Princeton University | $100K | Coefficient Giving | 2024-04 |
| University of Michigan — Scalable Oversight Research | University of Michigan | $100K | Coefficient Giving | 2024-02 |
| University of Wisconsin–Madison — Scalable Oversight Research | University of Wisconsin–Madison | $100K | Coefficient Giving | 2024-04 |
| Compute costs for experiments to evaluate different scalable oversight protocols | Lewis Hammond | $87K | Long-Term Future Fund (LTFF) | 2024-01 |
| Berkeley Existential Risk Initiative — Scalable Oversight Dataset | Berkeley Existential Risk Initiative | $70K | Coefficient Giving | 2023-09 |
| Meta level adversarial evaluation of debate (scalable oversight technique) on simple math problems (MATS 5.0 project) | Yoav Tzfati | $62K | Long-Term Future Fund (LTFF) | 2024-01 |
| 6-month salary to verify neural network scalably for RL and produce a human to super-human scalable oversight benchmark | Roman Soletskyi | $35K | Long-Term Future Fund (LTFF) | 2024-01 |
Funding by Funder
| Funder | Grants | Total Amount |
|---|---|---|
| Coefficient Giving | 4 | $370K |
| Long-Term Future Fund (LTFF) | 3 | $184K |
Key Papers & Resources (2)
| Paper | Authors | Year | Status |
|---|---|---|---|
| AI Safety via Debate (protocol sketched below) | Irving et al. | 2018 | Seminal |
| Scalable Agent Alignment via Reward Modeling | Leike et al. | 2018 | Seminal |
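For readers new to the debate protocol, here is a minimal sketch of the setup Irving et al. (2018) describe: two models argue for opposing answers over several rounds, and a judge who could not evaluate the question directly picks a winner from the transcript alone. All names here (`ask_model`, `run_debate`, `judge`) are hypothetical placeholders, not an API from the paper.

```python
def ask_model(prompt: str) -> str:
    """Hypothetical stand-in for a call to a language model API."""
    raise NotImplementedError("wire up a real model here")


def run_debate(question: str, answer_a: str, answer_b: str, rounds: int = 3) -> str:
    """Alternate arguments from two debaters and return the full transcript."""
    transcript = (
        f"Question: {question}\n"
        f"Debater A defends: {answer_a}\n"
        f"Debater B defends: {answer_b}\n"
    )
    for r in range(1, rounds + 1):
        for name, answer in (("A", answer_a), ("B", answer_b)):
            argument = ask_model(
                f"{transcript}\nDebater {name}, round {r}: argue for "
                f"'{answer}' and rebut your opponent."
            )
            transcript += f"\n[Round {r}] Debater {name}: {argument}"
    return transcript


def judge(transcript: str) -> str:
    """The judge (ideally a human) picks a winner from the transcript alone."""
    return ask_model(
        f"{transcript}\n\nJudge: which debater argued more honestly and "
        "convincingly, A or B? Answer with a single letter."
    ).strip()
```

The hope, on the paper's account, is that honest arguments are easier to defend than dishonest ones, so a weaker judge can supervise stronger debaters.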
Sub-Areas (5)
| Name | Description | Status | Orgs | Papers |
|---|---|---|---|---|
| AI Safety via Debate | Using structured debate between AI systems as a scalable mechanism for humans to judge the quality of AI reasoning. | active | 3 | 2 |
| Corrigibility | Research on building AI systems that allow themselves to be corrected, modified, or shut down by human operators. | active | 3 | 2 |
| Eliciting Latent Knowledge | Extracting what an AI model 'actually believes' rather than what it says, addressing the distinction between model knowledge and model outputs. | active | 4 | 3 |
| Formal Verification for AI | Applying mathematical proof techniques to verify safety properties of neural networks and AI systems. | emerging | 3 | 2 |
| Value Learning | Research on AI systems that learn and internalize human values through interaction, observation, or inference (see the reward-modeling sketch after this table). | active | 3 | 2 |
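The value-learning sub-area and the Leike et al. (2018) paper above share a core idea: fit a reward function to human preference comparisons, then optimize a policy against it. Below is a minimal, self-contained sketch of that first step using a Bradley-Terry preference loss; the network size, feature dimension, and random 'comparisons' are illustrative placeholders, not from the paper or any benchmark.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RewardModel(nn.Module):
    """Maps a trajectory summary (here: a fixed-size feature vector) to a scalar reward."""

    def __init__(self, feature_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)


def preference_loss(
    model: RewardModel, preferred: torch.Tensor, rejected: torch.Tensor
) -> torch.Tensor:
    """Bradley-Terry loss: the human-preferred trajectory should score higher."""
    return -F.logsigmoid(model(preferred) - model(rejected)).mean()


# Toy training loop on synthetic 'human comparisons'.
model = RewardModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(200):
    preferred = torch.randn(16, 32)  # placeholder: trajectories the human preferred
    rejected = torch.randn(16, 32)   # placeholder: trajectories the human rejected
    loss = preference_loss(model, preferred, rejected)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The trained reward model then stands in for the human during RL, which is what lets a bounded amount of human feedback supervise a much larger volume of agent experience.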