Longterm Wiki

Scalable Oversight

Status: active

Research on supervising AI systems that approach or exceed human-level capabilities in specific domains.

Organizations: 3
Key Papers: 2
Grants: 7
Total Funding: $554K
Risks Addressed: 2
Cluster: Scalable Oversight

Tags

scalable-oversight, supervision, safety-research

Organizations (3)

Organization | Role
Anthropic | pioneer
Alignment Research Center | pioneer
Google DeepMind | active

Grants (7)

Name | Recipient | Amount | Funder | Date
Princeton University — Scalable Oversight Research | Princeton University | $100K | Coefficient Giving | 2024-04
University of Michigan — Scalable Oversight Research | University of Michigan | $100K | Coefficient Giving | 2024-02
University of Wisconsin–Madison — Scalable Oversight Research | University of Wisconsin–Madison | $100K | Coefficient Giving | 2024-04
Compute costs for experiments to evaluate different scalable oversight protocols | Lewis Hammond | $87K | Long-Term Future Fund (LTFF) | 2024-01
Berkeley Existential Risk Initiative — Scalable Oversight Dataset | Berkeley Existential Risk Initiative | $70K | Coefficient Giving | 2023-09
Meta-level adversarial evaluation of debate (scalable oversight technique) on simple math problems (MATS 5.0 project) | Yoav Tzfati | $62K | Long-Term Future Fund (LTFF) | 2024-01
6-month salary to scalably verify neural networks for RL and produce a human-to-superhuman scalable oversight benchmark | Roman Soletskyi | $35K | Long-Term Future Fund (LTFF) | 2024-01

Funding by Funder

Funder | Grants | Total Amount
Coefficient Giving | 4 | $370K
Long-Term Future Fund (LTFF) | 3 | $184K
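As a sanity check, the per-funder totals and the $554K total above can be reproduced by aggregating the grants table (a minimal sketch; amounts are in $K and taken directly from the table):

```python
from collections import Counter

# (recipient, amount in $K, funder) — rows from the grants table above
grants = [
    ("Princeton University", 100, "Coefficient Giving"),
    ("University of Michigan", 100, "Coefficient Giving"),
    ("University of Wisconsin–Madison", 100, "Coefficient Giving"),
    ("Lewis Hammond", 87, "Long-Term Future Fund (LTFF)"),
    ("Berkeley Existential Risk Initiative", 70, "Coefficient Giving"),
    ("Yoav Tzfati", 62, "Long-Term Future Fund (LTFF)"),
    ("Roman Soletskyi", 35, "Long-Term Future Fund (LTFF)"),
]

# Sum grant amounts per funder
totals = Counter()
for _, amount, funder in grants:
    totals[funder] += amount

print(totals["Coefficient Giving"])          # 370
print(totals["Long-Term Future Fund (LTFF)"])  # 184
print(sum(totals.values()))                  # 554
```

The grant counts per funder (4 and 3) likewise fall out of the same grouping.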

Key Papers & Resources (2)

SEMINAL: AI Safety via Debate — Irving et al., 2018

Sub-Areas (5)

Name | Description | Status | Orgs | Papers
AI Safety via Debate | Using structured debate between AI systems as a scalable mechanism for humans to judge the quality of AI reasoning. | active | 3 | 2
Corrigibility | Research on building AI systems that allow themselves to be corrected, modified, or shut down by human operators. | active | 3 | 2
Eliciting Latent Knowledge | Extracting what an AI model 'actually believes' rather than what it says, addressing the distinction between model knowledge and model outputs. | active | 4 | 3
Formal Verification for AI | Applying mathematical proof techniques to verify safety properties of neural networks and AI systems. | emerging | 3 | 2
Value Learning | Research on AI systems that learn and internalize human values through interaction, observation, or inference. | active | 3 | 2