Scalable Oversight
Status: active
Research on supervising AI systems that approach or exceed human-level capabilities in specific domains.
| Metric | Value |
|---|---|
| Organizations | 3 |
| Key Papers | 2 |
| Grants | 7 |
| Total Funding | $554K |
| Risks Addressed | 2 |
Tags: scalable-oversight, supervision, safety-research
Organizations (3)
| Organization | Role |
|---|---|
| Anthropic | pioneer |
| Alignment Research Center | pioneer |
| Google DeepMind | active |
Grants (7)
| Name | Recipient | Amount | Funder | Date |
|---|---|---|---|---|
| Princeton University — Scalable Oversight Research | Princeton University | $100K | Coefficient Giving | 2024-04 |
| University of Michigan — Scalable Oversight Research | University of Michigan | $100K | Coefficient Giving | 2024-02 |
| University of Wisconsin–Madison — Scalable Oversight Research | University of Wisconsin–Madison | $100K | Coefficient Giving | 2024-04 |
| Compute costs for experiments to evaluate different scalable oversight protocols | Lewis Hammond | $87K | Long-Term Future Fund (LTFF) | 2024-01 |
| Berkeley Existential Risk Initiative — Scalable Oversight Dataset | Berkeley Existential Risk Initiative | $70K | Coefficient Giving | 2023-09 |
| Meta level adversarial evaluation of debate (scalable oversight technique) on simple math problems (MATS 5.0 project) | Yoav Tzfati | $62K | Long-Term Future Fund (LTFF) | 2024-01 |
| 6-month salary to verify neural network scalably for RL and produce a human to super-human scalable oversight benchmark | Roman Soletskyi | $35K | Long-Term Future Fund (LTFF) | 2024-01 |
Funding by Funder
| Funder | Grants | Total Amount |
|---|---|---|
| Coefficient Giving | 4 | $370K |
| Long-Term Future Fund (LTFF) | 3 | $184K |
Key Papers & Resources (2)
| Paper | Authors | Year | Status |
|---|---|---|---|
| AI Safety via Debate (protocol sketched below) | Irving et al. | 2018 | Seminal |
| Scalable Agent Alignment via Reward Modeling | Leike et al. | 2018 | Seminal |
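For readers new to the debate protocol, here is a minimal sketch of the setup Irving et al. (2018) describe: two models argue for opposing answers over several rounds, and a judge who could not evaluate the question directly picks a winner from the transcript alone. All names here (`ask_model`, `run_debate`, `judge`) are hypothetical placeholders, not an API from the paper.

```python
def ask_model(prompt: str) -> str:
    """Hypothetical stand-in for a call to a language model API."""
    raise NotImplementedError("wire up a real model here")


def run_debate(question: str, answer_a: str, answer_b: str, rounds: int = 3) -> str:
    """Alternate arguments from two debaters and return the full transcript."""
    transcript = (
        f"Question: {question}\n"
        f"Debater A defends: {answer_a}\n"
        f"Debater B defends: {answer_b}\n"
    )
    for r in range(1, rounds + 1):
        for name, answer in (("A", answer_a), ("B", answer_b)):
            argument = ask_model(
                f"{transcript}\nDebater {name}, round {r}: argue for "
                f"'{answer}' and rebut your opponent."
            )
            transcript += f"\n[Round {r}] Debater {name}: {argument}"
    return transcript


def judge(transcript: str) -> str:
    """The judge (ideally a human) picks a winner from the transcript alone."""
    return ask_model(
        f"{transcript}\n\nJudge: which debater argued more honestly and "
        "convincingly, A or B? Answer with a single letter."
    ).strip()
```

The hope, on the paper's account, is that honest arguments are easier to defend than dishonest ones, so a weaker judge can supervise stronger debaters.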
Sub-Areas (5)
| Name | Description | Status | Orgs | Papers |
|---|---|---|---|---|
| AI Safety via Debate | Using structured debate between AI systems as a scalable mechanism for humans to judge the quality of AI reasoning. | active | 3 | 2 |
| Corrigibility | Research on building AI systems that allow themselves to be corrected, modified, or shut down by human operators. | active | 3 | 2 |
| Eliciting Latent Knowledge | Extracting what an AI model 'actually believes' rather than what it says, addressing the distinction between model knowledge and model outputs. | active | 4 | 3 |
| Formal Verification for AI | Applying mathematical proof techniques to verify safety properties of neural networks and AI systems. | emerging | 3 | 2 |
| Value Learning | Research on AI systems that learn and internalize human values through interaction, observation, or inference (see the reward-modeling sketch after this table). | active | 3 | 2 |
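The value-learning sub-area and the Leike et al. (2018) paper above share a core idea: fit a reward function to human preference comparisons, then optimize a policy against it. Below is a minimal, self-contained sketch of that first step using a Bradley-Terry preference loss; the network size, feature dimension, and random 'comparisons' are illustrative placeholders, not from the paper or any benchmark.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RewardModel(nn.Module):
    """Maps a trajectory summary (here: a fixed-size feature vector) to a scalar reward."""

    def __init__(self, feature_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)


def preference_loss(
    model: RewardModel, preferred: torch.Tensor, rejected: torch.Tensor
) -> torch.Tensor:
    """Bradley-Terry loss: the human-preferred trajectory should score higher."""
    return -F.logsigmoid(model(preferred) - model(rejected)).mean()


# Toy training loop on synthetic 'human comparisons'.
model = RewardModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(200):
    preferred = torch.randn(16, 32)  # placeholder: trajectories the human preferred
    rejected = torch.randn(16, 32)   # placeholder: trajectories the human rejected
    loss = preference_loss(model, preferred, rejected)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The trained reward model then stands in for the human during RL, which is what lets a bounded amount of human feedback supervise a much larger volume of agent experience.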