Longterm Wiki

Sleeper Agent Detection

Evaluation · emerging

Research on detecting and mitigating backdoors, trojans, and time-delayed deceptive behavior in AI systems.

Organizations: 4
Key Papers: 1
Grants: 3
Total Funding: $125K
First Proposed: 2024 (Hubinger et al., Anthropic)
Cluster: Evaluation
Parent Area: AI Evaluations

Tags

evaluations, backdoors, deception

Grants (3)

Name | Recipient | Amount | Funder | Date
4-month stipend for 3 people to create demonstrations of provably undetectable backdoors | Andrew Gritsevskiy | $50K | Long-Term Future Fund (LTFF) | 2024-01
Grant to "support prizes for a trojan detection competition at NeurIPS, which involves identifying whether a deep neural network will suddenly change behavior if certain unknown conditions are met." | Trojan Detection Challenge at NeurIPS 2022 | $50K | FTX Future Fund | 2022-05
ETH Zurich Foundation (USA) - Machine Learning Research Support | ETH Zurich Foundation (USA) | $25K | Coefficient Giving | 2023-11

Funding by Funder

Key Papers & Resources (1)