Longterm Wiki

Reward Hacking of Human Oversight

Evaluation · emerging

Empirically investigating how AI systems deceive or manipulate human evaluators.

Organizations: 4
Cluster: Evaluation
Parent Area: AI Evaluations

Tags

function:assurance
scope:technique