Longterm Wiki

Representation Engineering

Interpretability · emerging

Controlling AI behavior by directly manipulating internal representations, including activation addition and steering vectors.
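The toy example below illustrates the idea. It follows one common recipe and is not necessarily the exact procedure of Zou et al. The steering vector is the difference between the mean activations on two contrasting prompt sets. It is then added, scaled by a coefficient `alpha`, to a hidden state, in the spirit of activation addition. The activations are hypothetical hand-written 3-dimensional vectors standing in for a real model's hidden states at one layer, and all function names are illustrative.

```python
# Toy sketch of activation steering with a difference-of-means steering vector.
# ASSUMPTION: the activation lists below are hypothetical stand-ins for a model's
# hidden states at a single layer; a real model would supply them (e.g. via hooks).

from typing import List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def steering_vector(positive: List[Vector], negative: List[Vector]) -> Vector:
    """Difference of mean activations between two contrasting prompt sets."""
    pos_mean = mean_vector(positive)
    neg_mean = mean_vector(negative)
    return [p - q for p, q in zip(pos_mean, neg_mean)]


def add_steering(hidden: Vector, direction: Vector, alpha: float) -> Vector:
    """Activation addition: shift a hidden state along the steering direction."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


def project(hidden: Vector, direction: Vector) -> float:
    """Dot product, a crude read-out of how far a state lies along the direction."""
    return sum(h * d for h, d in zip(hidden, direction))


if __name__ == "__main__":
    # Hypothetical activations for prompts that do / do not express a concept.
    positive_acts = [[1.0, 0.5, 0.0], [1.2, 0.3, 0.1]]
    negative_acts = [[-1.0, 0.4, 0.0], [-0.8, 0.6, 0.1]]

    v = steering_vector(positive_acts, negative_acts)
    neutral = [0.0, 0.5, 0.05]
    steered = add_steering(neutral, v, alpha=2.0)
    print("steering vector:", v)
    print("score before:", project(neutral, v), "after:", project(steered, v))
```

In this toy setup the steered state scores far higher along the steering direction than the unsteered one. The sign and size of `alpha` control the direction and strength of the intervention.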

Organizations: 3
Key Papers: 1
First Proposed: 2023 (Zou et al.)
Cluster: Interpretability
Parent Area: Interpretability

Tags

interpretability, activation-steering, control

Key Papers & Resources (1)