Representation Engineering
Interpretability
emerging
Controlling AI behavior by directly manipulating a model's internal representations, using techniques such as activation addition and steering vectors.
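As a rough illustration of what activation addition with a steering vector can look like, here is a minimal sketch in plain Python. It assumes one common recipe: take the difference between the mean activations of two contrasting prompt sets and add a scaled copy of it to a hidden state. The toy activation values and the names `steering_vector`, `add_steering`, and `alpha` are hypothetical and do not come from any particular paper or library.

```python
from typing import List

Vector = List[float]


def mean_vector(rows: List[Vector]) -> Vector:
    """Element-wise mean over a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def steering_vector(pos_acts: List[Vector], neg_acts: List[Vector]) -> Vector:
    """Difference of mean activations between two contrasting prompt sets."""
    mean_pos = mean_vector(pos_acts)
    mean_neg = mean_vector(neg_acts)
    return [p - q for p, q in zip(mean_pos, mean_neg)]


def add_steering(hidden: Vector, direction: Vector, alpha: float = 1.0) -> Vector:
    """Activation addition: shift a hidden state along the steering direction."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


# Toy activations (made up for illustration), one row per prompt.
pos_acts = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
neg_acts = [[0.0, 1.0, 2.0], [0.0, 3.0, 2.0]]

direction = steering_vector(pos_acts, neg_acts)
steered = add_steering([0.5, 0.5, 0.5], direction, alpha=0.5)
print(direction)  # [2.0, -2.0, 0.0]
print(steered)    # [1.5, -0.5, 0.5]
```

In a real model the vectors would be read from and written back to a chosen layer during the forward pass; the scale `alpha` and the layer choice are typically tuned empirically.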
Organizations
3
Key Papers
1
Tags
interpretability, activation-steering, control
Key Papers & Resources (1)
SEMINAL