Longterm Wiki

AI safety research organization focused on mechanistic and developmental interpretability. Studies how neural networks form internal representations during training. Funded by SFF.