Longterm Wiki
Updated 2026-02-09

Natural Abstractions

Concept


The hypothesis that a wide range of learning processes converge on the same natural abstractions of the world, which would aid alignment.

Related
Safety Agendas
Interpretability

This page is a stub. Content needed.
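A toy illustration of the convergence claim (a minimal sketch on synthetic data, not from any source on this page): two "learners" with independent randomness each extract a low-dimensional summary of the same data via PCA, standing in for different learning processes. The hypothesis predicts they recover essentially the same abstraction, which we check by measuring the angles between their learned subspaces.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data with a true 2-D "natural" structure embedded in 10 dimensions.
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.05 * rng.normal(size=(500, 10))

def learned_subspace(data, seed, k=2):
    """One 'learner': PCA (via SVD) on an independent bootstrap resample,
    standing in for a learning process with its own randomness."""
    idx = np.random.default_rng(seed).integers(0, len(data), len(data))
    sample = data[idx] - data[idx].mean(axis=0)
    _, _, vt = np.linalg.svd(sample, full_matrices=False)
    return vt[:k].T  # (10, k) orthonormal basis for the learned abstraction

A = learned_subspace(X, seed=1)
B = learned_subspace(X, seed=2)

# Singular values of A^T B are the cosines of the principal angles between
# the two learned subspaces; values near 1 mean both learners found the
# same abstraction despite different randomness.
cosines = np.linalg.svd(A.T @ B, compute_uv=False)
print(cosines)
```

This only demonstrates the idea in a linear setting; the actual hypothesis concerns much more general learners and abstractions.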

Related Pages


Approaches

Representation Engineering
Sleeper Agent Detection
AI-Assisted Alignment
Mechanistic Interpretability

People

Dario Amodei
Yoshua Bengio
Chris Olah

Labs

Conjecture
Anthropic

Analysis

Model Organisms of Misalignment
Capability-Alignment Race Model

Safety Research

Anthropic Core Views

Key Debates

AI Alignment Research Agendas
Technical AI Safety Research
Is Interpretability Sufficient for Safety?

Concepts

Dense Transformers

Historical

Deep Learning Revolution Era
Mainstream Era

Transition Model

Interpretability Coverage

Organizations

Redwood Research