Longterm Wiki
Updated 2026-02-10

Safety Generalizability Table

| Approach | Description | Examples | Expected generalization to future AI architectures | Requires (to work) | Threatened by |
| --- | --- | --- | --- | --- | --- |
| Mechanistic Interpretability | Circuit-level understanding of model internals. High value if it works, but highly dependent on architecture stability and access. | Circuits, probing, activation patching | LOW | White-box access available? Representations converge? Architecture stable enough? | Heavy scaffolding? Novel architecture emerges? |
| Training-Based Alignment | Shaping model behavior through training signals (RLHF, Constitutional AI, debate). Requires training access but somewhat architecture-agnostic. | RLHF, Constitutional AI, debate | MEDIUM | Training access available? Gradient-based training continues? Single trainable system? | Long distillation chains? |
| Black-box Evaluations | Behavioral testing, capability evals, red-teaming. Only requires query access, relatively architecture-agnostic. | Capability evals, red-teaming, benchmarks | MEDIUM-HIGH | Query access available? Behavior predictable enough? | Emergent multi-agent behavior? |
| Control & Containment | Boxing, monitoring, tripwires, capability control. Focuses on constraining systems regardless of their internals. | Sandboxing, monitoring, kill switches | HIGH | Sandboxing feasible? Monitoring effective? Capability boundaries clear? | Few threats identified |
| Theoretical Alignment | Mathematical frameworks, optimization theory, agent foundations. Architecture-independent by nature. | Agent foundations, decision theory, formal frameworks | HIGHEST | Math applies to real systems? | Few threats identified |
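
For readers who want to query the table programmatically, the rows could be encoded roughly as follows. This is only a sketch. The names `Generalization`, `Approach`, `APPROACHES`, and `at_least` are hypothetical and do not come from the wiki. The integer values only encode the ordering LOW < MEDIUM < MEDIUM-HIGH < HIGH < HIGHEST and carry no other meaning. Which questions count as requirements and which as threats is inferred from the table layout.

```python
from dataclasses import dataclass
from enum import IntEnum


class Generalization(IntEnum):
    """Ordinal rating from the table; the integers are an assumption used only for ordering."""
    LOW = 1
    MEDIUM = 2
    MEDIUM_HIGH = 3
    HIGH = 4
    HIGHEST = 5


@dataclass(frozen=True)
class Approach:
    """One table row (hypothetical structure, not the wiki's own schema)."""
    name: str
    generalization: Generalization
    requires: tuple
    threatened_by: tuple  # empty tuple stands for "Few threats identified"


APPROACHES = [
    Approach("Mechanistic Interpretability", Generalization.LOW,
             ("White-box access available?", "Representations converge?",
              "Architecture stable enough?"),
             ("Heavy scaffolding?", "Novel architecture emerges?")),
    Approach("Training-Based Alignment", Generalization.MEDIUM,
             ("Training access available?", "Gradient-based training continues?",
              "Single trainable system?"),
             ("Long distillation chains?",)),
    Approach("Black-box Evaluations", Generalization.MEDIUM_HIGH,
             ("Query access available?", "Behavior predictable enough?"),
             ("Emergent multi-agent behavior?",)),
    Approach("Control & Containment", Generalization.HIGH,
             ("Sandboxing feasible?", "Monitoring effective?",
              "Capability boundaries clear?"),
             ()),
    Approach("Theoretical Alignment", Generalization.HIGHEST,
             ("Math applies to real systems?",),
             ()),
]


def at_least(level):
    """Return the names of approaches rated at or above `level`, in table order."""
    return [a.name for a in APPROACHES if a.generalization >= level]
```

Calling `at_least(Generalization.HIGH)` would then select the two rows rated HIGH or HIGHEST.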