Updated 2026-02-20
Evaluation & Detection (Overview)

Evaluation methods assess whether AI systems are aligned and safe to deploy.

General Evaluation:

  • Evaluations (Evals): Overview of AI evaluation approaches
  • Alignment Evaluations: Testing for aligned behavior
  • Dangerous Capability Evaluations: Assessing harmful potential

Capability Assessment:

  • Capability Elicitation: Uncovering hidden capabilities
  • Red Teaming: Adversarial testing for vulnerabilities
  • Model Auditing: Systematic capability review

Deception Detection:

  • Scheming Detection: Identifying strategic deception
  • Sleeper Agent Detection: Finding hidden malicious behaviors

Evaluation Scaling:

  • Eval Saturation & The Evals Gap: Accelerating benchmark saturation and its implications
  • Scalable Eval Approaches: Practical tools for scaling evaluation capacity
  • Evaluation Awareness: Models detecting and adapting to evaluation contexts

Deployment Decisions:

  • Safety Cases: Structured arguments for deployment safety

Related Wiki Pages

Approaches

  • Third-Party Model Auditing
  • Alignment Evaluations
  • Sleeper Agent Detection
  • Dangerous Capability Evaluations
  • Scalable Eval Approaches
  • Eval Saturation & The Evals Gap

Other

  • Red Teaming