Longterm Wiki
Updated 2026-02-20
Training Methods (Overview)

Training methods for alignment focus on shaping model behavior during the learning process.

Core Approaches:

  • RLHF: Reinforcement Learning from Human Feedback - the foundation of modern alignment training
  • Constitutional AI: Self-critique based on principles
  • Preference Optimization: Learning directly from preference pairs, without training a separate reward model (e.g. DPO, IPO)
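To make the preference-optimization idea concrete, here is a minimal sketch of the DPO loss for a single preference pair. All function and variable names are illustrative; the inputs are assumed to be total log-probabilities each model assigns to a full response, and `beta` is the usual strength-of-regularization hyperparameter.

```python
import math

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair (toy, single-example form).

    The loss is low when the policy prefers the chosen response over the
    rejected one by a wider margin than the frozen reference model does.
    """
    margin = (policy_logp_chosen - policy_logp_rejected) \
           - (ref_logp_chosen - ref_logp_rejected)
    # -log sigmoid(beta * margin)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Policy already prefers the chosen response more than the reference does:
low = dpo_loss(-10.0, -14.0, -11.0, -12.0)
# Policy prefers the rejected response instead, so the loss is higher:
high = dpo_loss(-14.0, -10.0, -11.0, -12.0)
print(low < high)  # True
```

In a real training run this loss would be averaged over a batch of preference pairs and backpropagated through the policy model only; the reference model stays frozen.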

Specialized Techniques:

  • Process Supervision: Rewarding reasoning steps, not just outcomes
  • Reward Modeling: Learning human preferences from comparisons
  • Refusal Training: Teaching models to decline harmful requests
  • Adversarial Training: Robustness through adversarial examples
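The distinction process supervision draws can be shown with a toy reward function (names and the scoring scheme here are illustrative, not any particular paper's formulation): outcome supervision scores only the final answer, while process supervision scores each reasoning step.

```python
def outcome_reward(steps_correct):
    """Outcome supervision: reward depends only on the final step."""
    return 1.0 if steps_correct[-1] else 0.0

def process_reward(steps_correct):
    """Process supervision: every reasoning step is rewarded individually."""
    return sum(steps_correct) / len(steps_correct)

# A chain of thought that stumbles mid-way but lands on the right answer:
steps = [True, True, False, True]
print(outcome_reward(steps))  # 1.0  -- the flawed step goes unnoticed
print(process_reward(steps))  # 0.75 -- the flawed step is penalized
```

The point of the toy: a model can reach correct answers through flawed reasoning, and only step-level rewards create pressure against that.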

Advanced Methods:

  • Weak-to-Strong Generalization: Can weak supervisors train strong models?
  • Capability Unlearning: Removing dangerous knowledge
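The weak-to-strong question can be illustrated with a deliberately simple toy (everything here is a made-up setup, not an experiment from the literature): a weak supervisor sees only one feature and so labels noisily, while a strong student that can use both features is trained purely on those noisy labels, yet ends up more accurate than its supervisor.

```python
import random

random.seed(0)

def true_label(x):
    # Ground truth the strong student can represent, the supervisor cannot.
    return x[0] + x[1] > 1.0

def weak_supervisor(x):
    # Weak model sees only the first feature, so its labels are noisy.
    return x[0] > 0.5

data = [(random.random(), random.random()) for _ in range(2000)]
weak_labels = [weak_supervisor(x) for x in data]

# "Training": pick the threshold on x0 + x1 that best fits the weak labels.
best_t, best_acc = 0.0, 0.0
for t in [i / 20 for i in range(41)]:
    acc = sum((x[0] + x[1] > t) == y
              for x, y in zip(data, weak_labels)) / len(data)
    if acc > best_acc:
        best_t, best_acc = t, acc

weak_acc = sum(weak_supervisor(x) == true_label(x) for x in data) / len(data)
strong_acc = sum((x[0] + x[1] > best_t) == true_label(x)
                 for x in data) / len(data)
print(strong_acc > weak_acc)  # True: the student outperforms its supervisor
```

Whether this kind of generalization holds for large models supervised by genuinely weaker ones is exactly the open research question the bullet names.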

Related Wiki Pages

  • RLHF
  • Refusal Training
  • Adversarial Training
  • Reward Modeling