Longterm Wiki

Direct Preference Optimization

Alignment Training (active)

A family of training methods (DPO, KTO, GRPO) that optimizes language models directly on preference data, without training a separate reward model.
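As a concrete illustration of optimizing directly on preference data, the DPO objective from Rafailov et al. (2023) can be sketched as follows. This is a minimal per-example version; the function name, argument names, and the choice of beta are illustrative, not taken from any particular library.

```python
import math

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-example DPO loss (sketch of Rafailov et al., 2023).

    Inputs are summed log-probabilities of the chosen and rejected
    responses under the trained policy and a frozen reference model.
    """
    # Implicit reward margin: how much more the policy prefers the
    # chosen over the rejected response, relative to the reference.
    margin = beta * ((policy_logp_chosen - ref_logp_chosen)
                     - (policy_logp_rejected - ref_logp_rejected))
    # Negative log-sigmoid of the margin: the loss shrinks as the
    # policy's preference for the chosen response strengthens.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

Note that the reference-model log-probabilities play the role the reward model would otherwise fill: the preference signal is extracted from log-probability ratios rather than from a separately trained scorer.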

Organizations: 3
Key Papers: 1
First Proposed: 2023 (Rafailov et al.)
Cluster: Alignment Training

Tags

training, preferences, alignment

Organizations (3)

Organization | Role
Anthropic | active
Google DeepMind | active
OpenAI | active

Key Papers & Resources (1)