Direct Preference Optimization
Status: active
Family of training methods (DPO, KTO, GRPO) that optimize language models directly on preference data, without training a separate reward model.
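As a concrete illustration of optimizing directly on preference data, the DPO objective from Rafailov et al. (2023) scores a preferred/rejected completion pair using only policy and reference-model log-probabilities. The sketch below is illustrative, not taken from this entry; the function name and signature are assumptions.

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair, given sequence log-probs.

    Implements -log sigmoid(beta * (log-ratio(chosen) - log-ratio(rejected))),
    where log-ratio(y) = log pi(y|x) - log pi_ref(y|x).
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # Numerically stable -log(sigmoid(margin)) via log1p:
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy already prefers the chosen completion more than the reference does, the margin is positive and the loss falls below log 2; no reward model is queried at any point.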
First Proposed: 2023 (Rafailov et al.)
Cluster: Alignment Training
Tags
training, preferences, alignment
Organizations (3)
| Organization | Role |
|---|---|
| Anthropic | active |
| Google DeepMind | active |
| OpenAI | active |
Key Papers & Resources (1)
SEMINAL