Process Supervision
Alignment Training
emerging
Provides feedback on each step of a model's reasoning rather than only on its final output, enabling more reliable supervision of chain-of-thought reasoning.
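The contrast between step-level and final-output feedback can be illustrated with a small sketch. It is not the method from any particular paper. The `Step` type, the `is_correct` labels, and the function names are hypothetical, chosen only to show where each kind of feedback attaches.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    text: str
    is_correct: bool  # hypothetical per-step label (e.g. from a human annotator)


def outcome_feedback(final_answer: str, reference: str) -> float:
    """Outcome supervision: a single signal for the whole solution."""
    return 1.0 if final_answer.strip() == reference.strip() else 0.0


def process_feedback(steps: List[Step]) -> List[float]:
    """Process supervision: one signal per reasoning step."""
    return [1.0 if s.is_correct else 0.0 for s in steps]


def first_error(steps: List[Step]) -> Optional[int]:
    """Index of the first incorrect step, or None if every step is correct."""
    for i, s in enumerate(steps):
        if not s.is_correct:
            return i
    return None


# Toy solution to "2x + 6 = 10" with an arithmetic slip in the second step.
steps = [
    Step("2x + 6 = 10, so 2x = 4", True),
    Step("2x = 4, so x = 3", False),
]

outcome = outcome_feedback("3", "2")   # only says the answer is wrong
per_step = process_feedback(steps)     # says which step is wrong
error_at = first_error(steps)
```

In this toy case the outcome signal reports only that the final answer is wrong, while the per-step signal points to the second step as the place where the reasoning went wrong.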
Key Papers: 1
First Proposed: 2023 (Lightman et al., OpenAI)
Cluster: Alignment Training
Tags: training, reasoning, supervision
Key Papers & Resources (1)
SEMINAL
Let's Verify Step by Step
Lightman et al. (OpenAI), 2023