
Reward Modeling

Alignment Training (active)

Training learned reward functions from human preferences to guide AI optimization, including research on reward hacking and specification gaming.
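
To make the core technique concrete: a minimal sketch of fitting a reward model to pairwise human preferences with a Bradley-Terry loss, the formulation popularized by Christiano et al. (2017). This assumes a PyTorch setup; the architecture, feature dimensions, and data below are illustrative stand-ins, not taken from any grant or paper listed on this page.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Toy reward model: maps a fixed-size trajectory feature vector to a scalar reward."""
    def __init__(self, feature_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.net(features).squeeze(-1)

def preference_loss(model: RewardModel,
                    preferred: torch.Tensor,
                    rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry: P(preferred beats rejected) = sigmoid(r_p - r_r).
    Minimizing the negative log of that probability fits the learned
    reward to the human comparison data."""
    return -F.logsigmoid(model(preferred) - model(rejected)).mean()

# Illustrative usage on random stand-in data: each row is one trajectory's
# features, with `preferred` holding the human-chosen side of each pair.
model = RewardModel(feature_dim=16)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
preferred, rejected = torch.randn(32, 16), torch.randn(32, 16)

optimizer.zero_grad()
loss = preference_loss(model, preferred, rejected)
loss.backward()
optimizer.step()
```

The learned reward is then typically used as the optimization target for an RL policy, which is where reward hacking enters: the policy can exploit errors in the learned reward rather than satisfying the underlying human preference it approximates.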

Key Papers: 1
Grants: 2
Total Funding: $15K
First Proposed: 2017 (Christiano et al.)
Cluster: Alignment Training

Tags

training, reward, alignment

Grants (2)

| Name | Recipient | Amount | Funder | Date |
| --- | --- | --- | --- | --- |
| Mitigating Reward Hacking Through RL Training Interventions | Aria Wong | $7.9K | Manifund | 2026-02-18 |
| GoalsRL — Workshop on Goal Specifications for Reinforcement Learning | GoalsRL | $7.5K | Coefficient Giving | 2018-08 |

Funding by Funder

| Funder | Grants | Total Amount |
| --- | --- | --- |
| Manifund | 1 | $7.9K |
| Coefficient Giving | 1 | $7.5K |

Key Papers & Resources (1)