Reward Modeling
Training learned reward functions from human preferences to guide AI optimization, including research on reward hacking and gaming.

Status: Active
Key Papers: 1
Grants: 2
Total Funding: $15K
First Proposed: 2017 (Christiano et al.)
Cluster: Alignment Training
Tags: training, reward, alignment
Grants (2)
| Name | Recipient | Amount | Funder | Date |
|---|---|---|---|---|
| Mitigating Reward Hacking Through RL Training Interventions | Aria Wong | $7.9K | Manifund | 2026-02-18 |
| GoalsRL — Workshop on Goal Specifications for Reinforcement Learning | GoalsRL | $7.5K | Coefficient Giving | 2018-08 |
Funding by Funder
| Funder | Grants | Total Amount |
|---|---|---|
| Manifund | 1 | $7.9K |
| Coefficient Giving | 1 | $7.5K |
Key Papers & Resources (1)
SEMINAL: Deep Reinforcement Learning from Human Preferences (Christiano et al., 2017)
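For context on the technique this cluster covers, below is a minimal, illustrative sketch of the preference-based reward-model objective from that line of work: a reward network is fit so that, under a Bradley-Terry model, trajectory segments preferred by human labelers receive higher predicted return. This is a hedged sketch, not code from any specific project; names such as `RewardModel` and `preference_loss`, the network sizes, and the data shapes are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Maps per-step observation-action features to a scalar reward (illustrative)."""
    def __init__(self, obs_act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_act_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, segment: torch.Tensor) -> torch.Tensor:
        # segment: (batch, timesteps, obs_act_dim); sum per-step rewards
        # to get a predicted return for the whole segment.
        return self.net(segment).squeeze(-1).sum(dim=-1)

def preference_loss(model: RewardModel,
                    seg_a: torch.Tensor,
                    seg_b: torch.Tensor,
                    prefer_a: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry / logistic loss: the labeler is modeled as preferring
    segment A with probability sigmoid(R(A) - R(B)); prefer_a is 1.0 if A was chosen."""
    logits = model(seg_a) - model(seg_b)
    return nn.functional.binary_cross_entropy_with_logits(logits, prefer_a)

# Illustrative usage on random data (all shapes are arbitrary assumptions).
if __name__ == "__main__":
    model = RewardModel(obs_act_dim=8)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    seg_a = torch.randn(32, 25, 8)            # 32 pairs of 25-step segments
    seg_b = torch.randn(32, 25, 8)
    prefer_a = torch.randint(0, 2, (32,)).float()
    loss = preference_loss(model, seg_a, seg_b, prefer_a)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The learned reward is then used as the optimization target for a policy, which is where the reward-hacking failure modes studied by the grants above arise: the policy can exploit errors in the learned reward rather than the intended behavior.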