
online iterative RLHF

web · rlhfbook.com/

An open online textbook on RLHF, useful for researchers and practitioners seeking a structured introduction to alignment techniques based on human feedback, including the iterative online variants used in modern LLM training pipelines.

Metadata

Importance: 62/100 · Tags: book, educational

Summary

An online textbook dedicated to Reinforcement Learning from Human Feedback (RLHF), covering the theory, methods, and practical implementation of training AI systems using human preference feedback. It focuses particularly on online and iterative RLHF approaches used to align large language models with human values and intentions.
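
For orientation, the two objectives at the core of this pipeline can be stated compactly. The notation below follows common usage in the RLHF literature (a Bradley-Terry reward model and a KL-regularized policy objective); it is a reference formulation, not necessarily the exact notation used in the book.

$$\mathcal{L}_{\text{RM}}(\theta) = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}\Big[\log \sigma\big(r_\theta(x, y_w) - r_\theta(x, y_l)\big)\Big]$$

$$\max_{\pi_\phi}\; \mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi_\phi(\cdot \mid x)}\big[r_\theta(x, y)\big] \;-\; \beta\, \mathbb{D}_{\text{KL}}\big(\pi_\phi(\cdot \mid x)\,\|\,\pi_{\text{ref}}(\cdot \mid x)\big)$$

Here $r_\theta$ is the reward model trained on preference pairs $(y_w, y_l)$, $\pi_\phi$ is the policy being optimized, and $\pi_{\text{ref}}$ is a frozen reference model that the KL term keeps the policy close to. In the online iterative variant this loop repeats: fresh samples from the current $\pi_\phi$ are sent out for preference labeling, $r_\theta$ is refit, and the policy is updated again.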

Key Points

  • Comprehensive coverage of RLHF methodology including reward modeling, preference learning, and policy optimization
  • Emphasizes online iterative RLHF, where the model is updated continuously as new human feedback is collected (illustrated by the sketch after this list)
  • Bridges theoretical foundations and practical implementation for aligning LLMs using human feedback
  • Relevant to understanding how models like ChatGPT and Claude are fine-tuned for safety and helpfulness
  • Serves as a reference for researchers and practitioners working on alignment via human supervision
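
To make the online iterative loop from the second key point concrete, here is a minimal, self-contained Python sketch. Everything in it is a toy stand-in, not code from the book: the "policy" is a softmax over three canned responses, the "human" is a simulated annotator that prefers longer answers, the reward model is a per-response score table updated with a Bradley-Terry gradient step, and the policy update is a plain REINFORCE step rather than the PPO-style optimizers the book covers.

```python
# Toy sketch of an online iterative RLHF loop. All names and the simulated
# annotator are illustrative assumptions, not the book's reference code.
import math
import random

random.seed(0)

RESPONSES = [
    "RLHF trains models from human preference feedback.",
    "It is a thing.",
    "RLHF = reward model + policy optimization on feedback.",
]

logits = [0.0, 0.0, 0.0]     # policy: softmax over responses (LLM stand-in)
rm_scores = [0.0, 0.0, 0.0]  # reward model: one learned score per response


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def sample_response():
    return random.choices(range(len(RESPONSES)), weights=softmax(logits))[0]


def simulated_human_prefers(a, b):
    # Toy annotator: prefers the longer (more informative) response.
    return len(RESPONSES[a]) > len(RESPONSES[b])


for iteration in range(200):
    # 1. Collect a fresh comparison from the *current* policy (online data).
    a, b = sample_response(), sample_response()
    if a == b:
        continue
    winner, loser = (a, b) if simulated_human_prefers(a, b) else (b, a)

    # 2. Reward-model update: one Bradley-Terry gradient step that
    #    increases log sigmoid(r_winner - r_loser).
    p_win = 1.0 / (1.0 + math.exp(rm_scores[loser] - rm_scores[winner]))
    lr_rm = 0.5
    rm_scores[winner] += lr_rm * (1.0 - p_win)
    rm_scores[loser] -= lr_rm * (1.0 - p_win)

    # 3. Policy update toward higher RM reward. Exact REINFORCE gradient
    #    for a softmax policy: d E[r] / d logit_i = p_i * (r_i - E[r]).
    probs = softmax(logits)
    baseline = sum(p * r for p, r in zip(probs, rm_scores))
    for i in range(len(logits)):
        logits[i] += 0.1 * probs[i] * (rm_scores[i] - baseline)

print("final policy:", [round(p, 2) for p in softmax(logits)])
```

The detail the sketch preserves is the ordering of the loop: new samples come from the current policy, the reward model is refit on the new comparisons, and only then is the policy updated, which is the defining feature of the online iterative setting versus training once on a static preference dataset.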

Cited by 2 pages

| Page | Type | Quality |
| --- | --- | --- |
| Reward Modeling | Approach | 55.0 |
| RLHF | Research Area | 63.0 |

Cached Content Preview

HTTP 200 · Fetched Feb 26, 2026 · 2 KB
## Changelog

_Last built: 25 February 2026_

**January 2026**: Major chapter reorganization to match Manning book structure. Old URLs redirect to new locations.

**December 2025**: Working on v2 of the book based on editors' feedback! Do check back for updates!

**2 July 2025**: Add tool use chapter (see [PR](https://github.com/natolambert/rlhf-book/pull/122))

**6 June 2025**: v1.1. Lots of RLVR/reasoning improvements (see [PR](https://github.com/natolambert/rlhf-book/pull/120))

**14 Apr. - 16 Apr. 2025**: Finish v0. Overoptimization, open questions, etc.

**6 Apr. - 12 Apr. 2025**: Evaluation section

**28 Mar. - 5 Apr. 2025**: Research on RLHF x Product, cleaning, improving website, reasoning section

**17 Mar. - 27 Mar. 2025**: Improving policy gradient section, minor changes

**6 Mar. - 16 Mar. 2025**: Finish DPO, major cleaning

**26 Feb. - 5 Mar. 2025**: Start DPO chapter, improve intro

**20-25 Feb. 2025**: Improve SEO, add IFT chapter, minor edits

**10-15 Feb. 2025**: RM additions, preference data, cleaning, policy gradient finalization

**8 Feb. 2025**: RM additions, editing, cleaning

**4 Feb. 2025**: PPO and GAE

**2 Feb. 2025**: Added changelog, revamped introduction

## Acknowledgements

I would like to thank the following people who helped me directly with this project: Costa Huang (and, of course, Claude). Indirect shout-outs go to Ross Taylor, Hamish Ivison, John Schulman, Valentina Pyatkin, Daniel Han, Shane Gu, Joanne Jang, LJ Miranda, and others in my RL sphere.

Additionally, thank you to the [contributors on GitHub](https://github.com/natolambert/rlhf-book/graphs/contributors) who helped improve this project.