Longterm Wiki

[2109.10862] Recursively Summarizing Books with Human Feedback

paper

Authors

Jeff Wu·Long Ouyang·Daniel M. Ziegler·Nisan Stiennon·Ryan Lowe·Jan Leike·Paul Christiano

Credibility Rating

3/5
Good (3)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: arXiv

A foundational OpenAI paper on scalable oversight and RLHF, directly relevant to the problem of supervising AI systems on tasks humans cannot easily evaluate—a core challenge in AI safety research.

Paper Details

Citations
356
30 influential
Year
2021

Metadata

Importance: 72/100 · arXiv preprint · primary source

Abstract

A major challenge for scaling machine learning is training models to perform tasks that are very difficult or time-consuming for humans to evaluate. We present progress on this problem on the task of abstractive summarization of entire fiction novels. Our method combines learning from human feedback with recursive task decomposition: we use models trained on smaller parts of the task to assist humans in giving feedback on the broader task. We collect a large volume of demonstrations and comparisons from human labelers, and fine-tune GPT-3 using behavioral cloning and reward modeling to do summarization recursively. At inference time, the model first summarizes small sections of the book and then recursively summarizes these summaries to produce a summary of the entire book. Our human labelers are able to supervise and evaluate the models quickly, despite not having read the entire books themselves. Our resulting model generates sensible summaries of entire books, even matching the quality of human-written summaries in a few cases (∼5% of books). We achieve state-of-the-art results on the recent BookSum dataset for book-length summarization. A zero-shot question-answering model using these summaries achieves competitive results on the challenging NarrativeQA benchmark for answering questions about books and movie scripts. We release datasets of samples from our model.

Summary

This paper presents a method for summarizing entire fiction novels by combining reinforcement learning from human feedback (RLHF) with recursive task decomposition, enabling human supervisors to provide feedback on complex tasks without needing to evaluate the full output themselves. The approach fine-tunes GPT-3 via behavioral cloning and reward modeling, achieving state-of-the-art results on BookSum and competitive results on NarrativeQA.

Key Points

  • Introduces recursive task decomposition as a scalability technique: models summarize small sections first, then recursively summarize those summaries into full-book summaries.
  • Demonstrates that human labelers can effectively supervise AI on tasks they cannot fully evaluate themselves, a key insight for scalable oversight.
  • Achieves state-of-the-art on BookSum dataset; ~5% of generated summaries match human-written quality.
  • Combines behavioral cloning and reward modeling (RLHF) with GPT-3, representing an early application of human feedback to long-horizon complex tasks.
  • Directly addresses the scalable oversight problem: how to maintain human control as AI tackles tasks too complex for direct human evaluation.
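
The recursive procedure described in these points can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `summarize` callable, the character-based chunking, and the stopping length are all assumptions standing in for the trained GPT-3 summarizer and its tokenized context windows.

```python
def chunk(text, size):
    """Split text into contiguous pieces of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def recursive_summarize(text, summarize, chunk_size=2000, target_len=500):
    """Summarize small sections, then recursively summarize the summaries.

    `summarize` is any callable mapping a text piece to a shorter summary
    (a placeholder for the fine-tuned model). Recursion stops once the
    running text fits within `target_len` characters.
    """
    if len(text) <= target_len:
        return text
    summaries = [summarize(piece) for piece in chunk(text, chunk_size)]
    combined = "\n".join(summaries)
    if len(combined) >= len(text):
        # Summarizer made no progress; summarize once more and stop
        # rather than recurse forever.
        return summarize(combined)
    return recursive_summarize(combined, summarize, chunk_size, target_len)
```

A key property of this tree-shaped decomposition is that a labeler can judge any single `summarize` call by reading only its local input, never the whole book.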

Cited by 2 pages

| Page | Type | Quality |
| --- | --- | --- |
| Large Language Models | Capability | 60.0 |
| Jan Leike | Person | 27.0 |

Cached Content Preview

HTTP 200 · Fetched Feb 23, 2026 · 98 KB

# Recursively Summarizing Books with Human Feedback

Jeff Wu
Long Ouyang∗
Daniel M. Ziegler∗
Nisan Stiennon∗
Ryan Lowe∗
Jan Leike∗
Paul Christiano∗
OpenAI
This was a joint project of the OpenAI Alignment team. JW and LO contributed equally. DMZ, NS, and RL were full-time contributors for most of the duration. JL and PC managed the team. Corresponding author jeffwu@openai.com.

###### Abstract

A major challenge for scaling machine learning is training models to perform tasks that are very difficult or time-consuming for humans to evaluate. We present progress on this problem on the task of abstractive summarization of entire fiction novels. Our method combines learning from human feedback with recursive task decomposition:
we use models trained on smaller parts of the task to assist humans in giving feedback on the broader task.
We collect a large volume of demonstrations and comparisons from human labelers, and fine-tune GPT-3 using behavioral cloning and reward modeling to do summarization recursively.
At inference time, the model first summarizes small sections of the book and then recursively summarizes these summaries to produce a summary of the entire book. Our human labelers are able to supervise and evaluate the models quickly, despite not having read the entire books themselves.
Our resulting model generates sensible summaries of entire books, even matching the quality of human-written summaries in a few cases (∼5% of books).
We achieve state-of-the-art results on the recent BookSum dataset for book-length summarization.
A zero-shot question-answering model using these summaries achieves competitive results on the challenging NarrativeQA benchmark for answering questions about books and movie scripts. We release datasets of samples from our model (see [https://openaipublic.blob.core.windows.net/recursive-book-summ/website/index.html](https://openaipublic.blob.core.windows.net/recursive-book-summ/website/index.html)).


### 1 Introduction

To train an ML model on a new task, we need a training signal that tells the model which behaviors are better and which are worse. For some tasks, like playing a video game, this training signal can be calculated automatically.
However, for many useful tasks an accurate training signal can only be provided via a human in the loop. For example, humans can provide demonstrations of the correct behavior (Bain and Sammut, [1995](https://ar5iv.labs.arxiv.org/html/2109.10862#bib.bib5 "")) or compare two outputs from the model being trained (Christiano et al., [2017](https://ar5iv.labs.arxiv.org/html/2109.10862#bib.bib12 "")), and this data is used to train the model.
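
The comparison-based training signal mentioned above is typically turned into a loss via a Bradley-Terry model over scalar scores, as in Christiano et al. (2017)-style reward modeling. The sketch below is illustrative only: in practice the scores come from a neural reward model over text, and `comparison_loss` is an assumed name, not the paper's code.

```python
import math

def comparison_loss(score_preferred, score_other):
    """Negative log-likelihood that the labeler-preferred output wins.

    Under the Bradley-Terry model,
    P(preferred beats other) = sigmoid(score_preferred - score_other).
    Minimizing this loss pushes the reward model to score preferred
    outputs above rejected ones.
    """
    diff = score_preferred - score_other
    return -math.log(1.0 / (1.0 + math.exp(-diff)))
```

When the two scores are equal the loss is log 2 (the model is indifferent); it shrinks as the preferred output's score pulls ahead.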

In this paper we focus on tasks that are difficult for humans to supervise or evaluate, either because the tasks take a lot of time or because they require specialized knowledge and expertise to evaluate. For example, imagine training a m

... (truncated, 98 KB total)