Longterm Wiki

Tree of Thoughts

paper

Authors

Shunyu Yao·Dian Yu·Jeffrey Zhao·Izhak Shafran·Thomas L. Griffiths·Yuan Cao·Karthik Narasimhan

Credibility Rating

Good (3/5)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: arXiv

Introduces Tree of Thoughts, a framework enabling language models to perform multi-step reasoning through exploratory search rather than sequential token generation, addressing limitations in complex problem-solving that are relevant to AI capability and alignment research.

Paper Details

Citations: 3,606 (258 influential)
Year: 2023

Metadata

arXiv preprint · primary source

Abstract

Language models are increasingly being deployed for general problem solving across a wide range of tasks, but are still confined to token-level, left-to-right decision-making processes during inference. This means they can fall short in tasks that require exploration, strategic lookahead, or where initial decisions play a pivotal role. To surmount these challenges, we introduce a new framework for language model inference, Tree of Thoughts (ToT), which generalizes over the popular Chain of Thought approach to prompting language models, and enables exploration over coherent units of text (thoughts) that serve as intermediate steps toward problem solving. ToT allows LMs to perform deliberate decision making by considering multiple different reasoning paths and self-evaluating choices to decide the next course of action, as well as looking ahead or backtracking when necessary to make global choices. Our experiments show that ToT significantly enhances language models' problem-solving abilities on three novel tasks requiring non-trivial planning or search: Game of 24, Creative Writing, and Mini Crosswords. For instance, in Game of 24, while GPT-4 with chain-of-thought prompting only solved 4% of tasks, our method achieved a success rate of 74%. Code repo with all prompts: https://github.com/princeton-nlp/tree-of-thought-llm.
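To make the Game of 24 benchmark concrete: each instance gives four numbers, and the goal is to combine all of them with +, −, ×, ÷ (and any parenthesization) to reach 24. The sketch below is not from the paper; it is a plain brute-force checker showing the combinatorial search space a solver must navigate, which is exactly the kind of exploration left-to-right decoding struggles with.

```python
from itertools import permutations, product

def solves_24(nums, target=24, eps=1e-6):
    """Brute-force check: can the four numbers reach `target`
    using +, -, *, / and any parenthesization?"""
    ops = "+-*/"
    for a, b, c, d in permutations(map(float, nums)):
        for o1, o2, o3 in product(ops, repeat=3):
            # The five distinct parenthesizations of four operands.
            exprs = [
                f"(({a}{o1}{b}){o2}{c}){o3}{d}",
                f"({a}{o1}({b}{o2}{c})){o3}{d}",
                f"({a}{o1}{b}){o2}({c}{o3}{d})",
                f"{a}{o1}(({b}{o2}{c}){o3}{d})",
                f"{a}{o1}({b}{o2}({c}{o3}{d}))",
            ]
            for e in exprs:
                try:
                    if abs(eval(e) - target) < eps:
                        return True
                except ZeroDivisionError:
                    pass  # skip expressions that divide by zero
    return False

print(solves_24([4, 9, 10, 13]))  # solvable, e.g. (10 - 4) * (13 - 9)
```

Even this tiny puzzle has thousands of candidate expressions per instance, which is why GPT-4's single-pass chain-of-thought solves only 4% of tasks while ToT's explicit search reaches 74%.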

Summary

Tree of Thoughts (ToT) is a novel inference framework that extends Chain of Thought prompting by enabling language models to explore multiple reasoning paths and perform deliberate decision-making with lookahead and backtracking capabilities. Rather than following a single left-to-right token generation process, ToT treats intermediate reasoning steps as coherent units of text (thoughts) that can be evaluated and explored systematically. The framework significantly improves performance on complex tasks requiring planning and search, achieving 74% success on Game of 24 compared to 4% for GPT-4 with standard chain-of-thought prompting.
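The search loop at the heart of ToT can be sketched without a language model at all. Below is a minimal, model-free beam-search skeleton: `propose` and `evaluate` are hypothetical stand-ins for the paper's LM-based thought generator and LM-based state evaluator, and the toy task (assemble a digit string summing to 24) is invented purely for illustration.

```python
import heapq

def tree_of_thoughts_bfs(root, propose, evaluate, beam_width=3, max_depth=4):
    """ToT-style breadth-first search over partial "thoughts".

    propose(state)  -> candidate successor thoughts (an LM call in the paper)
    evaluate(state) -> heuristic score, higher is better (the paper prompts
                       the LM itself to act as this evaluator)
    """
    frontier = [root]
    for _ in range(max_depth):
        candidates = [s for state in frontier for s in propose(state)]
        if not candidates:
            break
        # Keep only the top-scoring states: pruning here is the "deliberate
        # decision making", and dropping a branch is implicit backtracking.
        frontier = heapq.nlargest(beam_width, candidates, key=evaluate)
    return max(frontier, key=evaluate)

# Toy stand-in task: grow a digit string whose digit sum is 24.
propose = lambda s: [s + d for d in "987"]
evaluate = lambda s: -abs(24 - sum(map(int, s or "0")))
best = tree_of_thoughts_bfs("", propose, evaluate, beam_width=2, max_depth=3)
```

The paper's actual algorithms (BFS and DFS variants) differ mainly in how thoughts are sampled and scored by the LM; this skeleton only shows the control flow that replaces single-path, left-to-right decoding.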

Cited by 1 page

| Page | Type | Quality |
| --- | --- | --- |
| Long-Horizon Autonomous Tasks | Capability | 65.0 |

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 66 KB
# Tree of Thoughts: Deliberate Problem Solving with Large Language Models

Shunyu Yao

Princeton University
Dian Yu

Google DeepMind
Jeffrey Zhao

Google DeepMind
Izhak Shafran

Google DeepMind
Thomas L. Griffiths

Princeton University
Yuan Cao

Google DeepMind
Karthik Narasimhan

Princeton University

###### Abstract

Language models are increasingly being deployed for general problem solving across a wide range of tasks, but are still confined to token-level, left-to-right decision-making processes during inference. This means they can fall short in tasks that require exploration, strategic lookahead, or where initial decisions play a pivotal role.
To surmount these challenges, we introduce a new framework for language model inference, “Tree of Thoughts” (ToT), which generalizes over the popular “Chain of Thought” approach to prompting language models, and enables exploration over coherent units of text (“thoughts”) that serve as intermediate steps toward problem solving.
ToT allows LMs to perform deliberate decision making by considering multiple different reasoning paths and self-evaluating choices to decide the next course of action, as well as looking ahead or backtracking when necessary to make global choices.
Our experiments show that ToT significantly enhances language models’ problem-solving abilities on three novel tasks requiring non-trivial planning or search: Game of 24, Creative Writing, and Mini Crosswords.
For instance, in Game of 24, while GPT-4 with chain-of-thought prompting only solved 4% of tasks, our method achieved a success rate of 74%. Code repo with all prompts: [https://github.com/princeton-nlp/tree-of-thought-llm](https://github.com/princeton-nlp/tree-of-thought-llm "").

## 1 Introduction

Originally designed to generate text, scaled-up versions of language models (LMs) such as GPT \[ [25](https://ar5iv.labs.arxiv.org/html/2305.10601#bib.bib25 ""), [26](https://ar5iv.labs.arxiv.org/html/2305.10601#bib.bib26 ""), [1](https://ar5iv.labs.arxiv.org/html/2305.10601#bib.bib1 ""), [23](https://ar5iv.labs.arxiv.org/html/2305.10601#bib.bib23 "")\] and PaLM \[ [5](https://ar5iv.labs.arxiv.org/html/2305.10601#bib.bib5 "")\] have been shown to be increasingly capable of performing an ever wider range of tasks requiring mathematical, symbolic, commonsense, and knowledge reasoning. It is perhaps surprising that underlying all this progress is still the original autoregressive mechanism for generating text, which makes token-level decisions one by one and in a left-to-right fashion.
Is such a simple mechanism sufficient for a LM to be built toward a general problem solver?
If not, what problems would challenge the current paradigm, and what should be alternative mechanisms?

The literature on human cognition provides some clues to answer these questions.
Research on “dual process” models suggests that people have two modes in which they engage with decisions – a fast, automatic, unconscious mode (“System 1”) and a slow, delibera

... (truncated, 66 KB total)