[2210.03629] ReAct: Synergizing Reasoning and Acting in Language Models
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: arXiv
ReAct is a seminal paper establishing the reasoning-plus-action paradigm central to modern LLM agent systems; relevant to AI safety discussions around agent reliability, hallucination reduction, and interpretability of autonomous AI behavior.
Paper Details
Metadata
Abstract
While large language models (LLMs) have demonstrated impressive capabilities across tasks in language understanding and interactive decision making, their abilities for reasoning (e.g. chain-of-thought prompting) and acting (e.g. action plan generation) have primarily been studied as separate topics. In this paper, we explore the use of LLMs to generate both reasoning traces and task-specific actions in an interleaved manner, allowing for greater synergy between the two: reasoning traces help the model induce, track, and update action plans as well as handle exceptions, while actions allow it to interface with external sources, such as knowledge bases or environments, to gather additional information. We apply our approach, named ReAct, to a diverse set of language and decision making tasks and demonstrate its effectiveness over state-of-the-art baselines, as well as improved human interpretability and trustworthiness over methods without reasoning or acting components. Concretely, on question answering (HotpotQA) and fact verification (Fever), ReAct overcomes issues of hallucination and error propagation prevalent in chain-of-thought reasoning by interacting with a simple Wikipedia API, and generates human-like task-solving trajectories that are more interpretable than baselines without reasoning traces. On two interactive decision making benchmarks (ALFWorld and WebShop), ReAct outperforms imitation and reinforcement learning methods by an absolute success rate of 34% and 10% respectively, while being prompted with only one or two in-context examples. Project site with code: https://react-lm.github.io
Summary
ReAct introduces a prompting paradigm that interleaves reasoning traces with task-specific actions in LLMs, enabling them to use external tools (e.g., Wikipedia API) while reasoning. This approach reduces hallucination and error propagation compared to chain-of-thought alone, and outperforms imitation/reinforcement learning baselines on interactive decision-making benchmarks by large margins.
Key Points
- Interleaves 'think' steps with 'act' steps, allowing LLMs to gather external information mid-reasoning rather than relying solely on parametric knowledge.
- On HotpotQA and FEVER, ReAct reduces hallucination by grounding reasoning in Wikipedia API lookups, improving factual accuracy.
- On ALFWorld and WebShop, ReAct outperforms imitation and reinforcement learning methods by 34% and 10% absolute success rate with only 1-2 in-context examples.
- Produces interpretable, human-like task-solving trajectories that improve trustworthiness and allow easier diagnosis of model errors.
- A foundational precursor to modern LLM agent frameworks (e.g., LangChain agents, tool-use pipelines), making it highly influential in agentic AI research.
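The interleaved Thought/Action/Observation loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `react_loop`, `scripted_policy`, and the toy `KB` lookup (standing in for the Wikipedia API) are all hypothetical names invented here, and the scripted policy replaces what would be an LLM call in a real system.

```python
# Minimal sketch of a ReAct-style control loop (hypothetical names, not the
# paper's code). A policy emits interleaved Thought/Action text; search actions
# are grounded in an external tool, and finish actions return the answer.

def react_loop(question, policy, tools, max_steps=5):
    """Alternate model-generated steps with tool observations."""
    trajectory = [f"Question: {question}"]
    for _ in range(max_steps):
        step = policy(trajectory)            # in practice: an LLM completion
        trajectory.append(step)
        action = step.splitlines()[-1]       # last line holds the Action
        if action.startswith("Action: finish["):
            return action[len("Action: finish["):-1], trajectory
        if action.startswith("Action: search["):
            query = action[len("Action: search["):-1]
            # Ground the next reasoning step in external information.
            trajectory.append(f"Observation: {tools['search'](query)}")
    return None, trajectory

# Toy stand-ins for demonstration only.
KB = {"ReAct": "a paradigm interleaving reasoning traces and actions in LLMs"}

def scripted_policy(trajectory):
    # Deterministic script in place of a real model: search first, then answer.
    if not any(t.startswith("Observation:") for t in trajectory):
        return "Thought: I should look up ReAct.\nAction: search[ReAct]"
    return ("Thought: The observation answers the question.\n"
            "Action: finish[interleaved reasoning and acting]")

answer, traj = react_loop("What is ReAct?", scripted_policy, {"search": KB.get})
print(answer)  # -> interleaved reasoning and acting
```

The key design point the paper emphasizes is visible even in this toy: the trajectory is a single human-readable transcript, so both the model's reasoning and its tool use can be inspected step by step.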
Cited by 3 pages
| Page | Type | Quality |
|---|---|---|
| Long-Horizon Autonomous Tasks | Capability | 65.0 |
| Heavy Scaffolding / Agentic Systems | Concept | 57.0 |
| Light Scaffolding | Capability | 53.0 |
Cached Content Preview
# ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, Yuan Cao
{shunyuy,karthikn}@princeton.edu, {jeffreyzhao,dianyu,dunan,izhak,yuancao}@google.com
Work during Google internship. Project page with code: [https://react-lm.github.io/](https://react-lm.github.io/)
###### Abstract
While large language models (LLMs) have demonstrated impressive performance across tasks in language understanding and interactive decision making, their abilities for reasoning (e.g. chain-of-thought prompting) and acting (e.g. action plan generation) have primarily been studied as separate topics.
In this paper, we explore the use of LLMs to generate both reasoning traces and task-specific actions in an interleaved manner, allowing for greater synergy between the two: reasoning traces help the model induce, track, and update action plans as well as handle exceptions, while actions allow it to interface with and gather additional information from external sources such as knowledge bases or environments.
We apply our approach, named ReAct, to a diverse set of language and decision making tasks and demonstrate its effectiveness over state-of-the-art baselines in addition to improved human interpretability and trustworthiness.
Concretely, on question answering (HotpotQA) and fact verification (Fever), ReAct overcomes prevalent issues of hallucination and error propagation in chain-of-thought reasoning
by interacting with a simple Wikipedia API, and generating human-like task-solving trajectories that are more interpretable than baselines without reasoning traces.
Furthermore, on two interactive decision making benchmarks (ALFWorld and WebShop), ReAct outperforms imitation and reinforcement learning methods by an absolute success rate of 34% and 10% respectively, while being prompted with only one or two in-context examples.
## 1 Introduction
A unique feature of human intelligence is the ability to seamlessly combine task-oriented actions with verbal reasoning (or inner speech, Alderson-Day & Fernyhough, [2015](https://ar5iv.labs.arxiv.org/html/2210.03629#bib.bib3 "")), which has been theorized to play an important role in human cognition for enabling self-regulation or strategization (Vygotsky, [1987](https://ar5iv.labs.arxiv.org/html/2210.03629#bib.bib31 ""); Luria, [1965](https://ar5iv.labs.arxiv.org/html/2210.03629#bib.bib20 ""); Fernyhough, [2010](https://ar5iv.labs.arxiv.org/html/2210.03629#bib.bib10 "")) and maintaining a working memory (Baddeley, [1992](https://ar5iv.labs.arxiv.org/html/2210.03629#bib.bib4 "")).
Consider the example of cooking up a dish in the kitchen. Between any two specific actions, we may reason in language in order to tra
... (truncated, 98 KB total)