Longterm Wiki

Research by Valmeekam et al. (2023)

paper

Authors

Karthik Valmeekam·Matthew Marquez·Sarath Sreedharan·Subbarao Kambhampati

Credibility Rating

3/5 (Good)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: arXiv

Empirical study evaluating large language models' planning and reasoning capabilities, examining their effectiveness in autonomous planning tasks and potential as heuristic sources for external planners—relevant to understanding LLM reliability and safety constraints.

Paper Details

Citations: 390 (23 influential)
Year: 2023

Metadata

arXiv preprint · primary source

Abstract

Intrigued by the claims of emergent reasoning capabilities in LLMs trained on general web corpora, in this paper, we set out to investigate their planning capabilities. We aim to evaluate (1) the effectiveness of LLMs in generating plans autonomously in commonsense planning tasks and (2) the potential of LLMs in LLM-Modulo settings where they act as a source of heuristic guidance for external planners and verifiers. We conduct a systematic study by generating a suite of instances on domains similar to the ones employed in the International Planning Competition and evaluate LLMs in two distinct modes: autonomous and heuristic. Our findings reveal that LLMs' ability to generate executable plans autonomously is rather limited, with the best model (GPT-4) having an average success rate of ~12% across the domains. However, the results in the LLM-Modulo setting show more promise. In the LLM-Modulo setting, we demonstrate that LLM-generated plans can improve the search process for underlying sound planners and additionally show that external verifiers can help provide feedback on the generated plans and back-prompt the LLM for better plan generation.

Summary

Valmeekam et al. (2023) investigate the planning capabilities of large language models (LLMs) by evaluating them on commonsense planning tasks in two modes: autonomous plan generation and an LLM-Modulo (heuristic) mode, in which the LLM supplies heuristic guidance to external planners and verifiers. The study finds that autonomous planning ability is limited: the best model, GPT-4, averages only ~12% success across the domains. The LLM-Modulo setting is more promising: LLM-generated plans can improve the search process of underlying sound planners, and external verifiers can give feedback on generated plans and back-prompt the LLM for better plan generation.
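
The verifier-feedback idea in the summary can be illustrated with a small generate, verify, and back-prompt loop. The sketch below is not the authors' implementation: the two-block domain, the `Action` type, `verify_plan`, `back_prompt_loop`, and the scripted proposer that stands in for an LLM are all hypothetical names chosen for illustration. It only shows the control flow, assuming a sound verifier that simulates the plan and reports the first failure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A STRIPS-style action over string facts (illustrative, not from the paper)."""
    name: str
    preconditions: frozenset
    add_effects: frozenset
    delete_effects: frozenset


def make_action(name, pre, add, delete):
    return Action(name, frozenset(pre), frozenset(add), frozenset(delete))


def verify_plan(plan, actions, initial_state, goal):
    """Simulate the plan; return None if it reaches the goal, else a feedback string."""
    state = set(initial_state)
    for step, name in enumerate(plan, start=1):
        action = actions.get(name)
        if action is None:
            return f"step {step}: unknown action '{name}'"
        missing = action.preconditions - state
        if missing:
            return f"step {step}: '{name}' is missing preconditions {sorted(missing)}"
        state = (state - action.delete_effects) | action.add_effects
    unmet = set(goal) - state
    if unmet:
        return f"plan ends without achieving {sorted(unmet)}"
    return None


def back_prompt_loop(propose, actions, initial_state, goal, max_rounds=5):
    """Ask the proposer for plans, feeding verifier feedback back in each round."""
    feedback = None
    for round_number in range(1, max_rounds + 1):
        plan = propose(feedback)
        feedback = verify_plan(plan, actions, initial_state, goal)
        if feedback is None:
            return plan, round_number
    return None, max_rounds


def make_scripted_proposer(candidates):
    """Stand-in for an LLM: returns pre-written candidate plans in order."""
    remaining = iter(candidates)

    def propose(feedback):
        # A real system would include `feedback` in the next prompt;
        # this stub ignores it and simply returns the next candidate.
        return next(remaining)

    return propose


# Toy two-block domain: A starts on B, the goal is B on A.
ACTIONS = {a.name: a for a in [
    make_action("unstack_A_B", {"on A B", "clear A", "handempty"},
                {"holding A", "clear B"}, {"on A B", "clear A", "handempty"}),
    make_action("putdown_A", {"holding A"},
                {"ontable A", "clear A", "handempty"}, {"holding A"}),
    make_action("pickup_B", {"ontable B", "clear B", "handempty"},
                {"holding B"}, {"ontable B", "clear B", "handempty"}),
    make_action("stack_B_A", {"holding B", "clear A"},
                {"on B A", "clear B", "handempty"}, {"holding B", "clear A"}),
]}
INITIAL = {"on A B", "clear A", "ontable B", "handempty"}
GOAL = {"on B A"}

CANDIDATES = [
    ["pickup_B", "stack_B_A"],  # invalid: B is not clear at the start
    ["unstack_A_B", "putdown_A", "pickup_B", "stack_B_A"],
]

plan, rounds = back_prompt_loop(make_scripted_proposer(CANDIDATES), ACTIONS, INITIAL, GOAL)
print(plan, rounds)
```

In this sketch correctness rests entirely on the verifier: the loop returns a plan only after `verify_plan` accepts it, so the proposer can be unreliable without the loop ever returning an invalid plan.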

Cited by 1 page

Page · Type · Quality
Reasoning and Planning · Capability · 65.0

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 1 KB
Conversion to HTML had a Fatal error and exited abruptly. This document may be truncated or damaged.

[View original on arXiv](https://arxiv.org/abs/2305.15771)
Resource ID: 984d52715ad3ac6c | Stable ID: NzNkZDE3Zm