Longterm Wiki

Pan et al. (2022)

paper

Authors

Pan Lu·Liang Qiu·Kai-Wei Chang·Ying Nian Wu·Song-Chun Zhu·Tanmay Rajpurohit·Peter Clark·Ashwin Kalyan

Credibility Rating

3/5 (Good)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: arXiv

This paper addresses mathematical reasoning in language models across heterogeneous data formats, contributing to understanding of LLM capabilities and limitations relevant to AI safety evaluation and alignment research.

Paper Details

Citations
0 (51 influential)
Year
2022

Metadata

arXiv preprint · primary source

Abstract

Mathematical reasoning, a core ability of human intelligence, presents unique challenges for machines in abstract thinking and logical reasoning. Recent large pre-trained language models such as GPT-3 have achieved remarkable progress on mathematical reasoning tasks written in text form, such as math word problems (MWP). However, it is unknown if the models can handle more complex problems that involve math reasoning over heterogeneous information, such as tabular data. To fill the gap, we present Tabular Math Word Problems (TabMWP), a new dataset containing 38,431 open-domain grade-level problems that require mathematical reasoning on both textual and tabular data. Each question in TabMWP is aligned with a tabular context, which is presented as an image, semi-structured text, and a structured table. There are two types of questions: free-text and multi-choice, and each problem is annotated with gold solutions to reveal the multi-step reasoning process. We evaluate different pre-trained models on TabMWP, including the GPT-3 model in a few-shot setting. As earlier studies suggest, since few-shot GPT-3 relies on the selection of in-context examples, its performance is unstable and can degrade to near chance. The unstable issue is more severe when handling complex problems like TabMWP. To mitigate this, we further propose a novel approach, PromptPG, which utilizes policy gradient to learn to select in-context examples from a small amount of training data and then constructs the corresponding prompt for the test example. Experimental results show that our method outperforms the best baseline by 5.31% on the accuracy metric and reduces the prediction variance significantly compared to random selection, which verifies its effectiveness in selecting in-context examples.

Summary

This paper introduces TabMWP, a dataset of 38,431 math word problems that require reasoning over both textual and tabular data, filling a gap in evaluating language models on heterogeneous information. The authors show that few-shot GPT-3 is unstable on these problems because its performance is highly sensitive to which in-context examples appear in the prompt. To address this, they propose PromptPG, which uses policy gradient to learn to select effective in-context examples from a small training set, improving accuracy by 5.31% over the best baseline and significantly reducing prediction variance.
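The core mechanism, training an example-selection policy with policy gradient, can be sketched with a toy REINFORCE loop. Everything below is illustrative rather than the paper's implementation: the fixed embeddings, the linear scoring policy, and the reward function standing in for "the LLM answered correctly with this example in the prompt" are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each candidate in-context example and the test problem
# are represented by fixed embedding vectors (e.g. from a frozen encoder).
n_candidates, dim = 20, 8
candidate_emb = rng.normal(size=(n_candidates, dim))
problem_emb = rng.normal(size=dim)

# Toy stand-in reward: 1 if the selected candidate's embedding aligns with
# the problem, 0 otherwise (in PromptPG the reward comes from whether the
# downstream model answers correctly).
good = (candidate_emb @ problem_emb) > 0

def reward_fn(a):
    return 1.0 if good[a] else 0.0

W = np.zeros((dim, dim))  # parameters of a linear scoring policy

def policy_probs():
    # score_i = c_i . (W x), then softmax over candidates
    scores = candidate_emb @ (W @ problem_emb)
    scores -= scores.max()  # numerical stability
    p = np.exp(scores)
    return p / p.sum()

def reinforce_step(lr=0.5):
    """Sample a candidate, observe reward, return the REINFORCE update
    lr * r * d log pi(a)/dW."""
    p = policy_probs()
    a = rng.choice(n_candidates, p=p)
    r = reward_fn(a)
    # For this linear model: d log pi(a)/dW = outer(c_a - E_p[c], x)
    grad_W = np.outer(candidate_emb[a] - p @ candidate_emb, problem_emb)
    return lr * r * grad_W

for _ in range(500):
    W += reinforce_step()

p = policy_probs()
mass_on_good = p[good].sum()
print(f"probability mass on rewarded candidates: {mass_on_good:.2f}")
```

After training, the policy concentrates its probability mass on the candidates that earn reward, which mirrors how PromptPG learns to prefer in-context examples that lead to correct answers instead of picking them at random.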

Cited by 1 page

Page | Type | Quality
Goal Misgeneralization Probability Model | Analysis | 61.0

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 98 KB
# Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning

Pan Lu¹˒³, Liang Qiu¹, Kai-Wei Chang¹, Ying Nian Wu¹, Song-Chun Zhu¹,

Tanmay Rajpurohit², Peter Clark³, Ashwin Kalyan³

¹University of California, Los Angeles · ²Georgia Institute of Technology · ³Allen Institute for AI

###### Abstract

Mathematical reasoning, a core ability of human intelligence, presents unique challenges for machines in abstract thinking and logical reasoning. Recent large pre-trained language models such as GPT-3 have achieved remarkable progress on mathematical reasoning tasks written in text form, such as math word problems (MWP). However, it is unknown if the models can handle more complex problems that involve math reasoning over heterogeneous information, such as tabular data. To fill the gap, we present Tabular Math Word Problems (TabMWP), a new dataset containing 38,431 open-domain grade-level problems that require mathematical reasoning on both textual and tabular data. Each question in TabMWP is aligned with a tabular context, which is presented as an image, semi-structured text, and a structured table. There are two types of questions: free-text and multi-choice, and each problem is annotated with gold solutions to reveal the multi-step reasoning process. We evaluate different pre-trained models on TabMWP, including the GPT-3 model in a few-shot setting. As earlier studies suggest, since few-shot GPT-3 relies on the selection of in-context examples, its performance is unstable and can degrade to near chance. The unstable issue is more severe when handling complex problems like TabMWP. To mitigate this, we further propose a novel approach, PromptPG, which utilizes policy gradient to learn to select in-context examples from a small amount of training data and then constructs the corresponding prompt for the test example. Experimental results show that our method outperforms the best baseline by 5.31% on the accuracy metric and reduces the prediction variance significantly compared to random selection, which verifies its effectiveness in selecting in-context examples.¹

¹ The data and code are available at https://promptpg.github.io.

Work was partially done while Pan Lu was an intern at Allen Institute for AI (AI2).

## 1 Introduction

Developing machines equipped with mathematical reasoning capabilities is one of the long-standing goals of artificial intelligence. Solving math word problems (MWPs) is a well-defined task for diagnosing the ability of intelligent systems to perform numerical reasoning and problem-solving as humans do. A surge of datasets has been proposed to facilitate research in this domain (Upadhyay & Chang, 2017; Amini et al., 2019; Miao et al., 2020; Cobbe et al., 2021

... (truncated, 98 KB total)