Longterm Wiki

University of Trento

paper

Authors

Seyed Mahed Mousavi · Simone Caldarella · Giuseppe Riccardi

Credibility Rating

3/5 (Good)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: arXiv

Research on longitudinal dialogue systems for human-machine conversation spanning multiple sessions, relevant to AI safety concerns about long-term user interaction patterns, emotional engagement, and maintaining consistent safe behavior over extended interactions.

Paper Details

Citations
12
0 influential
Year
2023

Metadata

arXiv preprint · primary source

Abstract

Longitudinal Dialogues (LD) are the most challenging type of conversation for human-machine dialogue systems. LDs include the recollections of events, personal thoughts, and emotions specific to each individual in a sparse sequence of dialogue sessions. Dialogue systems designed for LDs should uniquely interact with the users over multiple sessions and long periods of time (e.g. weeks), and engage them in personal dialogues to elaborate on their feelings, thoughts, and real-life events. In this paper, we study the task of response generation in LDs. We evaluate whether general-purpose Pre-trained Language Models (PLM) are appropriate for this purpose. We fine-tune two PLMs, GePpeTto (GPT-2) and iT5, using a dataset of LDs. We experiment with different representations of the personal knowledge extracted from LDs for grounded response generation, including the graph representation of the mentioned events and participants. We evaluate the performance of the models via automatic metrics and the contribution of the knowledge via the Integrated Gradients technique. We categorize the natural language generation errors via human evaluations of contextualization, appropriateness and engagement of the user.

Summary

This paper addresses response generation in Longitudinal Dialogues (LDs)—extended conversations spanning multiple sessions where systems must track personal events, emotions, and thoughts over time. The authors evaluate whether general-purpose pre-trained language models (GePpeTto, an Italian GPT-2, and iT5) can be effectively fine-tuned for this task using a longitudinal dialogue dataset. They experiment with different representations of the personal knowledge extracted from LDs, including graph-based structures of events and participants, and employ automatic metrics, the Integrated Gradients attribution technique, and human evaluation to assess model performance, contextualization, appropriateness, and user engagement.
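The graph-grounded setup described above can be illustrated with a minimal sketch: event triples extracted from earlier sessions are linearized into a flat grounding string and prepended to the dialogue history before fine-tuning a PLM. The token markup (`<s>`, `<r>`, `<o>`, `<sep>`) and the helper name are illustrative assumptions for this sketch, not the paper's exact serialization scheme.

```python
def linearize_event_graph(triples):
    """Serialize (subject, relation, object) event triples into a flat
    string that can be prepended to the dialogue history as grounding
    knowledge for a sequence-to-sequence or causal PLM.

    Note: the <s>/<r>/<o> markers are hypothetical special tokens chosen
    for this sketch; the paper's actual graph encoding may differ.
    """
    return " ".join(f"<s> {s} <r> {r} <o> {o}" for s, r, o in triples)


# Example: a personal event graph recalled from a previous session.
graph = [
    ("user", "attended", "sister's wedding"),
    ("sister's wedding", "location", "Rome"),
]

grounding = linearize_event_graph(graph)
# The grounded model input: knowledge string, separator, then the new turn.
prompt = grounding + " <sep> User: I just got back from the trip."
```

In this setup, fine-tuning would pair such grounded prompts with the reference system responses, letting the model attend to the user's personal space of events when generating a reply.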

Cited by 1 page

| Page | Type | Quality |
| --- | --- | --- |
| AI Safety Solution Cruxes | Crux | 65.0 |

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 46 KB
# Response Generation in Longitudinal Dialogues: Which Knowledge Representation Helps?

Seyed Mahed Mousavi, Simone Caldarella, Giuseppe Riccardi

Signals and Interactive Systems Lab, University of Trento, Italy

mahed.mousavi@unitn.it, giuseppe.riccardi@unitn.it

###### Abstract

Longitudinal Dialogues (LD) are the most challenging type of conversation for human-machine dialogue systems. LDs include the recollections of events, personal thoughts, and emotions specific to each individual in a sparse sequence of dialogue sessions. Dialogue systems designed for LDs should uniquely interact with the users over multiple sessions and long periods of time (e.g. weeks), and engage them in personal dialogues to elaborate on their feelings, thoughts, and real-life events. In this paper, we study the task of response generation in LDs. We evaluate whether general-purpose Pre-trained Language Models (PLM) are appropriate for this purpose. We fine-tune two PLMs, GePpeTto (GPT-2) and iT5, using a dataset of LDs. We experiment with different representations of the personal knowledge extracted from LDs for grounded response generation, including the graph representation of the mentioned events and participants. We evaluate the performance of the models via automatic metrics and the contribution of the knowledge via the Integrated Gradients technique. We categorize the natural language generation errors via human evaluations of contextualization, appropriateness and engagement of the user.

## 1 Introduction

State-of-the-art dialogue systems are designed to assist the user in executing a task, hold limited chit-chat conversations with shallow user engagement, or perform information retrieval over a finite set of topics. Personalization in these systems is limited to a stereotypical user model, which is either implicitly inferred from conversations with many users or reduced to a superficial list of persona statements (e.g., "He likes dogs") Zhang et al. ([2018](https://ar5iv.labs.arxiv.org/html/2305.15908#bib.bib29 "")).
The dialogue sessions are disconnected, and the information shared across sessions is negligible.

Longitudinal Dialogue (LD) is one of the most challenging types of conversation for human-machine dialogue systems. LDs are multi-session interactions that encompass user-specific situations, thoughts, and emotions. Dialogue systems designed for LDs should interact uniquely with each user about personal life events and emotions over multiple sessions and long periods of time (e.g. weeks). Through each session in LDs, the dialogue system must learn about the user’s personal space of events and participants and social interactions, and engage the user in personal dialogues regarding their thoughts, feelings, and personal and world events.

![Refer to caption](https://ar5iv.labs.arxiv.org/html/2305.15908/assets/images/trisdial.png)Figure 1: Examples of a task-based dialogue, a chit-chat, and a Longitudinal Dialogue (LD).

... (truncated, 46 KB total)
Resource ID: 51df12a0a334621c | Stable ID: Yzk3MjhkZT