Capability Control Methods

paper

2023·arXiv·arxiv.org/abs/2310.15077

Authors

Ronald Cardenas·Bingsheng Yao·Dakuo Wang·Yufang Hou

Credibility Rating

3/5

Good(3)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: arXiv

This paper addresses automated science journalism and natural language generation tasks, which relates to AI safety concerns about AI system capabilities in information processing, summarization, and potential misuse for generating misleading content at scale.

Paper Details

Citations

0 influential

Year

2023

arXiv:2310.15077 DOI:10.3403/30286922 Semantic Scholar

Metadata

arxiv preprint·primary source

Abstract

Science journalism refers to the task of reporting technical findings of a scientific paper as a less technical news article to the general public audience. We aim to design an automated system to support this real-world task (i.e., automatic science journalism) by 1) introducing a newly-constructed and real-world dataset (SciTechNews), with tuples of a publicly-available scientific paper, its corresponding news article, and an expert-written short summary snippet; 2) proposing a novel technical framework that integrates a paper's discourse structure with its metadata to guide generation; and, 3) demonstrating with extensive automatic and human experiments that our framework outperforms other baseline methods (e.g. Alpaca and ChatGPT) in elaborating a content plan meaningful for the target audience, simplifying the information selected, and producing a coherent final report in a layman's style.

Summary

This paper addresses automatic science journalism—converting technical scientific papers into accessible news articles for general audiences. The authors introduce SciTechNews, a new dataset of scientific papers paired with corresponding news articles and expert summaries, and propose a technical framework that leverages paper discourse structure and metadata to guide generation. Their approach outperforms baselines like Alpaca and ChatGPT in creating meaningful content plans, simplifying information, and producing coherent layman-friendly reports.

Cited by 1 page

Page	Type	Quality
Long-Horizon Autonomous Tasks	Capability	65.0

Cached Content Preview

HTTP 200·Fetched Apr 10, 2026·74 KB

[2310.15077] ‘Don’t Get Too Technical with Me’: A Discourse Structure-Based Framework for Science Journalism

 
 
Ronald Cardenas, University of Edinburgh
Bingsheng Yao, Rensselaer Polytechnic Institute
Dakuo Wang, Northeastern University
Yufang Hou, IBM Research Europe, Ireland
 
 
 

 
 Abstract

Science journalism refers to the task of reporting technical findings of a scientific paper as a less technical news article to the general public audience. We aim to design an automated system to support this real-world task (i.e., automatic science journalism) by 1) introducing a newly-constructed and real-world dataset (SciTechNews), with tuples of a publicly-available scientific paper, its corresponding news article, and an expert-written short summary snippet; 2) proposing a novel technical framework that integrates a paper’s discourse structure with its metadata to guide generation; and, 3) demonstrating with extensive automatic and human experiments that our framework outperforms other baseline methods (e.g. Alpaca and ChatGPT) in elaborating a content plan meaningful for the target audience, simplifying the information selected, and producing a coherent final report in a layman’s style.
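The dataset tuple described in the abstract (paper, news article, expert-written summary snippet) can be pictured as a small record type. The following is only an illustrative sketch: the class name `SciTechNewsRecord`, its field names, and the `is_complete` helper are hypothetical and are not taken from the paper or from any released data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SciTechNewsRecord:
    """Hypothetical container for one dataset tuple; field names are assumptions."""

    paper_text: str      # the publicly available scientific paper
    news_article: str    # the corresponding news article
    expert_summary: str  # the expert-written short summary snippet


def is_complete(record: SciTechNewsRecord) -> bool:
    """Return True when all three parts of the tuple are non-empty."""
    parts = (record.paper_text, record.news_article, record.expert_summary)
    return all(part.strip() for part in parts)


# Placeholder strings only; these are not real dataset contents.
example = SciTechNewsRecord(
    paper_text="We propose ...",
    news_article="Researchers have ...",
    expert_summary="A new method ...",
)
```

A check such as `is_complete(example)` would be one simple way to filter out tuples with a missing component before training or evaluation, under the assumptions above.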

 
 
 
 1 Introduction

 
Science journalism refers to producing journalistic content that covers topics related to different areas of scientific research (Angler, 2017). It plays an important role in fostering public understanding of science and its impact.
However, the sheer volume of scientific literature makes it challenging for journalists to report on every significant discovery, potentially leaving many overlooked.
For instance, in the year 2022 alone, 185,692 papers were submitted to the preprint repository arXiv.org, spanning highly diverse scientific domains such as biomedical research, social and political sciences, engineering research and a multitude of others (footnote 1: https://info.arxiv.org/about/reports/2022_arXiv_annual_report.pdf). To this date, PubMed contains around 345,332 scientific publications about the novel coronavirus Covid-19 (footnote 2: https://www.ncbi.nlm.nih.gov/research/coronavirus/),
nearly 1.6 times as many as those produced in 200 years of work on influenza.

 
 
The enormous quantity of scientific literature and the huge amount of manual effort required to produce high-quality science journalistic content inspired recent interest in tasks such as generating blog titles or slides for scientific papers (Vadapalli et al., 2018; Sun et al., 2021), extracting structured knowledge from scientific literature (Hou et al., 2019; Mondal et al., 2021; Zhang et al., 2022), simplifying technical health manuals for the general public (Cao et al., 2020), and creating plain language summaries for s

... (truncated, 74 KB total)

Resource ID: ea759f3929d984ee | Stable ID: sid_mKuf9Tz3oU