Longterm Wiki

Capability Control Methods

paper

Authors

Ronald Cardenas·Bingsheng Yao·Dakuo Wang·Yufang Hou

Credibility Rating

3/5
Good(3)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: arXiv

This paper addresses automated science journalism and natural language generation tasks, which relates to AI safety concerns about AI system capabilities in information processing, summarization, and potential misuse for generating misleading content at scale.

Paper Details

Citations
0
0 influential
Year
2023

Metadata

arxiv preprint · primary source

Abstract

Science journalism refers to the task of reporting technical findings of a scientific paper as a less technical news article to the general public audience. We aim to design an automated system to support this real-world task (i.e., automatic science journalism) by 1) introducing a newly-constructed and real-world dataset (SciTechNews), with tuples of a publicly-available scientific paper, its corresponding news article, and an expert-written short summary snippet; 2) proposing a novel technical framework that integrates a paper's discourse structure with its metadata to guide generation; and, 3) demonstrating with extensive automatic and human experiments that our framework outperforms other baseline methods (e.g. Alpaca and ChatGPT) in elaborating a content plan meaningful for the target audience, simplifying the information selected, and producing a coherent final report in a layman's style.

Summary

This paper addresses automatic science journalism: converting technical scientific papers into accessible news articles for general audiences. The authors introduce SciTechNews, a new dataset of scientific papers paired with corresponding news articles and expert summaries, and propose a technical framework that leverages paper discourse structure and metadata to guide generation. Their approach outperforms baselines like Alpaca and ChatGPT in creating meaningful content plans, simplifying information, and producing coherent layman-friendly reports.

Cited by 1 page

Page | Type | Quality
Long-Horizon Autonomous Tasks | Capability | 65.0

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 84 KB
# ‘Don’t Get Too Technical with Me’: A Discourse Structure-Based Framework for Science Journalism

Ronald Cardenas, University of Edinburgh
Bingsheng Yao, Rensselaer Polytechnic Institute
Dakuo Wang, Northeastern University
Yufang Hou, IBM Research Europe, Ireland

###### Abstract

Science journalism refers to the task of reporting technical findings of a scientific paper as a less technical news article to the general public audience.
We aim to design an automated system to support this real-world task (i.e., automatic science journalism) by 1) introducing a newly-constructed and real-world dataset (SciTechNews), with tuples of a publicly-available scientific paper, its corresponding news article, and an expert-written short summary snippet;
2) proposing a novel technical framework that integrates a paper’s discourse structure with its metadata to guide generation;
and, 3) demonstrating with extensive automatic and human experiments that our framework outperforms other baseline methods (e.g. Alpaca and ChatGPT) in elaborating a content plan meaningful for the target audience,
simplifying the information selected, and producing a coherent final report in a layman’s style.

## 1 Introduction

_Science journalism_ refers to producing journalistic content that covers topics related to different areas of scientific research Angler ( [2017](https://ar5iv.labs.arxiv.org/html/2310.15077#bib.bib1 "")). It plays an important role in fostering public understanding of science and its impact.
However, the sheer volume of scientific literature makes it challenging for journalists to report on every significant discovery, potentially leaving many overlooked.
For instance, in the year 2022 alone, 185,692 papers were submitted to the preprint repository arXiv.org spanning highly diverse scientific domains such as biomedical research, social and political sciences, engineering research and a multitude of others (footnote 1: [https://info.arxiv.org/about/reports/2022\_arXiv\_annual\_report.pdf](https://info.arxiv.org/about/reports/2022_arXiv_annual_report.pdf "")). To this date, PubMed contains around 345,332 scientific publications about the novel coronavirus Covid-19 (footnote 2: [https://www.ncbi.nlm.nih.gov/research/coronavirus/](https://www.ncbi.nlm.nih.gov/research/coronavirus/ "")),
nearly 1.6 times as many as those produced in 200 years of work on influenza.

The enormous quantity of scientific literature and the huge amount of manual effort required to produce high-quality science journalistic content inspired recent interest in tasks such as generating blog titles or slides for scientific papers Vadapalli et al. ( [2018](https://ar5iv.labs.arxiv.org/html/2310.15077#bib.bib43 "")); Sun et al. ( [2021](https://ar5iv.labs.arxiv.org/html/2310.15077#bib.bib40 "")), extracting structured knowledge from scientific literature Hou et al. ( [2019](https://ar5iv.labs.arxiv.org/html/2310.15077#bib.bib21 "")); Mondal et al. ( [2021](https://ar5i

... (truncated, 84 KB total)
Resource ID: ea759f3929d984ee | Stable ID: NGQ1YmE2NW