Irvin et al. (2019)
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: arXiv
CheXpert is a large-scale medical imaging dataset with 224,316 chest radiographs used for training deep learning models; relevant to AI safety for studying dataset quality, uncertainty handling, and safe deployment of medical AI systems.
Paper Details
Metadata
Abstract
Large, labeled datasets have driven deep learning methods to achieve expert-level performance on a variety of medical imaging tasks. We present CheXpert, a large dataset that contains 224,316 chest radiographs of 65,240 patients. We design a labeler to automatically detect the presence of 14 observations in radiology reports, capturing uncertainties inherent in radiograph interpretation. We investigate different approaches to using the uncertainty labels for training convolutional neural networks that output the probability of these observations given the available frontal and lateral radiographs. On a validation set of 200 chest radiographic studies which were manually annotated by 3 board-certified radiologists, we find that different uncertainty approaches are useful for different pathologies. We then evaluate our best model on a test set composed of 500 chest radiographic studies annotated by a consensus of 5 board-certified radiologists, and compare the performance of our model to that of 3 additional radiologists in the detection of 5 selected pathologies. On Cardiomegaly, Edema, and Pleural Effusion, the model ROC and PR curves lie above all 3 radiologist operating points. We release the dataset to the public as a standard benchmark to evaluate performance of chest radiograph interpretation models. The dataset is freely available at https://stanfordmlgroup.github.io/competitions/chexpert .
Summary
Irvin et al. (2019) introduce CheXpert, a large-scale chest radiograph dataset containing 224,316 images from 65,240 patients, with automatically generated labels for 14 observations extracted from radiology reports. The authors develop methods to handle the label uncertainty inherent in radiograph interpretation and train convolutional neural networks to predict pathology presence. On a consensus-annotated test set, their best model's ROC and PR curves lie above the operating points of three board-certified radiologists on Cardiomegaly, Edema, and Pleural Effusion, and the dataset is released publicly as a benchmark for evaluating chest radiograph interpretation systems.
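The paper's uncertainty-handling approaches (U-Ignore, U-Zeros, U-Ones, among others) amount to different policies for mapping uncertain labels to trainable targets. A minimal sketch of the idea follows, using the common CheXpert CSV convention (1 = positive, 0 = negative, -1 = uncertain); the function name and the use of `None` for masked labels are illustrative choices, not the authors' implementation.

```python
def apply_uncertainty_policy(labels, policy="U-Zeros"):
    """Map uncertain (-1) labels to trainable targets.

    U-Zeros:  treat uncertain as negative (0).
    U-Ones:   treat uncertain as positive (1).
    U-Ignore: mask uncertain labels out of the loss (None here).
    """
    mapped = []
    for y in labels:
        if y != -1:
            mapped.append(y)
        elif policy == "U-Zeros":
            mapped.append(0)
        elif policy == "U-Ones":
            mapped.append(1)
        elif policy == "U-Ignore":
            mapped.append(None)  # downstream loss skips these entries
        else:
            raise ValueError(f"unknown policy: {policy}")
    return mapped

raw = [1, -1, 0, -1, 1]
print(apply_uncertainty_policy(raw, "U-Zeros"))  # [1, 0, 0, 0, 1]
print(apply_uncertainty_policy(raw, "U-Ones"))   # [1, 1, 0, 1, 1]
```

A key finding of the paper is that no single policy dominates: the best mapping differs by pathology, which is why the validation set was used to select a policy per observation.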
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| AI-Human Hybrid Systems | Approach | 91.0 |
Cached Content Preview
# CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison
Jeremy Irvin,^1,\* Pranav Rajpurkar,^1,\* Michael Ko,^1 Yifan Yu,^1 Silviana Ciurea-Ilcus,^1 Chris Chute,^1 Henrik Marklund,^1 Behzad Haghgoo,^1 Robyn Ball,^2 Katie Shpanskaya,^3 Jayne Seekins,^3 David A. Mong,^3 Safwan S. Halabi,^3 Jesse K. Sandberg,^3 Ricky Jones,^3 David B. Larson,^3 Curtis P. Langlotz,^3 Bhavik N. Patel,^3 Matthew P. Lungren,^3,† Andrew Y. Ng^1,†

^1 Department of Computer Science, Stanford University
^2 Department of Medicine, Stanford University
^3 Department of Radiology, Stanford University

\* Equal contribution
† Equal contribution

{jirvin16, pranavsr}@cs.stanford.edu
###### Abstract
Large, labeled datasets have driven deep learning methods to achieve expert-level performance on a variety of medical imaging tasks. We present CheXpert, a large dataset that contains 224,316 chest radiographs of 65,240 patients. We design a labeler to automatically detect the presence of 14 observations in radiology reports, capturing uncertainties inherent in radiograph interpretation. We investigate different approaches to using the uncertainty labels for training convolutional neural networks that output the probability of these observations given the available frontal and lateral radiographs. On a validation set of 200 chest radiographic studies which were manually annotated by 3 board-certified radiologists, we find that different uncertainty approaches are useful for different pathologies. We then evaluate our best model on a test set composed of 500 chest radiographic studies annotated by a consensus of 5 board-certified radiologists, and compare the performance of our model to that of 3 additional radiologists in the detection of 5 selected pathologies. On Cardiomegaly, Edema, and Pleural Effusion, the model ROC and PR curves lie above all 3 radiologist operating points. We release the dataset to the public as a standard benchmark to evaluate performance of chest radiograph interpretation models.¹ (¹ https://stanfordmlgroup.github.io/competitions/chexpert)
Figure 1: The CheXpert task is to predict the probability of different observations from multi-view chest radiographs.
## Introduction
Chest radiography is the most common imaging examination globally, critical for screening, diagnosis, and management of many life-threatening diseases. Automated chest radiograph interpretation at the level of practicing radiologists could provide substantial benefit in many medical settings, from improved workflow prioritization and clinical decision support to large-scale screening and global population health initiatives. For progress, there is a need for labeled datasets that (1) are large, (2) have strong reference standards, and (3) provide expert human performance metrics for comparison.
In this work, we present CheXpert (Chest eXpert), a large dataset for chest radiograph interpretatio
... (truncated, 45 KB total)