Irvin et al. (2019)
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: arXiv
CheXpert is a large-scale medical imaging dataset with 224,316 chest radiographs used for training deep learning models; relevant to AI safety for studying dataset quality, uncertainty handling, and safe deployment of medical AI systems.
Paper Details
Metadata
Abstract
Large, labeled datasets have driven deep learning methods to achieve expert-level performance on a variety of medical imaging tasks. We present CheXpert, a large dataset that contains 224,316 chest radiographs of 65,240 patients. We design a labeler to automatically detect the presence of 14 observations in radiology reports, capturing uncertainties inherent in radiograph interpretation. We investigate different approaches to using the uncertainty labels for training convolutional neural networks that output the probability of these observations given the available frontal and lateral radiographs. On a validation set of 200 chest radiographic studies which were manually annotated by 3 board-certified radiologists, we find that different uncertainty approaches are useful for different pathologies. We then evaluate our best model on a test set composed of 500 chest radiographic studies annotated by a consensus of 5 board-certified radiologists, and compare the performance of our model to that of 3 additional radiologists in the detection of 5 selected pathologies. On Cardiomegaly, Edema, and Pleural Effusion, the model ROC and PR curves lie above all 3 radiologist operating points. We release the dataset to the public as a standard benchmark to evaluate performance of chest radiograph interpretation models. The dataset is freely available at https://stanfordmlgroup.github.io/competitions/chexpert .
Summary
Irvin et al. (2019) introduce CheXpert, a large-scale chest radiograph dataset containing 224,316 images from 65,240 patients, with automatically generated labels for 14 observations extracted from radiology reports. The authors develop methods to handle the label uncertainty inherent in radiograph interpretation and train convolutional neural networks to predict pathology presence. On a consensus-annotated test set, their best model's ROC and PR curves lie above the operating points of three board-certified radiologists on Cardiomegaly, Edema, and Pleural Effusion, and the dataset is released publicly as a benchmark for evaluating chest radiograph interpretation systems.
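The paper's uncertainty-handling approaches (U-Ignore, U-Zeros, U-Ones, among others) amount to different policies for mapping uncertain labels to trainable targets. A minimal sketch of the idea follows, using the common CheXpert CSV convention (1 = positive, 0 = negative, -1 = uncertain); the function name and the use of `None` for masked labels are illustrative choices, not the authors' implementation.

```python
def apply_uncertainty_policy(labels, policy="U-Zeros"):
    """Map uncertain (-1) labels to trainable targets.

    U-Zeros:  treat uncertain as negative (0).
    U-Ones:   treat uncertain as positive (1).
    U-Ignore: mask uncertain labels out of the loss (None here).
    """
    mapped = []
    for y in labels:
        if y != -1:
            mapped.append(y)
        elif policy == "U-Zeros":
            mapped.append(0)
        elif policy == "U-Ones":
            mapped.append(1)
        elif policy == "U-Ignore":
            mapped.append(None)  # downstream loss skips these entries
        else:
            raise ValueError(f"unknown policy: {policy}")
    return mapped

raw = [1, -1, 0, -1, 1]
print(apply_uncertainty_policy(raw, "U-Zeros"))  # [1, 0, 0, 0, 1]
print(apply_uncertainty_policy(raw, "U-Ones"))   # [1, 1, 0, 1, 1]
```

A key finding of the paper is that no single policy dominates: the best mapping differs by pathology, which is why the validation set was used to select a policy per observation.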
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| AI-Human Hybrid Systems | Approach | 91.0 |
Cached Content Preview
# CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison
Jeremy Irvin,^1,\* Pranav Rajpurkar,^1,\* Michael Ko,^1 Yifan Yu,^1 Silviana Ciurea-Ilcus,^1 Chris Chute,^1 Henrik Marklund,^1 Behzad Haghgoo,^1 Robyn Ball,^2 Katie Shpanskaya,^3 Jayne Seekins,^3 David A. Mong,^3 Safwan S. Halabi,^3 Jesse K. Sandberg,^3 Ricky Jones,^3 David B. Larson,^3 Curtis P. Langlotz,^3 Bhavik N. Patel,^3 Matthew P. Lungren,^3,† Andrew Y. Ng^1,†

^1 Department of Computer Science, Stanford University
^2 Department of Medicine, Stanford University
^3 Department of Radiology, Stanford University

\* Equal contribution
† Equal contribution

{jirvin16, pranavsr}@cs.stanford.edu
###### Abstract
Large, labeled datasets have driven deep learning methods to achieve expert-level performance on a variety of medical imaging tasks. We present CheXpert, a large dataset that contains 224,316 chest radiographs of 65,240 patients. We design a labeler to automatically detect the presence of 14 observations in radiology reports, capturing uncertainties inherent in radiograph interpretation. We investigate different approaches to using the uncertainty labels for training convolutional neural networks that output the probability of these observations given the available frontal and lateral radiographs. On a validation set of 200 chest radiographic studies which were manually annotated by 3 board-certified radiologists, we find that different uncertainty approaches are useful for different pathologies. We then evaluate our best model on a test set composed of 500 chest radiographic studies annotated by a consensus of 5 board-certified radiologists, and compare the performance of our model to that of 3 additional radiologists in the detection of 5 selected pathologies. On Cardiomegaly, Edema, and Pleural Effusion, the model ROC and PR curves lie above all 3 radiologist operating points. We release the dataset to the public as a standard benchmark to evaluate performance of chest radiograph interpretation models.¹ (¹ https://stanfordmlgroup.github.io/competitions/chexpert)
Figure 1: The CheXpert task is to predict the probability of different observations from multi-view chest radiographs.
## Introduction
Chest radiography is the most common imaging examination globally, critical for screening, diagnosis, and management of many life-threatening diseases. Automated chest radiograph interpretation at the level of practicing radiologists could provide substantial benefit in many medical settings, from improved workflow prioritization and clinical decision support to large-scale screening and global population health initiatives. For progress, there is a need for labeled datasets that (1) are large, (2) have strong reference standards, and (3) provide expert human performance metrics for comparison.
In this work, we present CheXpert (Chest eXpert), a large dataset for chest radiograph interpretatio
... (truncated, 45 KB total)