Research by Taori et al. (2020)
Credibility Rating
Gold standard. Rigorous peer review, high editorial standards, and strong institutional reputation.
Rating inherited from publication venue: NeurIPS
A key empirical paper for understanding the limits of current robustness methods in ML; relevant to AI safety discussions about whether models trained on standard benchmarks will generalize reliably to real-world deployment conditions.
Metadata
Summary
Taori et al. systematically evaluate how well improvements in ImageNet accuracy transfer to robustness against natural distribution shifts across seven test datasets. They find that most interventions—including adversarial training and data augmentation—do not improve effective robustness beyond what baseline accuracy predicts, with the notable exception of training on larger and more diverse datasets.
Key Points
- Introduces the concept of 'effective robustness' to distinguish genuine robustness gains from accuracy-correlated improvements on distribution-shifted datasets.
- Evaluates 204 ImageNet models across 7 natural distribution shift benchmarks, finding a near-linear relationship between ImageNet accuracy and shifted-distribution accuracy.
- Most popular robustness interventions (adversarial training, data augmentation, self-supervised pre-training) fail to improve effective robustness significantly.
- Training on larger or more diverse datasets (e.g., ImageNet-21k, JFT) is one of the few approaches that yields genuine robustness improvements.
- Highlights a fundamental gap between robustness to adversarial perturbations and robustness to natural distribution shifts in ML systems.
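The effective-robustness idea above can be sketched numerically: fit the baseline trend between in-distribution and shifted accuracy across many models, then score a candidate model by how far it sits above that trend. This is a minimal illustration, not the paper's implementation; the logit-space linear fit is a common choice for such accuracy-on-the-line analyses, and all model accuracies below are made up for the example.

```python
# Hedged sketch of "effective robustness": shifted accuracy minus what the
# baseline accuracy trend predicts. Assumptions: a degree-1 least-squares fit
# on logit-transformed accuracies; illustrative (not real) model accuracies.
import numpy as np

def logit(p):
    return np.log(p / (1 - p))

def inv_logit(x):
    return 1 / (1 + np.exp(-x))

def fit_baseline(id_acc, shift_acc):
    """Linear fit in logit space over a set of baseline models."""
    slope, intercept = np.polyfit(logit(id_acc), logit(shift_acc), deg=1)
    return slope, intercept

def effective_robustness(model_id_acc, model_shift_acc, slope, intercept):
    """Gap between observed shifted accuracy and the baseline prediction."""
    predicted = inv_logit(slope * logit(model_id_acc) + intercept)
    return model_shift_acc - predicted

# Illustrative baseline models: (in-distribution, shifted) accuracy pairs.
id_acc = np.array([0.60, 0.70, 0.76, 0.80])
shift_acc = np.array([0.45, 0.55, 0.62, 0.67])
slope, intercept = fit_baseline(id_acc, shift_acc)

# A candidate model above the trend line has positive effective robustness.
rho = effective_robustness(0.78, 0.70, slope, intercept)
```

A model whose shifted accuracy lies on the fitted line gets an effective robustness near zero; the paper's finding is that most interventions land there, so their apparent robustness gains are explained by in-distribution accuracy alone.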
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| AI Distributional Shift | Risk | 91.0 |
Cached Content Preview
# Measuring Robustness to Natural Distribution Shifts in Image Classification
Rohan Taori UC Berkeley
Achal Dave CMU
Vaishaal Shankar UC Berkeley
Nicholas Carlini Google Brain
Benjamin Recht UC Berkeley
Ludwig Schmidt UC Berkeley
# Abstract
We study how robust current ImageNet models are to distribution shifts arising from natural variations in datasets. Most research on robustness focuses on synthetic image perturbations (noise, simulated weather artifacts, adversarial examples, etc.), which leaves open how robustness on synthetic distribution shift relates to distribution shift arising in real data. Informed by an evaluation of 204 ImageNet models in 213 different test conditions, we find that there is often little to no transfer of robustness from current synthetic to natural distribution shift. Moreover, most current techniques provide no robustness to the natural distribution shifts in our testbed. The main exception is training on larger and more diverse datasets, which in multiple cases increases robustness, but is still far from closing the performance gaps. Our results indicate that distribution shifts arising in real data are currently an open research problem. We provide our testbed and data as a resource for future work at [https://modestyachts.github.io/imagenet-testbed/](https://modestyachts.github.io/imagenet-testbed/).
# 1 Introduction
Reliable classification under distribution shift is still out of reach for current machine learning \[65, 68, 91\]. As a result, the research community has proposed a wide range of evaluation protocols that go beyond a single, static test set. Common examples include noise corruptions \[33, 38\], spatial transformations \[28, 29\], and adversarial examples \[5, 84\]. Encouragingly, the past few years have seen substantial progress in robustness to these distribution shifts, e.g., see \[13, 28, 34, 55, 57, 66, 93, 96, 105, 114, 115\] among many others. However, this progress comes with an important limitation: all of the aforementioned distribution shifts are synthetic: the test examples are derived from well-characterized image modifications at the pixel level.
Synthetic distribution shifts are a good starting point for experiments since they are precisely defined and easy to apply to arbitrary images. However, classifiers ultimately must be robust to distribution shifts arising naturally in the real world. These distribution shifts may include subtle changes in scene compositions, object types, lighting conditions, and many others. Importantly, these variations are not precisely defined because they have not been created artificially. The hope is that an ideal robust classifier is still robust to such natural distribution shifts.
In this paper, we investigate how robust current machine learning techniques are to distribution shift arising naturally from real image data without synthetic modifications. To this end, we conduct a comprehensive experimental study in the context of ImageNet
... (truncated, 69 KB total)