[1412.6572] Explaining and Harnessing Adversarial Examples - arXiv
Paper
Credibility Rating: 3/5 (Good)
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: arXiv
Metadata
Cited by 2 pages
| Page | Type | Quality |
|---|---|---|
| AI Accident Risk Cruxes | Crux | 67.0 |
| Adversarial Training | Approach | 58.0 |
Cached Content Preview
HTTP 200 | Fetched Apr 30, 2026 | 53 KB
# Explaining and Harnessing Adversarial Examples
Ian J. Goodfellow, Jonathon Shlens & Christian Szegedy
Google Inc., Mountain View, CA
{goodfellow,shlens,szegedy}@google.com
###### Abstract
Several machine learning models, including neural networks, consistently
misclassify adversarial examples—inputs formed
by applying small but intentionally worst-case perturbations to examples from the dataset,
such that the perturbed input results in the model outputting an incorrect
answer with high confidence.
Early attempts at explaining this phenomenon focused on nonlinearity and overfitting.
We argue instead that the primary cause of neural networks’ vulnerability to adversarial
perturbation is their linear nature.
This explanation is supported by new quantitative results while
giving the first explanation of the most intriguing fact about
them: their generalization across architectures and training sets.
Moreover, this view yields a simple and fast method of generating adversarial examples.
Using this approach to provide examples for adversarial training, we reduce the test
set error of a maxout network on the MNIST dataset.
## 1 Introduction
Szegedy et al. ( [2014b](https://ar5iv.labs.arxiv.org/html/1412.6572#bib.bib19 "")) made an intriguing discovery: several machine learning models,
including state-of-the-art neural networks, are vulnerable to adversarial examples.
That is, these machine learning models misclassify examples that are only slightly different
from correctly classified examples drawn from the data distribution.
In many cases, a wide variety of models with different architectures trained
on different subsets of the training data misclassify the same adversarial example.
This suggests that adversarial examples expose
fundamental blind spots in our training algorithms.
The cause of these adversarial examples was a mystery, and speculative explanations have
suggested it is due to extreme nonlinearity of deep neural networks, perhaps combined
with insufficient model averaging and insufficient regularization of the purely supervised
learning problem. We show that these speculative hypotheses are unnecessary.
Linear behavior in high-dimensional spaces is sufficient to cause adversarial examples.
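
A small numerical sketch may help make this claim concrete. It assumes a single linear unit with weights `w` and an input `x`, both hypothetical random vectors rather than anything from the paper's experiments. The perturbation `eta = eps * sign(w)` moves every coordinate by at most `eps`, whatever the dimension, yet it shifts the activation `w @ x` by `eps * ||w||_1`, which grows with the number of input dimensions.

```python
import numpy as np

def activation_shift(n_dims, eps=0.01, seed=0):
    """Change in the linear activation w @ x caused by eta = eps * sign(w)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=n_dims)   # hypothetical weights, for illustration only
    x = rng.normal(size=n_dims)   # hypothetical input
    eta = eps * np.sign(w)        # max-norm of eta is eps in any dimension
    return float(w @ (x + eta) - w @ x)

# The per-coordinate change stays fixed at eps while the total shift keeps growing.
for n in (10, 1_000, 100_000):
    print(n, activation_shift(n))
```

The function name `activation_shift` and the choice of Gaussian weights are assumptions made for this illustration; the point is only that many tiny, aligned changes add up in a linear model.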
This view enables us to design a fast method of generating adversarial examples that
makes adversarial training practical. We show that adversarial training can provide
an additional regularization benefit beyond that provided by using dropout (Srivastava et al., [2014](https://ar5iv.labs.arxiv.org/html/1412.6572#bib.bib17 "")) alone.
Generic regularization strategies such as dropout, pretraining,
and model averaging do not confer a significant reduction in a model’s vulnerability to
adversarial examples, but changing to nonlinear model families such as RBF networks can do so.
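
The fast generation method mentioned above can be sketched as a single sign-of-gradient step. The preview is truncated before the paper's own definition, so this sketch assumes the form `x_adv = x + eps * sign(grad_x J)` and applies it to a hypothetical logistic regression model with made-up weights; the helper names are illustrative and not taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_grad_wrt_input(w, b, x, y):
    """Gradient of the cross-entropy loss with respect to x, for a label y in {0, 1}."""
    p = sigmoid(w @ x + b)
    return (p - y) * w

def sign_gradient_perturb(w, b, x, y, eps):
    """Move each input coordinate by eps in the direction that increases the loss."""
    return x + eps * np.sign(loss_grad_wrt_input(w, b, x, y))

rng = np.random.default_rng(0)
w = rng.normal(size=784) / 28.0    # hypothetical weights; 784 = 28 x 28, the MNIST input size
b = 0.0
x = rng.uniform(size=784)          # hypothetical input with values in [0, 1)
y = 1.0 if w @ x + b > 0 else 0.0  # use the model's own prediction as the label

x_adv = sign_gradient_perturb(w, b, x, y, eps=0.1)
print("clean p(y=1):      ", sigmoid(w @ x + b))
print("adversarial p(y=1):", sigmoid(w @ x_adv + b))
```

Because only one gradient evaluation is needed per example, a step like this is cheap enough to run inside a training loop, which is what makes adversarial training practical. The sketch does not clip `x_adv` back to a valid pixel range; a real pipeline would likely need to.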
Our explanation suggests a fundamental tension between designing models that are easy to train due
to their linearity and designing models that use nonlinear effects t
... (truncated, 53 KB total)
Resource ID: d20b6a68747b55ae | Stable ID: sid_4QIT1gaaMg