ICLR 2017
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: arXiv
Introduces adversarial and virtual adversarial training methods for the text domain, extending robustness techniques to NLP; this is relevant for understanding adversarial vulnerabilities and defenses in language models.
Paper Details
Metadata
Abstract
Adversarial training provides a means of regularizing supervised learning algorithms while virtual adversarial training is able to extend supervised learning algorithms to the semi-supervised setting. However, both methods require making small perturbations to numerous entries of the input vector, which is inappropriate for sparse high-dimensional inputs such as one-hot word representations. We extend adversarial and virtual adversarial training to the text domain by applying perturbations to the word embeddings in a recurrent neural network rather than to the original input itself. The proposed method achieves state of the art results on multiple benchmark semi-supervised and purely supervised tasks. We provide visualizations and analysis showing that the learned word embeddings have improved in quality and that while training, the model is less prone to overfitting. Code is available at https://github.com/tensorflow/models/tree/master/research/adversarial_text.
Summary
This paper extends adversarial and virtual adversarial training to text domains by applying perturbations to word embeddings in recurrent neural networks rather than to sparse one-hot input representations. The authors demonstrate that this approach is more suitable for text data and achieves state-of-the-art results on both semi-supervised and supervised benchmark tasks. The method improves word embedding quality and reduces overfitting during training, with code made publicly available.
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| FAR AI | Organization | 76.0 |
Cached Content Preview
# Adversarial Training Methods for Semi-Supervised Text Classification
Takeru Miyato1,2, Andrew M Dai2, Ian Goodfellow3
takeru.miyato@gmail.com, adai@google.com, ian@openai.com
1 Preferred Networks, Inc., ATR Cognitive Mechanisms Laboratories, Kyoto University
2 Google Brain
3 OpenAI
This work was done when the first author was at Google Brain.
###### Abstract
Adversarial training provides a means of regularizing supervised learning algorithms while virtual adversarial training is able to extend supervised learning algorithms to the semi-supervised setting. However, both methods require making small perturbations to numerous entries of the input vector, which is inappropriate for sparse high-dimensional inputs such as one-hot word representations. We extend adversarial and virtual adversarial training to the text domain by applying perturbations to the word embeddings in a recurrent neural network rather than to the original input itself. The proposed method achieves state of the art results on multiple benchmark semi-supervised and purely supervised tasks. We provide visualizations and analysis showing that the learned word embeddings have improved in quality and that, while training, the model is less prone to overfitting. Code is available at [https://github.com/tensorflow/models/tree/master/research/adversarial\_text](https://github.com/tensorflow/models/tree/master/research/adversarial_text "").
## 1 Introduction
Adversarial examples are examples that are created by making small perturbations to the input designed to significantly increase the loss incurred by a machine learning model (Szegedy et al., [2014](https://ar5iv.labs.arxiv.org/html/1605.07725#bib.bib34 ""); Goodfellow et al., [2015](https://ar5iv.labs.arxiv.org/html/1605.07725#bib.bib8 "")). Several models, including state of the art convolutional neural networks, lack the ability to classify adversarial examples correctly, sometimes even when the adversarial perturbation is constrained to be so small that a human observer cannot perceive it.
Adversarial training is the process of training a model to correctly classify both unmodified examples and adversarial examples. It improves not only robustness to adversarial examples, but also generalization performance for original examples. Adversarial training requires the use of labels when training models that use a supervised cost, because the label appears in the cost function that the adversarial perturbation is designed to maximize.
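The paper applies this idea at the embedding level: the perturbation is the L2-normalized gradient of the supervised loss with respect to the word embeddings, scaled by a small epsilon. A minimal sketch of that recipe follows; the toy bag-of-embeddings logistic classifier, the `adversarial_perturbation` and `nll` helpers, and the closed-form gradient are all illustrative assumptions, not the paper's LSTM-based implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nll(embeds, w, y):
    """Negative log-likelihood of a toy bag-of-embeddings logistic model."""
    p = sigmoid(w @ embeds.mean(axis=0))
    return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def adversarial_perturbation(embeds, w, y, eps=0.1):
    """r_adv = eps * g / ||g||_2, where g is the gradient of the supervised
    loss with respect to the word embeddings (closed form for this toy
    model). The label y appears in the computation, which is why
    adversarial training needs labeled data."""
    T = embeds.shape[0]
    p = sigmoid(w @ embeds.mean(axis=0))
    g = np.tile((p - y) * w / T, (T, 1))   # dLoss/d(embeds), one row per word
    return eps * g / (np.linalg.norm(g) + 1e-12)

# Toy usage: a 3-word "sentence" with 4-dimensional embeddings, label y = 1.
rng = np.random.default_rng(0)
embeds = rng.normal(size=(3, 4))
w = rng.normal(size=4)
r_adv = adversarial_perturbation(embeds, w, y=1.0)

# Moving along the loss gradient should not decrease the loss.
assert nll(embeds + r_adv, w, 1.0) >= nll(embeds, w, 1.0)
```

In the paper the gradient is obtained by backpropagation through an LSTM classifier over (normalized) embeddings, and the training objective adds the loss evaluated at the perturbed embeddings to the clean loss.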
Virtual adversarial training (Miyato et al., [2016](https://ar5iv.labs.arxiv.org/html/1605.07725#bib.bib27 "")) extends the idea of adversarial training to the semi-supervised regime and unlabeled examples. This is done by regularizing the model so that, given an example, it will produce the same output distribution as it produces on an adversarial perturbation of that example.
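Because no label is involved, the "virtual" adversarial direction is the one that most changes the model's output distribution, commonly approximated with a single power-iteration-style step as in Miyato et al. (2016). The sketch below is an illustrative numpy version under stated assumptions: the `vat_perturbation` helper, the finite-difference gradient estimate, and the toy softmax model are inventions for this example, not the paper's TensorFlow implementation.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def kl(p, q):
    """KL divergence between two discrete distributions."""
    return float(np.sum(p * (np.log(p) - np.log(q))))

def vat_perturbation(x, predict, eps=0.5, xi=0.1, h=1e-5, seed=0):
    """Approximate the direction that most changes the output distribution:
    start from a random unit vector d, estimate the gradient of
    KL(p(x) || p(x + xi*d)) w.r.t. the perturbation by finite differences,
    then rescale it to length eps. The true label never appears, so
    unlabeled examples can be used."""
    rng = np.random.default_rng(seed)
    d = rng.normal(size=x.shape)
    d /= np.linalg.norm(d)
    p = predict(x)
    base = kl(p, predict(x + xi * d))
    g = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        g[i] = (kl(p, predict(x + xi * d + step)) - base) / h
    return eps * g / (np.linalg.norm(g) + 1e-12)

# Toy usage: a 3-class softmax model over 2-dimensional inputs.
W = np.array([[1.0, -1.0], [-1.0, 1.0], [0.5, 0.5]])

def predict(v):
    return softmax(W @ v)

x = np.array([0.3, -0.2])
r_vadv = vat_perturbation(x, predict)

# The virtual adversarial loss: divergence between the clean and
# perturbed output distributions, added to the objective as a regularizer.
vat_loss = kl(predict(x), predict(x + r_vadv))
assert vat_loss > 0.0
```

In practice (and in the paper's setting) the gradient is computed by automatic differentiation rather than finite differences, and the perturbation is applied to the word embeddings of each (possibly unlabeled) sequence.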
Virtual adversarial training achieves good generalization performance for both supervised and semi-supervised learning
... (truncated, 56 KB total)
Stable ID: YTViMjZlND