Longterm Wiki

ICLR 2017

paper

Authors

Takeru Miyato·Andrew M. Dai·Ian Goodfellow

Credibility Rating

3/5
Good (3)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: arXiv

Introduces adversarial and virtual adversarial training methods for the text domain, extending robustness techniques to NLP, which is relevant for understanding adversarial vulnerabilities and defenses in language models.

Paper Details

Citations
1,130
158 influential
Year
2016

Metadata

arXiv preprint · primary source

Abstract

Adversarial training provides a means of regularizing supervised learning algorithms while virtual adversarial training is able to extend supervised learning algorithms to the semi-supervised setting. However, both methods require making small perturbations to numerous entries of the input vector, which is inappropriate for sparse high-dimensional inputs such as one-hot word representations. We extend adversarial and virtual adversarial training to the text domain by applying perturbations to the word embeddings in a recurrent neural network rather than to the original input itself. The proposed method achieves state of the art results on multiple benchmark semi-supervised and purely supervised tasks. We provide visualizations and analysis showing that the learned word embeddings have improved in quality and that while training, the model is less prone to overfitting. Code is available at https://github.com/tensorflow/models/tree/master/research/adversarial_text.

Summary

This paper extends adversarial and virtual adversarial training to text domains by applying perturbations to word embeddings in recurrent neural networks rather than to sparse one-hot input representations. The authors demonstrate that this approach is more suitable for text data and achieves state-of-the-art results on both semi-supervised and supervised benchmark tasks. The method improves word embedding quality and reduces overfitting during training, with code made publicly available.
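The core trick can be sketched in a few lines. The sketch below is a minimal numpy illustration under assumed shapes, not the authors' TensorFlow implementation: given the gradient of the loss with respect to the word-embedding input, the adversarial perturbation is the L2-normalized gradient scaled by a norm constraint epsilon, added to the embeddings before the forward pass.

```python
import numpy as np

def adversarial_perturbation(grad, epsilon=1.0):
    """Embedding-level adversarial perturbation, r_adv = epsilon * g / ||g||_2.

    `grad` is assumed to be the gradient of the loss with respect to the
    (sequence of) word embeddings, shape (seq_len, embed_dim). Moving the
    embeddings along this direction increases the loss the most, per the
    first-order approximation used in adversarial training.
    """
    norm = np.sqrt(np.sum(grad ** 2)) + 1e-12  # avoid division by zero
    return epsilon * grad / norm

# Toy example with made-up shapes: 4 tokens, 8-dim embeddings.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(4, 8))
grad = rng.normal(size=(4, 8))  # stand-in for the loss gradient
r_adv = adversarial_perturbation(grad, epsilon=1.0)
adv_embeddings = embeddings + r_adv  # fed through the model a second time
```

The model is then trained on the loss at `adv_embeddings` in addition to the clean loss; because the perturbation lives in embedding space rather than on one-hot inputs, it stays small and dense, which is the paper's key adaptation for text.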

Cited by 1 page

| Page | Type | Quality |
| --- | --- | --- |
| FAR AI | Organization | 76.0 |

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 56 KB
# Adversarial Training Methods for Semi-Supervised Text Classification

Takeru Miyato (1,2), Andrew M. Dai (2), Ian Goodfellow (3)

takeru.miyato@gmail.com, adai@google.com, ian@openai.com

1 Preferred Networks, Inc., ATR Cognitive Mechanisms Laboratories, Kyoto University

2 Google Brain

3 OpenAI

This work was done when the author was at Google Brain.

###### Abstract

Adversarial training provides a means of regularizing supervised learning
algorithms while virtual adversarial training is able to extend
supervised learning algorithms to the semi-supervised setting.
However, both methods require making small perturbations to numerous
entries of the input vector, which is inappropriate for sparse high-dimensional
inputs such as one-hot word representations.
We extend adversarial and virtual adversarial training to the text domain
by applying perturbations to the word embeddings in a recurrent neural network rather than to the
original input itself.
The proposed method achieves state of the art results on multiple benchmark
semi-supervised and purely supervised tasks.
We provide visualizations and analysis showing that the learned word embeddings
have improved in quality and that while training, the model is less prone to overfitting.
Code is available at [https://github.com/tensorflow/models/tree/master/research/adversarial\_text](https://github.com/tensorflow/models/tree/master/research/adversarial_text "").

## 1 Introduction

Adversarial examples are examples that are created by making small
perturbations to the input designed to significantly increase the loss
incurred by a machine learning model (Szegedy et al., [2014](https://ar5iv.labs.arxiv.org/html/1605.07725#bib.bib34 ""); Goodfellow et al., [2015](https://ar5iv.labs.arxiv.org/html/1605.07725#bib.bib8 "")).
Several models, including state of the art convolutional neural networks,
lack the ability to classify adversarial examples correctly, sometimes even
when the adversarial perturbation is constrained to be so small that a human
observer cannot perceive it.
Adversarial training
is the process of training a model to correctly
classify both unmodified examples and adversarial examples.
It improves not only robustness to adversarial examples, but
also generalization performance for original examples.
Adversarial training requires the use of labels when training models that use
a supervised cost, because the label appears in the cost function that the
adversarial perturbation is designed to maximize.
Virtual adversarial training (Miyato et al., [2016](https://ar5iv.labs.arxiv.org/html/1605.07725#bib.bib27 "")) extends the
idea of adversarial training to
the semi-supervised regime and unlabeled examples. This is done by regularizing the model so that given an example, the model will produce the same output
distribution as it produces on an adversarial perturbation
of that example.
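That regularizer can be sketched as follows. This is a toy numpy illustration with an assumed linear softmax model and a fixed perturbation direction; the paper finds the direction approximately maximizing the KL divergence via power iteration, which is omitted here:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl(p, q):
    # KL(p || q) per example, with small smoothing for numerical safety.
    return np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)

def vat_loss(W, x, r, epsilon=1.0):
    """Virtual adversarial loss for a toy model p(y|x) = softmax(x @ W):
    the KL divergence between the prediction on the clean input and on the
    input perturbed along direction r, rescaled to norm epsilon. Note that
    no label appears anywhere, so this term works on unlabeled data."""
    p_clean = softmax(x @ W)
    r = epsilon * r / (np.linalg.norm(r) + 1e-12)
    p_adv = softmax((x + r) @ W)
    return float(np.mean(kl(p_clean, p_adv)))

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 3))   # made-up weights: 8-dim input, 3 classes
x = rng.normal(size=(5, 8))   # 5 "unlabeled" examples
r = rng.normal(size=(8,))     # candidate perturbation direction
loss = vat_loss(W, x, r)
```

Minimizing this KL term alongside the supervised loss smooths the model's output distribution around every input, labeled or not, which is what lets virtual adversarial training operate in the semi-supervised setting.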
Virtual adversarial training achieves good generalization performance for both
supervised and semi-supervised lear

... (truncated, 56 KB total)