Longterm Wiki

ICLR 2017

paper

Authors

Takeru Miyato·Andrew M. Dai·Ian Goodfellow

Credibility Rating

3/5
Good (3)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: arXiv

Introduces adversarial and virtual adversarial training methods for the text domain, extending robustness techniques to NLP, which is relevant for understanding adversarial vulnerabilities and defenses in language models.

Paper Details

Citations
1,130
158 influential
Year
2016

Metadata

arXiv preprint · primary source

Abstract

Adversarial training provides a means of regularizing supervised learning algorithms while virtual adversarial training is able to extend supervised learning algorithms to the semi-supervised setting. However, both methods require making small perturbations to numerous entries of the input vector, which is inappropriate for sparse high-dimensional inputs such as one-hot word representations. We extend adversarial and virtual adversarial training to the text domain by applying perturbations to the word embeddings in a recurrent neural network rather than to the original input itself. The proposed method achieves state of the art results on multiple benchmark semi-supervised and purely supervised tasks. We provide visualizations and analysis showing that the learned word embeddings have improved in quality and that while training, the model is less prone to overfitting. Code is available at https://github.com/tensorflow/models/tree/master/research/adversarial_text.

Summary

This paper extends adversarial and virtual adversarial training to text domains by applying perturbations to word embeddings in recurrent neural networks rather than to sparse one-hot input representations. The authors demonstrate that this approach is more suitable for text data and achieves state-of-the-art results on both semi-supervised and supervised benchmark tasks. The method improves word embedding quality and reduces overfitting during training, with code made publicly available.
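The core trick can be sketched in a few lines. The sketch below is a minimal numpy illustration under assumed shapes, not the authors' TensorFlow implementation: given the gradient of the loss with respect to the word-embedding input, the adversarial perturbation is the L2-normalized gradient scaled by a norm constraint epsilon, added to the embeddings before the forward pass.

```python
import numpy as np

def adversarial_perturbation(grad, epsilon=1.0):
    """Embedding-level adversarial perturbation, r_adv = epsilon * g / ||g||_2.

    `grad` is assumed to be the gradient of the loss with respect to the
    (sequence of) word embeddings, shape (seq_len, embed_dim). Moving the
    embeddings along this direction increases the loss the most, per the
    first-order approximation used in adversarial training.
    """
    norm = np.sqrt(np.sum(grad ** 2)) + 1e-12  # avoid division by zero
    return epsilon * grad / norm

# Toy example with made-up shapes: 4 tokens, 8-dim embeddings.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(4, 8))
grad = rng.normal(size=(4, 8))  # stand-in for the loss gradient
r_adv = adversarial_perturbation(grad, epsilon=1.0)
adv_embeddings = embeddings + r_adv  # fed through the model a second time
```

The model is then trained on the loss at `adv_embeddings` in addition to the clean loss; because the perturbation lives in embedding space rather than on one-hot inputs, it stays small and dense, which is the paper's key adaptation for text.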

Cited by 1 page

| Page | Type | Quality |
| --- | --- | --- |
| FAR AI | Organization | 76.0 |

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 56 KB
# Adversarial Training Methods for Semi-Supervised Text Classification

Takeru Miyato (1,2), Andrew M. Dai (2), Ian Goodfellow (3)

takeru.miyato@gmail.com, adai@google.com, ian@openai.com

1 Preferred Networks, Inc., ATR Cognitive Mechanisms Laboratories, Kyoto University

2 Google Brain

3 OpenAI

This work was done when the author was at Google Brain.

###### Abstract

Adversarial training provides a means of regularizing supervised learning
algorithms while virtual adversarial training is able to extend
supervised learning algorithms to the semi-supervised setting.
However, both methods require making small perturbations to numerous
entries of the input vector, which is inappropriate for sparse high-dimensional
inputs such as one-hot word representations.
We extend adversarial and virtual adversarial training to the text domain
by applying perturbations to the word embeddings in a recurrent neural network rather than to the
original input itself.
The proposed method achieves state of the art results on multiple benchmark
semi-supervised and purely supervised tasks.
We provide visualizations and analysis showing that the learned word embeddings
have improved in quality and that while training, the model is less prone to overfitting.
Code is available at [https://github.com/tensorflow/models/tree/master/research/adversarial\_text](https://github.com/tensorflow/models/tree/master/research/adversarial_text "").

## 1 Introduction

Adversarial examples are examples that are created by making small
perturbations to the input designed to significantly increase the loss
incurred by a machine learning model (Szegedy et al., [2014](https://ar5iv.labs.arxiv.org/html/1605.07725#bib.bib34 ""); Goodfellow et al., [2015](https://ar5iv.labs.arxiv.org/html/1605.07725#bib.bib8 "")).
Several models, including state of the art convolutional neural networks,
lack the ability to classify adversarial examples correctly, sometimes even
when the adversarial perturbation is constrained to be so small that a human
observer cannot perceive it.
Adversarial training
is the process of training a model to correctly
classify both unmodified examples and adversarial examples.
It improves not only robustness to adversarial examples, but
also generalization performance for original examples.
Adversarial training requires the use of labels when training models that use
a supervised cost, because the label appears in the cost function that the
adversarial perturbation is designed to maximize.
Virtual adversarial training (Miyato et al., [2016](https://ar5iv.labs.arxiv.org/html/1605.07725#bib.bib27 "")) extends the
idea of adversarial training to
the semi-supervised regime and unlabeled examples. This is done by regularizing the model so that given an example, the model will produce the same output
distribution as it produces on an adversarial perturbation
of that example.
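That regularizer can be sketched as follows. This is a toy numpy illustration with an assumed linear softmax model and a fixed perturbation direction; the paper finds the direction approximately maximizing the KL divergence via power iteration, which is omitted here:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl(p, q):
    # KL(p || q) per example, with small smoothing for numerical safety.
    return np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)

def vat_loss(W, x, r, epsilon=1.0):
    """Virtual adversarial loss for a toy model p(y|x) = softmax(x @ W):
    the KL divergence between the prediction on the clean input and on the
    input perturbed along direction r, rescaled to norm epsilon. Note that
    no label appears anywhere, so this term works on unlabeled data."""
    p_clean = softmax(x @ W)
    r = epsilon * r / (np.linalg.norm(r) + 1e-12)
    p_adv = softmax((x + r) @ W)
    return float(np.mean(kl(p_clean, p_adv)))

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 3))   # made-up weights: 8-dim input, 3 classes
x = rng.normal(size=(5, 8))   # 5 "unlabeled" examples
r = rng.normal(size=(8,))     # candidate perturbation direction
loss = vat_loss(W, x, r)
```

Minimizing this KL term alongside the supervised loss smooths the model's output distribution around every input, labeled or not, which is what lets virtual adversarial training operate in the semi-supervised setting.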
Virtual adversarial training achieves good generalization performance for both
supervised and semi-supervised lear

... (truncated, 56 KB total)