Longterm Wiki

Radford et al., "Learning to Generate Reviews and Discovering Sentiment" (OpenAI, 2017).

web

Credibility Rating

High (4/5)

High quality. Established institution or organization with editorial oversight and accountability.

Rating inherited from publication venue: OpenAI

An early influential result showing that large neural networks trained on prediction tasks can spontaneously develop interpretable internal features; relevant to mechanistic interpretability research and debates about emergent representations in AI systems.

Metadata

Importance: 62/100 · blog post · primary source

Summary

Radford et al. trained a multiplicative LSTM on 82 million Amazon reviews to predict the next character and found that, without any supervision, the model learned a single 'sentiment neuron' highly predictive of sentiment. A linear model on this representation achieves state-of-the-art accuracy on the Stanford Sentiment Treebank (91.8%) and can match fully supervised systems with 30-100x fewer labeled examples, suggesting that large neural networks spontaneously develop interpretable internal representations.

Key Points

  • A multiplicative LSTM trained purely on next-character prediction emergently developed a single neuron encoding almost all sentiment signal, without explicit supervision.
  • The learned representation achieved 91.8% accuracy on Stanford Sentiment Treebank, surpassing the previous best of 90.2% with a simple linear model on top.
  • The model can match supervised baselines with as few as 11-232 labeled examples, demonstrating extreme label efficiency via unsupervised pretraining.
  • The sentiment neuron can be manually overwritten to controllably steer generated text toward positive or negative sentiment.
  • Authors hypothesize this emergent interpretability is a general property of large neural networks trained to predict sequential inputs, not specific to their architecture.
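The steering result in the key points above amounts to pinning one cell-state unit to a fixed value while sampling. A minimal sketch of that intervention, where `step_fn`, `unit_idx`, and `unit_val` are hypothetical stand-ins for one RNN step, the discovered neuron's index, and the value it is forced to (the real model is a 4,096-unit character-level mLSTM):

```python
import numpy as np

def sample_with_clamped_unit(step_fn, h0, c0, n_steps, unit_idx, unit_val):
    """Run an RNN forward while pinning one cell-state unit.

    step_fn(h, c) -> (h_next, c_next) stands in for one character-level
    RNN step; unit_idx / unit_val are the clamped neuron and its value.
    """
    h, c = h0.copy(), c0.copy()
    states = []
    for _ in range(n_steps):
        c[unit_idx] = unit_val   # overwrite before the step...
        h, c = step_fn(h, c)
        c[unit_idx] = unit_val   # ...and after, so the clamp persists
        states.append(h.copy())
    return states
```

With the real model, a large positive `unit_val` would steer generated review text positive and a negative one negative, per the post.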

Cited by 1 page

PageTypeQuality
Large Language ModelsCapability60.0

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 8 KB
OpenAI

April 6, 2017

[Publication](https://openai.com/research/index/publication/)

# Unsupervised sentiment neuron

We’ve developed an unsupervised system which learns an excellent representation of sentiment, despite being trained only to predict the next character in the text of Amazon reviews.

[Paper](https://arxiv.org/abs/1704.01444) · [Code](https://github.com/openai/generating-reviews-discovering-sentiment)

![Unsupervised Sentiment Neuron](https://images.ctfassets.net/kftzwdyauwt9/02933f53-8bbc-47d2-353690b73e22/cb1239b6bd5b5164e96d6571528443cf/unsupervised-sentiment-neuron.jpg?w=3840&q=90&fm=webp)

Illustration: Ludwig Pettersson


A [linear model](https://openai.com/index/unsupervised-sentiment-neuron/#methodology) using this representation achieves state-of-the-art sentiment analysis accuracy on a small but extensively studied dataset, the Stanford Sentiment Treebank (we get 91.8% accuracy versus the previous best of 90.2%), and can match the performance of previous supervised systems using 30-100x fewer labeled examples. Our representation also contains a distinct [“sentiment neuron”](https://openai.com/index/unsupervised-sentiment-neuron/#sentimentneuron) which contains almost all of the sentiment signal.
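The linear model described above is a probe trained on frozen mLSTM hidden states. A minimal sketch with synthetic features, assuming precomputed feature vectors (here 512-dimensional, with the sentiment signal planted in one unit) rather than the real 4,096-unit states; the L1 penalty mirrors the post's L1-regularized model and also exposes a single dominant unit:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d, signal_unit = 200, 512, 42   # toy sizes; signal_unit is hypothetical

# Synthetic "hidden states": the label leaks strongly into one unit,
# mimicking how sentiment concentrated in a single mLSTM cell.
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, d))
X[:, signal_unit] += np.where(y == 1, 2.0, -2.0)

# L1 regularization drives most probe weights to zero, so the unit
# carrying the signal stands out as the largest coefficient.
probe = LogisticRegression(penalty="l1", C=0.1, solver="liblinear").fit(X, y)
top_unit = int(np.abs(probe.coef_[0]).argmax())
```

Ranking `|coef_|` this way is one simple route to finding a "sentiment neuron" once the probe is fit.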

Our system beats other approaches on Stanford Sentiment Treebank while using dramatically less data.

![Graph of labelled training examples](https://images.ctfassets.net/kftzwdyauwt9/91470318-4e2e-453a-28b0fbd62477/d4c711efe2460409193dac290577786b/image00.png?w=3840&q=90&fm=webp)

The number of labeled examples it takes two variants of our model (the green and blue lines) to match fully supervised approaches, each trained with 6,920 examples (the dashed gray lines). Our L1-regularized model (pretrained in an unsupervised fashion on Amazon reviews) matches [multichannel CNN](https://arxiv.org/abs/1408.5882) performance with only 11 labeled examples, and state-of-the-art CT-LSTM Ensembles with 232 examples.

We were very surprised that our model learned an interpretable feature, and that simply [predicting](http://karpathy.github.io/2015/05/21/rnn-effectiveness/) the next character in Amazon reviews resulted in discovering the concept of sentiment. We believe the phenomenon is not specific to our model, but is instead a general property of certain large neural networks that are trained to predict the next step or dimension in their inputs.

## Methodology

We first trained a [multiplicative LSTM](https://arxiv.org/abs/1609.07959) with 4,096 units on a corpus of 82 million Amazon reviews to predict the next character in a chunk of text. Training took one month across four NVIDIA Pascal GPUs, with our model processing 12,500 characters per second.
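A single step of the multiplicative LSTM can be sketched as below; this is a minimal NumPy version following Krause et al. (2016), with toy dimensions and weight names of my own choosing rather than the paper's 4,096 units:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlstm_step(x, h_prev, c_prev, p):
    """One multiplicative-LSTM step. The intermediate state m lets each
    input character select, in effect, its own recurrent transition,
    which vanilla LSTMs cannot do."""
    m = (p["Wmx"] @ x) * (p["Wmh"] @ h_prev)   # multiplicative state
    z = p["Wx"] @ x + p["Wm"] @ m + p["b"]     # fused gate pre-activations
    i, f, o, g = np.split(z, 4)                # input, forget, output, candidate
    c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

def init_params(n_in, n_hid, seed=0):
    rng = np.random.default_rng(seed)
    s = 0.1
    return {
        "Wmx": rng.normal(size=(n_hid, n_in)) * s,
        "Wmh": rng.normal(size=(n_hid, n_hid)) * s,
        "Wx":  rng.normal(size=(4 * n_hid, n_in)) * s,
        "Wm":  rng.normal(size=(4 * n_hid, n_hid)) * s,
        "b":   np.zeros(4 * n_hid),
    }
```

Feeding one-hot character vectors through this step, the hidden state `h` after the last character is the feature vector the probe is trained on.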

These 4,096 units (which are just a vector of floats) can be regarded as a feature vector representing the string read by the

... (truncated, 8 KB total)
Resource ID: 370658949c0fdca7 | Stable ID: NTk2MGRjZT