Longterm Wiki

Radford et al., "Better Language Models and Their Implications" (OpenAI, 2019).

web

Credibility Rating

High (4/5)

High quality. Established institution or organization with editorial oversight and accountability.

Rating inherited from publication venue: OpenAI

This GPT-2 release is historically significant not only as a capabilities milestone but also as the first high-profile case of a major AI lab deliberately staging a model release over dual-use safety concerns. That makes it essential context for discussions of AI governance and deployment policy.

Metadata

Importance: 82/100 · blog post · primary source

Summary

OpenAI introduces GPT-2, a 1.5 billion parameter transformer language model trained on 40GB of internet text, capable of generating coherent multi-paragraph text and performing zero-shot transfer on tasks like translation and summarization. Notably, OpenAI withheld the full model from public release due to concerns about misuse, making this a landmark case in AI deployment ethics and responsible disclosure.

Key Points

  • GPT-2 demonstrates strong zero-shot task performance (QA, summarization, translation) without task-specific training, showing emergent capabilities from scale.
  • OpenAI staged the model release, initially withholding the full 1.5B parameter model citing concerns about malicious use such as disinformation generation.
  • This staged release decision sparked significant debate in the AI community about responsible disclosure norms and dual-use risks of capable language models.
  • GPT-2 is a direct scale-up of GPT with 10x more parameters and 10x more training data, establishing scaling as a key driver of language model capability.
  • The release highlighted the tension between open science norms and safety concerns around powerful generative AI, influencing subsequent deployment policies.
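The "more than 10X" scale-up in the fourth bullet can be checked with quick arithmetic. Note the ~117M parameter count for the original GPT is an outside assumption taken from the GPT paper, not a figure stated on this page; GPT-2's 1.5B figure is from the summary above.

```python
# Rough check of the "10x more parameters" claim.
# gpt_params is an assumed figure (original GPT, ~117M); gpt2_params is
# the 1.5 billion stated in this release.
gpt_params = 117_000_000
gpt2_params = 1_500_000_000

scale_factor = gpt2_params / gpt_params
print(f"{scale_factor:.1f}x")  # about 12.8x, consistent with "more than 10X"
```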

Cited by 1 page

| Page | Type | Quality |
|---|---|---|
| Large Language Models | Capability | 60.0 |

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 18 KB
OpenAI

February 14, 2019

[Milestone](https://openai.com/research/index/milestone/)

# Better language models and their implications

[Read paper](https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) [View code](https://github.com/openai/gpt-2)

![Better Language Models](https://images.ctfassets.net/kftzwdyauwt9/8df9d1ca-5128-41d0-81527fd4752f/488482a9bebc41f8d72e1d0d423a5891/better-language-models.jpg?w=3840&q=90&fm=webp)

Illustration: Ben Barry


We’ve trained a large-scale unsupervised language model which generates coherent paragraphs of text, achieves state-of-the-art performance on many language modeling benchmarks, and performs rudimentary reading comprehension, machine translation, question answering, and summarization—all without task-specific training.

Our model, called GPT‑2 (a successor to [GPT](https://openai.com/index/language-unsupervised/)), was trained simply to predict the next word in 40GB of Internet text. Due to our concerns about malicious applications of the technology, we are not releasing the trained model. As an experiment in responsible disclosure, we are instead releasing a much [smaller model](https://github.com/openai/gpt-2) for researchers to experiment with, as well as a [technical paper](https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf).

GPT‑2 is a large [transformer](https://arxiv.org/abs/1706.03762)-based language model with 1.5 billion parameters, trained on a dataset[A](https://openai.com/index/better-language-models/#citation-bottom-A) of 8 million web pages. GPT‑2 is trained with a simple objective: predict the next word, given all of the previous words within some text. The diversity of the dataset causes this simple goal to contain naturally occurring demonstrations of many tasks across diverse domains. GPT‑2 is a direct scale-up of GPT, with more than 10X the parameters and trained on more than 10X the amount of data.
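The "predict the next word" objective above is, at each position, a cross-entropy loss between the model's distribution over the vocabulary and the token that actually came next. A minimal sketch of that per-step loss (a simplification; real GPT-2 works on subword tokens and sums this over every position in a batch):

```python
import numpy as np

def next_token_loss(logits, target_id):
    """Cross-entropy for one next-token prediction step.

    logits: unnormalized scores over the vocabulary (1-D array)
    target_id: index of the token that actually came next
    """
    # Softmax with max-subtraction for numerical stability.
    shifted = logits - np.max(logits)
    probs = np.exp(shifted) / np.sum(np.exp(shifted))
    return -np.log(probs[target_id])

# A model that puts nearly all its mass on the correct token has ~0 loss;
# a uniform model over a 4-token vocabulary has loss log(4) ≈ 1.3863.
confident = np.array([10.0, -10.0, -10.0, -10.0])
uniform = np.zeros(4)
print(round(next_token_loss(confident, 0), 4))
print(round(next_token_loss(uniform, 2), 4))
```

Training lowers this loss across a very diverse corpus, which is why the paragraph above notes that the objective ends up containing "naturally occurring demonstrations of many tasks."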

GPT‑2 displays a broad set of capabilities, including the ability to generate conditional synthetic text samples of unprecedented quality, where we prime the model with an input and have it generate a lengthy continuation. In addition, GPT‑2 outperforms other language models trained on specific domains (like Wikipedia, news, or books) without needing to use these domain-specific training datasets. On language tasks like question answering, reading comprehension, summarization, and translation, GPT‑2 begins to learn these tasks from the raw text, using no task-specific training data. While scores on these downstream tasks are far from state-of-the-art, they suggest that the tasks can benefit from unsupervised techniques, given sufficient (unlabeled) data and compute.
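The zero-shot behavior described above comes from framing a task as ordinary text continuation: prime the model with text whose natural continuation is the answer. A small sketch of two such prompt formats, following the style described in the accompanying paper (the "TL;DR:" summarization cue and "english sentence = french sentence" translation pattern); the helper function names here are illustrative, not part of the release:

```python
def summarization_prompt(article: str) -> str:
    # Appending "TL;DR:" invites the model to continue with a summary.
    return article.rstrip() + "\nTL;DR:"

def translation_prompt(examples, source: str) -> str:
    # A few "english sentence = french sentence" pairs establish the
    # pattern; the final line leaves the translation for the model
    # to fill in as its continuation.
    lines = [f"{en} = {fr}" for en, fr in examples]
    lines.append(f"{source} =")
    return "\n".join(lines)

print(translation_prompt([("hello", "bonjour")], "good morning"))
# hello = bonjour
# good morning =
```

No task-specific training is involved; the model is simply asked to continue text whose shape implies the task, which is why the downstream scores, while "far from state-of-the-art," are notable at all.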

## Samples

GPT‑2 generates synthetic text samples in response to the model being primed with an arbitrary input.

... (truncated, 18 KB total)
Resource ID: 5b72ca40b22da25b | Stable ID: NmMxNGE1MT