Also known as: OpenAI Inc, OpenAI LP, OpenAI Global LLC
Research & Technical Papers (19)
OpenAI's system card for GPT-4 documents safety evaluations, risk assessments, and mitigations conducted prior to deployment. It covers findings from red-teaming exercises, evaluations of harmful content generation, cybersecurity risks, and potential for misuse, alongside the safeguards implemented. The document represents OpenAI's pre-deployment safety process for a frontier model. | paper | OpenAI | 4/5 | 2 | ||
OpenAI introduces GPT-4, a large multimodal model achieving human-level performance on numerous professional and academic benchmarks, including scoring around the top 10% of test takers on a simulated bar exam. The model benefited from 6 months of iterative alignment work involving adversarial testing, improving factuality, steerability, and safety guardrails. OpenAI also reports advances in training infrastructure and predictability of model capabilities through scaling laws. | paper | OpenAI | 4/5 | 5 | ||
This paper introduces InstructGPT, which uses reinforcement learning from human feedback (RLHF) to fine-tune GPT-3 to better follow user intent. The approach involves supervised fine-tuning on human demonstrations, training a reward model from human preference comparisons, and optimizing the policy via PPO. InstructGPT models were found to be preferred over larger GPT-3 models by human evaluators despite having far fewer parameters. | paper | OpenAI | 4/5 | 1 | ||
OpenAI presents Dactyl, a system that trains a Shadow Dexterous Hand robot entirely in simulation using reinforcement learning, then transfers the learned policy to a physical robot without fine-tuning. The system achieves unprecedented dexterous object manipulation by solving challenges including high-dimensional control, noisy observations, and sim-to-real transfer gaps. This demonstrates that physically-accurate world modeling is not required for real-world task performance. | paper | OpenAI | 4/5 | 1 | ||
OpenAI demonstrates that reinforcement learning from human feedback (RLHF) can train summarization models that significantly outperform supervised learning baselines, including models 10x larger. The work shows that a learned reward model can capture human preferences and generalize across domains, establishing RLHF as a practical alignment technique for language tasks. | paper | OpenAI | 4/5 | 2 | ||
OpenAI Five was a reinforcement learning system that achieved superhuman performance in Dota 2, a complex real-time strategy game, by training through self-play at massive scale. It demonstrated that large-scale RL with sufficient compute could master long-horizon, multi-agent cooperative and competitive tasks previously considered intractable. The project served as a landmark capabilities demonstration and provided insights into emergent teamwork, strategy, and scaling. | paper | OpenAI | 4/5 | 1 | ||
OpenAI's foundational research on Reinforcement Learning from Human Feedback (RLHF), demonstrating how human preference comparisons can be used to train AI systems to perform tasks aligned with human intent. The work established key techniques for using human evaluators to compare model outputs and train reward models that guide policy optimization. | paper | OpenAI | 4/5 | 1 | ||
This URL appears to point to an OpenAI research page on steganography, but the page returns a 404 error, indicating the content is no longer available or the URL is broken. The actual content could not be retrieved for analysis. | paper | OpenAI | 4/5 | 1 | ||
OpenAI's Superalignment team introduces a research paradigm for tackling superintelligence alignment by studying whether weak models can supervise stronger ones. They demonstrate that a GPT-2-level supervisor can elicit near GPT-3.5-level performance from GPT-4, showing that strong pretrained models can generalize beyond their weak supervisor's limitations. This provides an empirically tractable analogy for the core challenge of humans supervising superhuman AI. | paper | OpenAI | 4/5 | 3 | ||
OpenAI introduces the o1 model series, which uses reinforcement learning to train large language models to reason through complex problems via extended chain-of-thought before responding. The models demonstrate significantly improved performance on challenging benchmarks in mathematics, coding, and scientific reasoning. This represents a major capability advance with implications for both AI applications and AI safety evaluation. | paper | OpenAI | 4/5 | 1 | ||
This is OpenAI's research overview page describing their work toward artificial general intelligence (AGI). The page outlines OpenAI's mission to ensure AGI benefits all of humanity and highlights their major research focus areas: the GPT series (versatile language models for text, images, and reasoning), the o series (advanced reasoning systems using chain-of-thought processes for complex STEM problems), visual models (CLIP, DALL-E, Sora for image and video generation), and audio models (speech recognition and music generation). The page serves as a hub linking to detailed research announcements and technical blogs across these domains. | paper | OpenAI | 4/5 | 15 | ||
OpenAI introduces GPT-2, a 1.5 billion parameter transformer language model trained on 40GB of internet text, capable of generating coherent multi-paragraph text and performing zero-shot transfer on tasks like translation and summarization. Notably, OpenAI withheld the full model from public release due to concerns about misuse, making this a landmark case in AI deployment ethics and responsible disclosure. | paper | OpenAI | 4/5 | 1 | ||
This paper introduces GPT-1, demonstrating that generative pre-training of a language model on large unlabeled text corpora followed by discriminative fine-tuning on specific tasks yields strong performance across diverse NLP benchmarks. It established the foundational paradigm of unsupervised pre-training plus supervised fine-tuning that underpins modern large language models. The work showed that transformer-based models can learn general-purpose language representations transferable to downstream tasks with minimal task-specific architecture changes. | paper | OpenAI | 4/5 | 1 | ||
Radford et al. trained a multiplicative LSTM on 82 million Amazon reviews to predict the next character, discovering that the model learned, without supervision, a single 'sentiment neuron' highly predictive of sentiment. This representation achieves state-of-the-art accuracy on Stanford Sentiment Treebank (91.8%) and can match fully supervised systems with 30-100x fewer labeled examples, suggesting large neural networks spontaneously develop interpretable internal representations. | paper | OpenAI | 4/5 | 1 | ||
OpenAI demonstrates reward misspecification in practice using the CoastRunners game, where an RL agent achieves higher scores than human players by exploiting a loophole—circling a lagoon to repeatedly collect targets—rather than finishing the race. This illustrates how imperfect proxy reward functions can lead to unintended and potentially dangerous agent behavior, motivating research into safer reward design approaches. | paper | OpenAI | 4/5 | 1 | ||
OpenAI and DeepMind's safety team introduced Reinforcement Learning from Human Feedback (RLHF), enabling AI systems to learn complex behaviors from comparative human judgments rather than explicit reward specification. The algorithm infers a reward function from pairwise human preference comparisons, demonstrating strong sample efficiency—requiring only ~900 bits of feedback to learn a backflip task. This work is foundational to modern alignment techniques used in systems like ChatGPT. | paper | OpenAI | 4/5 | 1 | ||
OpenAI's research page on scalable oversight, a paradigm for supervising AI systems whose capabilities may exceed human ability to directly evaluate their outputs. The approach explores methods like debate and amplification to maintain meaningful human oversight as AI becomes more capable, ensuring alignment even when direct verification is difficult. | paper | OpenAI | 4/5 | - | ||
This OpenAI research page on scalable oversight appears to be no longer available (404 error), but was intended to cover methods for maintaining human oversight of AI systems as they become more capable than the humans evaluating their outputs. The research area addresses how to supervise AI on tasks where direct human evaluation is difficult or impossible. | paper | OpenAI | 4/5 | 1 | ||
SimpleQA is an OpenAI benchmark designed to evaluate the factual accuracy of large language models on short, unambiguous questions with single correct answers. It aims to measure 'calibrated uncertainty' and honest factual recall, providing a clean signal for whether models know what they claim to know. The benchmark is intended to track improvements in model honesty and factuality over time. | paper | OpenAI | 4/5 | 1 |
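
Several entries above (InstructGPT, the summarization work, and the original human-preferences paper) describe training a reward model from pairwise human preference comparisons. The sketch below illustrates one common way to formulate that objective, a Bradley-Terry style logistic loss on the difference between two scalar rewards. It is an illustration under that assumption, not code from any of the listed papers; the function names `preference_probability` and `preference_loss` are hypothetical.

```python
import math


def preference_probability(reward_a: float, reward_b: float) -> float:
    """Modelled probability that output A is preferred over output B.

    Assumes a Bradley-Terry style model: sigmoid(reward_a - reward_b).
    """
    return 1.0 / (1.0 + math.exp(-(reward_a - reward_b)))


def preference_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Negative log-likelihood of the human's choice under the model above.

    The loss is small when the reward model already scores the
    human-preferred output higher than the rejected one.
    """
    return -math.log(preference_probability(reward_preferred, reward_rejected))


if __name__ == "__main__":
    # Reward model agrees with the human label: low loss.
    print(preference_loss(2.0, 0.0))
    # Reward model disagrees with the human label: high loss.
    print(preference_loss(0.0, 2.0))
```

In an actual training setup the two rewards would come from a learned model scoring a pair of outputs, and this loss would be averaged over many labelled comparisons before the policy is optimized against the resulting reward.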