GPT-4 Technical Report and Research Overview

web

Credibility Rating

4/5

High (4)

High quality. Established institution or organization with editorial oversight and accountability.

Rating inherited from publication venue: OpenAI

This is OpenAI's official research page for GPT-4, a landmark frontier model; relevant for understanding capability thresholds, alignment techniques applied at scale, and the interplay between scaling and safety work.

Metadata

Importance: 72/100 · blog post · primary source

Summary

OpenAI introduces GPT-4, a large multimodal model that reaches human-level performance on various professional and academic benchmarks, including a simulated bar exam on which it scores around the top 10% of test takers. The model went through 6 months of iterative alignment work that drew on adversarial testing and lessons from ChatGPT, improving factuality, steerability, and adherence to safety guardrails. OpenAI also reports a rebuilt training infrastructure and, for the first time, accurate advance prediction of a large model's training performance.

Key Points

• GPT-4 scores around the top 10% of test takers on a simulated bar exam, versus around the bottom 10% for GPT-3.5, illustrating how quickly capabilities can jump between model generations.
• Six months of iterative alignment, using adversarial testing and lessons from ChatGPT, improved factuality, steerability, and safety behaviors.
• GPT-4 is a milestone in scaling multimodal deep learning: it accepts both text and image inputs and emits text outputs.
• A rebuilt deep learning stack and a supercomputer co-designed with Azure made the training run unusually stable by OpenAI's own standards and allowed training performance to be predicted accurately ahead of time.
• The release shows both the promise and the challenge of frontier AI: significant capability gains alongside ongoing alignment and safety efforts.
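
The idea of predicting training performance ahead of time, mentioned in the key points above, can be illustrated with a small sketch. It assumes a simplified power law, loss = a * compute^b, fitted to small training runs and extrapolated to a larger compute budget. The functional form, the function names `fit_power_law` and `predict_loss`, and all numbers are hypothetical and chosen for illustration; the source page does not describe OpenAI's actual fitting procedure.

```python
import math

def fit_power_law(compute, loss):
    """Fit loss = a * compute**b by linear least squares in log-log space."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(l) for l in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope = exponent of the power law
    a = math.exp(mean_y - b * mean_x)  # intercept = log of the coefficient
    return a, b

def predict_loss(a, b, compute):
    """Extrapolate the fitted power law to a new compute budget."""
    return a * compute ** b

# Hypothetical data: synthetic "small runs" generated from loss = 10 * C**-0.1.
# These are NOT measurements from any real model.
small_compute = [1e3, 1e4, 1e5, 1e6]
small_loss = [10.0 * c ** -0.1 for c in small_compute]

a, b = fit_power_law(small_compute, small_loss)
predicted = predict_loss(a, b, 1e9)  # extrapolate 1000x beyond the largest run
```

Real loss curves are noisy and usually need an extra term for irreducible loss, so a fit like this is only a starting point for thinking about how small runs can inform expectations for a large one.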

Cited by 5 pages

Page	Type	Quality
Instrumental Convergence Framework	Analysis	60.0
AI Risk Cascade Pathways Model	Analysis	67.0
Scheming Likelihood Assessment	Analysis	61.0
EU AI Act	Policy	55.0
Sharp Left Turn	Risk	69.0

Cached Content Preview

HTTP 200 · Fetched May 17, 2026 · 36 KB

GPT-4 | OpenAI

March 14, 2023

[Milestone](https://openai.com/research/index/milestone/)

# GPT‑4

[Read paper](https://arxiv.org/abs/2303.08774) [View system card](https://cdn.openai.com/papers/gpt-4-system-card.pdf) [Try on ChatGPT Plus](https://chatgpt.com/chat?openaicom-did=7e30423c-7aaf-4b06-92ee-f5f0aa4c0b84&openaicom_referred=true)

More Resources

[Try in Playground](https://platform.openai.com/playground) [Rewatch demo livestream](https://youtube.com/live/outcGtbnMuQ?feature=share) [Contribute to OpenAI Evals](https://github.com/openai/evals)


We’ve created GPT‑4, the latest milestone in OpenAI’s effort in scaling up deep learning. GPT‑4 is a large multimodal model (accepting image and text inputs, emitting text outputs) that, while less capable than humans in many real-world scenarios, exhibits human-level performance on various professional and academic benchmarks. For example, it passes a simulated bar exam with a score around the top 10% of test takers; in contrast, GPT‑3.5’s score was around the bottom 10%. We’ve spent 6 months iteratively [aligning](https://openai.com/index/instruction-following/) GPT‑4 using lessons from our adversarial testing program as well as ChatGPT, resulting in our best-ever results (though far from perfect) on factuality, steerability, and refusing to go outside of guardrails.

Over the past two years, we rebuilt our entire deep learning stack and, together with Azure, co-designed a supercomputer from the ground up for our workload. A year ago, we trained GPT‑3.5 as a first “test run” of the system. We found and fixed some bugs and improved our theoretical foundations. As a result, our GPT‑4 training run was (for us at least!) unprecedentedly stable, becoming our first large model whose training performance we were able to accurately predict ahead of time. As we continue to focus on reliable scaling, we aim to hone our methodology to help us predict and prepare for future capabilities increasingly far in advance—something we view as critical for safety.

We are releasing GPT‑4’s text input capability via ChatGPT and the API (with a [waitlist](https://openai.com/waitlist/gpt-4-api/)). To prepare the image input capability for wider availability, we’re collaborating closely with a [single partner](https://www.bemyeyes.com/) to start. We’re also open-sourcing [OpenAI Evals](https://github.com/openai/evals), our framework for automated evaluation of AI model performance, to allow anyone to report shortcomings in our models to help guide further improvements.

## Capabilities

In a casual conversation, the distinction between GPT‑3.5 and GPT‑4 can be subtle. The difference comes out when the complexity of the task reaches a sufficient threshold—GPT‑4 is more reliable, creative, and able to handle much more nuanced ins

... (truncated, 36 KB total)

Resource ID: 9b255e0255d7dd86 | Stable ID: sid_FFc8nhQyDL