Introducing ChatGPT
Web Credibility Rating
High quality. Established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: OpenAI
This is the original launch post for ChatGPT (November 2022), a landmark public deployment of an RLHF-trained model that brought AI safety and alignment techniques to mainstream attention and triggered widespread societal discussion about AI risks and governance.
Metadata
Summary
OpenAI's official launch announcement for ChatGPT, a conversational AI model fine-tuned from GPT-3.5 using Reinforcement Learning from Human Feedback (RLHF). ChatGPT is trained to follow instructions, admit mistakes, challenge incorrect premises, and decline inappropriate requests, representing a significant step in deploying aligned language models to the public.
Key Points
- ChatGPT is trained using RLHF with supervised fine-tuning from human AI trainers playing both user and assistant roles.
- A reward model is built from human-ranked comparisons of model outputs, then used to fine-tune via Proximal Policy Optimization (PPO).
- The model is designed to admit mistakes, challenge incorrect premises, and reject inappropriate requests—early alignment-oriented behaviors.
- ChatGPT is a sibling model to InstructGPT, extending instruction-following work into a conversational dialogue format.
- Released as a free research preview to gather user feedback on strengths and weaknesses, marking a major public deployment milestone.
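The reward-model step in the key points above can be stated compactly. The post says ChatGPT uses the same methods as InstructGPT, whose reward model is trained on human-ranked pairs of completions with a pairwise ranking loss; the formula below is a sketch of that setup, not something stated on this page:

$$
\mathcal{L}(\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim D}\left[\log \sigma\bigl(r_\theta(x, y_w) - r_\theta(x, y_l)\bigr)\right]
$$

where $r_\theta$ is the reward model, $y_w$ the completion the human labeler preferred for prompt $x$, $y_l$ the less-preferred one, and $\sigma$ the logistic function. The fitted $r_\theta$ then supplies the reward signal that PPO optimizes against.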
Cited by 2 pages
| Page | Type | Quality |
|---|---|---|
| Multipolar Trap Dynamics Model | Analysis | 61.0 |
| AI-Driven Concentration of Power | Risk | 65.0 |
Cached Content Preview
OpenAI
November 30, 2022
# Introducing ChatGPT
We’ve trained a model called ChatGPT which interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer followup questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests.
ChatGPT is a sibling model to [InstructGPT](https://openai.com/index/instruction-following/), which is trained to follow an instruction in a prompt and provide a detailed response.
We are excited to introduce ChatGPT to get users’ feedback and learn about its strengths and weaknesses. During the research preview, usage of ChatGPT is free. Try it now at [chatgpt.com(opens in a new window)](https://chatgpt.com/).
### Samples
User
this code is not working like i expect — how do i fix it?
```go
resultWorkerErr := make(chan error)
defer close(resultWorkerErr)
go func() {
	defer cancel()
	resultWorkerErr <- b.resultWorker(ctx)
}()
err := b.worker(ctx)
cancel()
if err == nil {
	return <-resultWorkerErr
}
return multierror.Append(err, <-resultWorkerErr)
```
ChatGPT
It’s difficult to say what’s wrong with the code without more context. Can you provide more information about what the code is supposed to do and what isn’t working as expected? Also, is this the entire code or just a part of it?
User
just part of it — the error never surfaces. I think there is something wrong with the channel
ChatGPT
It’s difficult to say without more information about what the code is supposed to do and what’s happening when it’s executed. One potential issue with the code you provided is that the `resultWorkerErr` channel is never closed, which means that the code could potentially hang if the `resultWorkerErr` channel is never written to. This could happen if `b.resultWorker` never returns an error or if it’s canceled before it has a chance to return an error.
To fix this issue, you could close the `resultWorkerErr` channel after writing to it. For example, you could add the following line of code after the line that sends the error on the channel:
```go
close(resultworkerErr)
```
This would ensure that the `resultWorkerErr` channel is always closed, even if `b.resultWorker` never returns an error. Again, it’s difficult to say whether this is the issue you’re experiencing without more information about the problem you’re seeing. Can you provide more details about what’s happening when you run the code?
## Methods
We trained this model using Reinforcement Learning from Human Feedback (RLHF), using the same methods as [InstructGPT](https://openai.com/index/instruction-following/), but with slight differences in the data collection setup. We trained an initial model using supervised fine-tuning: human AI trainers provided conversations in which they played both sides—the user and an AI assistant. We gav
... (truncated, 13 KB total)