| Model | # Parameters (billion) | Dataset (billion tokens) | Training |
|---|---|---|---|
| InstructGPT (ChatGPT) | 175 | 300 + fine-tuning data | UPT + RLHF |
| GPT-3 | 175 | 300 | UPT |
| BloombergGPT | 50 | 700 | UPT |
- UPT: Unsupervised pre-training
- RLHF: Reinforcement Learning from Human Feedback
- SFT: supervised fine-tuning; produces the SFT model
- SFT dataset: labeler demonstrations, used to train the SFT model
- RM dataset: labeler rankings of model outputs, used to train the Reward Model (RM)
- PPO dataset: prompts without any human labels, used as inputs for RLHF fine-tuning (see the sketch after this list)
- BloombergGPT: 64 × 8 A100 GPUs, 53 days, 0.8 epoch (only 80% of the dataset)
- GPT-3: estimated to have cost around $5M in compute time to train. Using 1,024 A100 GPUs, researchers calculated that OpenAI could have trained GPT-3 in as little as 34 days (ref); a back-of-the-envelope check follows this list
- Alignment: generating output that aligns with human intent
- Fine-tuning: tuning the model structure/parameters to align the pretrained model to human intent (or another downstream objective)
- RLHF: one such fine-tuning technique
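The three datasets above correspond to the three stages of the InstructGPT-style pipeline: SFT on labeler demonstrations, then a reward model fit to labeler rankings, then PPO on unlabeled prompts with the reward model supplying the reward signal. As a minimal, self-contained sketch of the middle stage, the snippet below implements the pairwise reward-model loss (the RM should score the human-preferred response above the rejected one); the function name and the toy scores are illustrative, not from any real library.

```python
import math

# Pairwise reward-model loss on the RM dataset (labeler rankings):
# the reward model should assign the human-preferred response a higher
# score (r_chosen) than the rejected one (r_rejected).
# loss = -log(sigmoid(r_chosen - r_rejected))

def rm_pairwise_loss(r_chosen: float, r_rejected: float) -> float:
    """Negative log-likelihood that the chosen response outranks the rejected one."""
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

# Toy scores: the loss is small when the labeler ranking is respected,
# large when it is violated.
print(rm_pairwise_loss(2.0, 0.5))   # ~0.20
print(rm_pairwise_loss(0.5, 2.0))   # ~1.70
```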
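To sanity-check the 34-day GPT-3 estimate, training compute is commonly approximated as FLOPs ≈ 6 × parameters × tokens. The sketch below reproduces the figure under an assumed 33% hardware utilization (that utilization number is an illustrative assumption, not from the source).

```python
# Back-of-the-envelope check of "1,024 A100s, ~34 days" for GPT-3,
# using the common approximation: training FLOPs ≈ 6 * params * tokens.

params = 175e9             # GPT-3 parameters
tokens = 300e9             # training tokens
a100_peak = 312e12         # A100 peak BF16 throughput, FLOP/s
n_gpus = 1024
utilization = 0.33         # assumed fraction of peak actually achieved

total_flops = 6 * params * tokens                       # ≈ 3.15e23 FLOPs
seconds = total_flops / (n_gpus * a100_peak * utilization)
print(f"~{seconds / 86400:.0f} days")                   # prints ~35 days
```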