# InstructGPT vs. GPT-3 vs. BloombergGPT

**Table 1: Model comparison**

| Model | Parameters (billion) | Training data (billion tokens) | Training |
| --- | --- | --- | --- |
| InstructGPT (ChatGPT) | 175 | 300 + fine-tuning data | UPT + RLHF |
| GPT-3 | 175 | 300 | UPT |
| BloombergGPT | 50 | 700 | UPT |
- UPT: unsupervised pre-training (next-token prediction; see the sketch after this list)
- RLHF: Reinforcement Learning from Human Feedback
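
Conceptually, UPT is just next-token prediction with a cross-entropy loss. Here is a minimal PyTorch sketch of one such step; `upt_step` and the assumption that `model` returns raw logits are placeholders for illustration, not code from any of the papers.

```python
import torch.nn.functional as F

def upt_step(model, token_ids):
    """One unsupervised pre-training (UPT) step: next-token prediction.

    token_ids: LongTensor of shape (batch, seq_len) with tokenized text.
    model:     any causal LM callable returning logits (batch, seq, vocab).
    """
    inputs, targets = token_ids[:, :-1], token_ids[:, 1:]
    logits = model(inputs)
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),  # flatten (batch, time) for CE
        targets.reshape(-1),                  # each position's next token
    )
    loss.backward()  # optimizer step would follow
    return loss.item()
```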

## Compare GPT-3 with InstructGPT

*(figure: GPT-3 vs. InstructGPT comparison)*

- SFT: supervised fine-tuning (SFT) model

## InstructGPT training datasets

*(figure: InstructGPT training datasets)*

- SFT dataset: labeler demonstrations, used to train the SFT model
- RM dataset: labeler rankings of model outputs, used to train the reward model (see the sketch after this list)
- PPO dataset: prompts without any human labels, used as inputs for RLHF fine-tuning
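
The labeler rankings in the RM dataset are typically turned into a pairwise ranking loss: the reward model should score the preferred response higher than the rejected one. Below is a minimal PyTorch sketch under that assumption; `reward_model_loss` and the `reward_model` callable are illustrative, not the exact implementation from the InstructGPT paper.

```python
import torch
import torch.nn.functional as F

def reward_model_loss(reward_model, prompt_ids, chosen_ids, rejected_ids):
    """Pairwise ranking loss for one labeler comparison from the RM dataset.

    reward_model: callable mapping a (prompt + response) token sequence to a
                  single scalar score per example.
    The loss, -log sigmoid(r_chosen - r_rejected), pushes the preferred
    response's score above the rejected response's score.
    """
    r_chosen = reward_model(torch.cat([prompt_ids, chosen_ids], dim=-1))
    r_rejected = reward_model(torch.cat([prompt_ids, rejected_ids], dim=-1))
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```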

## Training hardware

- BloombergGPT: 64 × 8 = 512 A100 GPUs, 53 days, 0.8 epoch (only ~80% of the dataset)
- GPT-3: estimated to have cost around $5M in compute time to train. Using 1,024 A100 GPUs, researchers calculated that OpenAI could have trained GPT-3 in as little as 34 days (ref)
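
A quick sanity check of the two bullets above, multiplying the reported GPU counts by the reported wall-clock days; the resulting GPU-day totals are just arithmetic, not figures from the sources.

```python
# Back-of-the-envelope compute comparison (GPU-days = GPUs × days).
bloomberg_gpus = 64 * 8   # 64 nodes × 8 A100s = 512 GPUs
bloomberg_days = 53
gpt3_gpus = 1024          # A100 count cited in the estimate
gpt3_days = 34

print(f"BloombergGPT: {bloomberg_gpus * bloomberg_days:,} A100 GPU-days")  # 27,136
print(f"GPT-3 (est.): {gpt3_gpus * gpt3_days:,} A100 GPU-days")            # 34,816
```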

## Fine-tuning, RLHF & alignment

- Alignment: the generated output aligns with human intent
- Fine-tuning: tune the model structure/parameters to adapt the pretrained model to the desired task or behavior
- RLHF: one of the fine-tuning techniques (a minimal sketch of its reward shaping follows below)
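
One common way RLHF keeps the fine-tuned policy aligned without drifting too far from the SFT model is a KL-style penalty added to the reward-model score during PPO. The sketch below is an assumption about that shaping; the function name and the `beta` value are illustrative, not values from the paper.

```python
def rlhf_shaped_reward(rm_score, policy_logprobs, sft_logprobs, beta=0.02):
    """Per-response reward used during PPO fine-tuning (illustrative values).

    rm_score:        scalar reward-model score for the sampled response.
    policy_logprobs: per-token log-probs of that response under the policy.
    sft_logprobs:    per-token log-probs under the frozen SFT model.
    The KL-style penalty keeps the policy close to the SFT model while it
    chases higher reward-model scores, i.e. it regularizes the alignment step.
    """
    kl_penalty = (policy_logprobs - sft_logprobs).sum()
    return rm_score - beta * kl_penalty
```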