| Model | # Parameters (billion) | Dataset (billion tokens) | Training |
|---|---|---|---|
| InstructGPT (ChatGPT) | 175 | 300 + fine-tuning data | UPT + RLHF |
| GPT-3 | 175 | 300 | UPT |
| BloombergGPT | 50 | 700 | UPT |
- UPT: Unsupervised pre-training
- RLHF: Reinforcement Learning from Human Feedback
- SFT: supervised fine-tuning; produces the SFT model
- SFT dataset: labeler demonstrations, used to train the SFT model
- RM dataset: labeler rankings of model outputs, used to train the Reward Model (RM)
- PPO dataset: prompts without any human labels, used as inputs for RLHF fine-tuning (see the sketch after this list)
- BloombergGPT: 64 × 8 A100 GPUs, 53 days, 0.8 epoch (only 80% of the dataset)
- GPT-3: estimated to have cost around $5M in compute time to train. Using 1,024 A100 GPUs, researchers calculated that OpenAI could have trained GPT-3 in as little as 34 days (ref); a back-of-the-envelope check follows this list
- Alignment: generating output that aligns with human intent
- Fine-tuning: tuning the model structure/parameters to align the pretrained model to human intent (or another downstream objective)
- RLHF: one such fine-tuning technique
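The three datasets above correspond to the three stages of the InstructGPT-style pipeline: SFT on labeler demonstrations, then a reward model fit to labeler rankings, then PPO on unlabeled prompts with the reward model supplying the reward signal. As a minimal, self-contained sketch of the middle stage, the snippet below implements the pairwise reward-model loss (the RM should score the human-preferred response above the rejected one); the function name and the toy scores are illustrative, not from any real library.

```python
import math

# Pairwise reward-model loss on the RM dataset (labeler rankings):
# the reward model should assign the human-preferred response a higher
# score (r_chosen) than the rejected one (r_rejected).
# loss = -log(sigmoid(r_chosen - r_rejected))

def rm_pairwise_loss(r_chosen: float, r_rejected: float) -> float:
    """Negative log-likelihood that the chosen response outranks the rejected one."""
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

# Toy scores: the loss is small when the labeler ranking is respected,
# large when it is violated.
print(rm_pairwise_loss(2.0, 0.5))   # ~0.20
print(rm_pairwise_loss(0.5, 2.0))   # ~1.70
```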
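To sanity-check the 34-day GPT-3 estimate, training compute is commonly approximated as FLOPs ≈ 6 × parameters × tokens. The sketch below reproduces the figure under an assumed 33% hardware utilization (that utilization number is an illustrative assumption, not from the source).

```python
# Back-of-the-envelope check of "1,024 A100s, ~34 days" for GPT-3,
# using the common approximation: training FLOPs ≈ 6 * params * tokens.

params = 175e9             # GPT-3 parameters
tokens = 300e9             # training tokens
a100_peak = 312e12         # A100 peak BF16 throughput, FLOP/s
n_gpus = 1024
utilization = 0.33         # assumed fraction of peak actually achieved

total_flops = 6 * params * tokens                       # ≈ 3.15e23 FLOPs
seconds = total_flops / (n_gpus * a100_peak * utilization)
print(f"~{seconds / 86400:.0f} days")                   # prints ~35 days
```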