-
We train models based on feedback with different types of model-specific architectures, including DIRECTOR and CRINGE. I would take a look at those projects to see what the data format would entail. What do you mean by "fine-tuned cheaper"? CC @jxmsML, who worked on the FITS paper.
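To make the data-format question concrete, here is a minimal sketch of what a per-example binary feedback record could look like as JSON lines. The field names (`text`, `response`, `binary_feedback`) are illustrative assumptions only, not the actual schema the DIRECTOR or CRINGE tasks use; check those projects for the real format.

```python
# Illustrative only: the field names below are assumptions, not the DIRECTOR/CRINGE schema.
import json

examples = [
    {
        "text": "What's a good sci-fi book?",                  # dialogue context
        "response": "You could try Dune by Frank Herbert.",    # bot utterance being rated
        "binary_feedback": 1,                                   # 1 = positive, 0 = negative
    },
    {
        "text": "What's a good sci-fi book?",
        "response": "I don't know anything about books.",
        "binary_feedback": 0,
    },
]

with open("binary_feedback.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```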
-
From the BB3 and FITS papers, it seems that there is some RLHF-style fine-tuning of these models (binary, free-text, module-based feedback, etc.), even though the term itself is never used explicitly. With RLHF being all the rage (DeepSpeed, PEFT), it's not clear to me from the ParlAI documentation how one would go about creating, for example, a binary feedback dataset, or how to fine-tune the relevant modules with it. Which models are actually being fine-tuned in the FITS paper, for example?
Example goal 1: add binary feedback on bot utterances to an existing dataset and fine-tune the model or its sub-modules with it.
Example goal 2: create a binary feedback dataset from a live chat and fine-tune the vanilla model (or other sub-modules) with it (a rough sketch of this follows below).
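For goal 2 specifically, a minimal sketch (under assumed log and field names, nothing ParlAI-specific) of splitting live-chat thumbs-up/down annotations into positive and negative example files that a feedback-driven objective like DIRECTOR or CRINGE could then consume:

```python
import json

# Hypothetical live-chat log: each record holds the dialogue context, the bot's reply,
# and a thumbs rating from the user. These field names are assumptions for illustration.
chat_log = [
    {"context": "Can you recommend a pizza place?",
     "bot_reply": "Sure, Luigi's on 5th has great reviews.",
     "thumbs": "up"},
    {"context": "Can you recommend a pizza place?",
     "bot_reply": "I am a language model and cannot eat pizza.",
     "thumbs": "down"},
]

positives, negatives = [], []
for turn in chat_log:
    example = {"text": turn["context"], "labels": [turn["bot_reply"]]}
    (positives if turn["thumbs"] == "up" else negatives).append(example)

# Write separate positive/negative files so a feedback-aware objective can treat
# them as desirable vs. undesirable responses.
for name, split in [("feedback_pos.jsonl", positives), ("feedback_neg.jsonl", negatives)]:
    with open(name, "w") as f:
        for ex in split:
            f.write(json.dumps(ex) + "\n")
```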
The other question is with regard to recent advances in RLHF. It seems that the bigger OPT models and the BB3 3B model (vanilla GPT-2?) can be fine-tuned much more cheaply using RLHF with LoRA adapters. Are you familiar with these techniques, and how applicable are they to your models?
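For reference, this is the kind of LoRA setup I have in mind; a minimal sketch using the Hugging Face `peft` library on a stand-in OPT checkpoint (the model name, rank, and target modules are my assumptions, and ParlAI's own agents would presumably need their own integration):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Stand-in checkpoint; a larger OPT or BB3-style model would be loaded the same way.
model_name = "facebook/opt-1.3b"
model = AutoModelForCausalLM.from_pretrained(model_name)

lora_config = LoraConfig(
    r=8,                                   # low-rank adapter dimension
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],   # attention projections in OPT-style decoders
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of parameters are trainable
```

Only the adapter weights are updated, which is what makes this so much cheaper in memory and compute than full fine-tuning.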