-
We train models based on feedback with different types of model-specific architectures, including DIRECTOR and CRINGE. I would take a look at those projects to see what the data format would entail. What do you mean by "fine-tuned cheaper"? CC @jxmsML, who worked on the FITS paper.
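To make the data-format question concrete, here is a minimal sketch of what a per-example binary feedback record could look like as JSON lines. The field names (`text`, `response`, `binary_feedback`) are illustrative assumptions only, not the actual schema the DIRECTOR or CRINGE tasks use; check those projects for the real format.

```python
# Illustrative only: the field names below are assumptions, not the DIRECTOR/CRINGE schema.
import json

examples = [
    {
        "text": "What's a good sci-fi book?",                  # dialogue context
        "response": "You could try Dune by Frank Herbert.",    # bot utterance being rated
        "binary_feedback": 1,                                   # 1 = positive, 0 = negative
    },
    {
        "text": "What's a good sci-fi book?",
        "response": "I don't know anything about books.",
        "binary_feedback": 0,
    },
]

with open("binary_feedback.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```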
-
From the BB3 and FITS papers, it seems that there is some RLHF-style fine-tuning of these models (binary, free-text, module-based feedback, etc.), even though the term itself is never used explicitly. With RLHF being all the rage (DeepSpeed, PEFT), it's not clear to me from the ParlAI documentation how one would go about creating, for example, a binary feedback dataset, or how to fine-tune the relevant modules with it. Which models are actually being fine-tuned in the FITS paper, for example?
Example goal 1: add binary feedback on bot utterances to an existing dataset and fine-tune the model or its sub-modules with it.
Example goal 2: create a binary feedback dataset from a live chat and fine-tune the vanilla model (or other sub-modules) with it (a rough sketch of this follows below).
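For goal 2 specifically, a minimal sketch (under assumed log and field names, nothing ParlAI-specific) of splitting live-chat thumbs-up/down annotations into positive and negative example files that a feedback-driven objective like DIRECTOR or CRINGE could then consume:

```python
import json

# Hypothetical live-chat log: each record holds the dialogue context, the bot's reply,
# and a thumbs rating from the user. These field names are assumptions for illustration.
chat_log = [
    {"context": "Can you recommend a pizza place?",
     "bot_reply": "Sure, Luigi's on 5th has great reviews.",
     "thumbs": "up"},
    {"context": "Can you recommend a pizza place?",
     "bot_reply": "I am a language model and cannot eat pizza.",
     "thumbs": "down"},
]

positives, negatives = [], []
for turn in chat_log:
    example = {"text": turn["context"], "labels": [turn["bot_reply"]]}
    (positives if turn["thumbs"] == "up" else negatives).append(example)

# Write separate positive/negative files so a feedback-aware objective can treat
# them as desirable vs. undesirable responses.
for name, split in [("feedback_pos.jsonl", positives), ("feedback_neg.jsonl", negatives)]:
    with open(name, "w") as f:
        for ex in split:
            f.write(json.dumps(ex) + "\n")
```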
The other question is with regard to recent advances in RLHF. It seems that the bigger OPT models and the BB3 3B model (vanilla GPT-2?) can be fine-tuned much more cheaply using RLHF with LoRA adapters. Are you familiar with these techniques, and how applicable are they to your models?
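For reference, this is the kind of LoRA setup I have in mind; a minimal sketch using the Hugging Face `peft` library on a stand-in OPT checkpoint (the model name, rank, and target modules are my assumptions, and ParlAI's own agents would presumably need their own integration):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Stand-in checkpoint; a larger OPT or BB3-style model would be loaded the same way.
model_name = "facebook/opt-1.3b"
model = AutoModelForCausalLM.from_pretrained(model_name)

lora_config = LoraConfig(
    r=8,                                   # low-rank adapter dimension
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],   # attention projections in OPT-style decoders
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of parameters are trainable
```

Only the adapter weights are updated, which is what makes this so much cheaper in memory and compute than full fine-tuning.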