Using PEFT causes model to not predict EOS #1672
Comments
Good idea training on that dummy dataset to debug what's happening. Do you see any model progress at all or is the output basically static? Any logs you can share could be helpful. Regarding the …
Yes, I should have specified that the training and validation loss both go down when training on my regular dataset (as well as the dummy dataset). Unfortunately I haven't tried training the model directly without SFTTrainer. I'm also a bit new to this library and don't fully understand its implementation.
I see. Could you please paste your complete Python code so that we can try to replicate the issue? @younesbelkada Can you see anything wrong with the shown code?
This might be the problem. Try "all-linear".
Note that this is only an issue if the model is not one of the pre-configured models. If there is no matching layer at all, we raise an error, so it shouldn't go unnoticed.
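For reference, a minimal sketch of what the "all-linear" suggestion looks like in a `LoraConfig` (the hyperparameters here are placeholders, not values recommended in the thread):

```python
from peft import LoraConfig

# "all-linear" asks PEFT to apply LoRA to every Linear layer it can find
# (the output/lm_head is excluded), instead of a hand-written module list.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)
```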
Running into this issue myself with 'all-linear' and SFTTrainer using peft main, transformers 4.40.2, and trl 0.8.6. Without PEFT, EOS is predicted fine; with PEFT, EOS is not predicted correctly and the generation just continues until max_tokens is reached.
@derekelewis If you have a minimal reproducer to share, that would be great.
@BenjaminBossan see below. Tried to simplify as much as possible. Also uploaded fine-tuned models to the Hub w/ PEFT enabled & disabled. TRL seems to be having some issues w/ chat_templates & EOS in general (huggingface/trl#1412, huggingface/trl#1623, huggingface/trl#1578), but I think it is separate from what is going on here. PEFT enabled: https://huggingface.co/delewis/gemma-2b-peft-eos-issue

Training script:
Test script:
Output of test script w/ PEFT enabled:
Output w/ PEFT disabled:
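As a rough sketch of the kind of EOS check being discussed (not the original test script; the model path and prompt are placeholders):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path: point this at the fine-tuned checkpoint to inspect.
model_id = "path/to/finetuned-model"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("Hello, how are you?", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=256)

generated = output[0, inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated, skip_special_tokens=False))
# If EOS is never predicted, this prints False and the completion runs on
# until max_new_tokens is exhausted.
print("EOS generated:", tokenizer.eos_token_id in generated.tolist())
```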
@derekelewis Thanks for the script. Unfortunately I could not run it due to memory constraints, but it's still helpful. I can spot 3 potential issues: …

```python
for p in trainer.model.parameters():
    if p.requires_grad:
        p.data = p.data.float()
```

Also ensure that you enable …
Hello! Do you solve this problem? I have met the same problem when sft phi-1.5 ! |
Same issue here, it seems.
@Vermeille If you could share a minimal reproducer, we could take a look, otherwise it's going to be hard for us to help.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. |
@BenjaminBossan How did you select those specific layers to finetune: target_modules=["q_proj", "v_proj", "down_proj", "lm_head", "embed_tokens"] |
@joann-alvarez When it comes to "q_proj", "v_proj", "down_proj", those are just standard linear layers that are commonly targeted. The reason why I suggested applying LoRA to the "lm_head" and "embed_tokens" (which have tied weights) is that Gemma has a huge vocabulary size. This means that, especially for the smaller variants (2b), the vocab makes up a big fraction of all parameters. Often when people extend the vocabulary, they will fully fine-tune the embedding (by adding it to `modules_to_save`) …
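A sketch of that target-module choice in a `LoraConfig` (r and alpha are illustrative placeholders, not values from the thread):

```python
from peft import LoraConfig

# LoRA on the usual attention/MLP projections, plus the tied embedding and
# lm_head, which account for a large share of Gemma-2b's parameters.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj", "down_proj", "lm_head", "embed_tokens"],
    task_type="CAUSAL_LM",
)
```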
System Info
I'm doing a LoRA PEFT of GPT-2 through TRL and have noticed that my trained model assigns very low probability to the EOS token, which causes it to always generate the maximum number of tokens.
After trying a few different fixes I ran the code without the PEFT option and just used the base model. The problem resolved immediately.
To make the comparison clear, I created a toy case with a dataset that contains the same datapoint ("Hello <|endoftext|>") repeatedly. I then overfit on this dataset with a small batch size for a few dozen iterations. To see the effect on the probability of generating the eos_token, I inserted a small code fragment in my `compute_metrics` method (see the Reproduction section below).

Basic full finetuning results in the EOS token probability converging to 1 almost immediately as the model memorizes the location of the EOS tokens. However, if I just use TRL's code for a LoRA PEFT, the printed values remain close to zero and don't increase at all.
I've seen some references online suggesting that this could be caused by LoRA not updating the model's embedding matrix, so I added the following change to the peft_config:

`peft_config.modules_to_save = ["wte"]`

This doesn't have any effect on the results. I'm also doubtful this is the cause, since when I run the supervised finetuning I don't see any change in the embedding matrix but get the desired results anyway.

Any help would be appreciated, as I would like to avoid a full finetuning but right now have no way of getting a functional model with a PEFT.
Who can help?
No response
Information

Tasks

Reproduction
Use the following model_config (note the PEFT parameters) and training arguments:
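A minimal sketch of such a setup (not the reporter's exact config; all values are illustrative):

```python
from peft import LoraConfig
from transformers import TrainingArguments

# Illustrative LoRA settings; for GPT-2, PEFT's pre-configured mapping
# targets the "c_attn" projection when no target_modules are given.
peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.0,
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="gpt2-lora-eos-debug",
    per_device_train_batch_size=4,
    num_train_epochs=1,
    evaluation_strategy="steps",
    eval_steps=10,
    logging_steps=10,
    report_to="none",
)
```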
Create dataset:
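A toy dataset of the repeated datapoint, as described in the issue text (the column name "text" is an assumption):

```python
from datasets import Dataset

# The same datapoint repeated, so the model can trivially memorize where EOS goes.
train_dataset = Dataset.from_dict({"text": ["Hello <|endoftext|>"] * 256})
eval_dataset = Dataset.from_dict({"text": ["Hello <|endoftext|>"] * 32})
```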
Set up custom evaluation function:
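A sketch of the kind of eos_probs check described above: compute the probability the model assigns to the EOS token at the positions where the label actually is EOS (the exact fragment used by the reporter is not shown, so the details here are assumptions):

```python
import numpy as np

EOS_TOKEN_ID = 50256  # GPT-2's <|endoftext|>

def _softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def compute_metrics(eval_pred):
    logits, labels = eval_pred          # logits: (batch, seq_len, vocab_size)
    shift_logits = logits[:, :-1, :]    # position i predicts token i + 1
    shift_labels = labels[:, 1:]
    probs = _softmax(shift_logits, axis=-1)
    # Probability assigned to EOS at positions where the label actually is EOS.
    eos_mask = shift_labels == EOS_TOKEN_ID
    eos_probs = probs[..., EOS_TOKEN_ID][eos_mask]
    print("eos_probs:", eos_probs)
    return {"mean_eos_prob": float(eos_probs.mean()) if eos_probs.size else 0.0}
```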
Instantiate and run SFTTrainer:
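A hedged sketch of how the pieces above fit together (argument names follow the SFTTrainer API in trl 0.8.x; this is not the reporter's script):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTTrainer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
# Common quick fix for GPT-2's missing pad token; note that if the collator
# masks pad tokens in the labels, this also masks EOS, which is related to
# the TRL issues linked earlier in the thread.
tokenizer.pad_token = tokenizer.eos_token

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    dataset_text_field="text",
    max_seq_length=32,
    peft_config=peft_config,          # drop this line for the non-PEFT baseline
    compute_metrics=compute_metrics,
)
trainer.train()
```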
The `eos_probs` printed in `compute_metrics` will be near-zero.

Expected behavior
I would expect the above code to result in `eos_probs` values being nearly 1 after a few training iterations.