Maybe a bug in the preprocess? #26

Open
Richar-Du opened this issue Jul 22, 2023 · 3 comments

Comments

@Richar-Du

Thanks for your awesome work, which lets the community train LLMs on very long contexts! However, I find that in the preprocess function, the line


and the line:
target[cur_len : cur_len + instruction_len] = IGNORE_TOKEN_ID

will set target to [1, -100, -100, ...], where the first element is not ignored. I think FastChat has the correct code: it first sets target[:cur_len] = IGNORE_TOKEN_ID, so the target becomes [-100, -100, -100, ...]. Am I right?
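For reference, here is a minimal, hypothetical sketch of the masking order being discussed, in the spirit of FastChat-style preprocessing; the function name `mask_targets` and the `turn_spans` argument are illustrative simplifications, not the actual FastChat or longchat source:

```python
import torch

IGNORE_TOKEN_ID = -100  # sentinel for positions excluded from the loss


def mask_targets(input_ids: torch.Tensor, turn_spans) -> torch.Tensor:
    """Mask everything except the assistant replies.

    `turn_spans` is a list of (turn_len, instruction_len) pairs, a
    simplification used only to illustrate the masking order at issue.
    """
    target = input_ids.clone()
    cur_len = 1                          # position 0 is the BOS token (id 1)
    target[:cur_len] = IGNORE_TOKEN_ID   # the line this issue says is missing
    for turn_len, instruction_len in turn_spans:
        # ignore the instruction/prompt part of each turn in the loss
        target[cur_len : cur_len + instruction_len] = IGNORE_TOKEN_ID
        cur_len += turn_len
    target[cur_len:] = IGNORE_TOKEN_ID   # ignore any padding after the last turn
    return target
```

Without the `target[:cur_len] = IGNORE_TOKEN_ID` line, position 0 keeps the BOS token id (1) and contributes to the loss, which is the [1, -100, -100, ...] pattern described above.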
@DachengLi1
Owner

@Richar-Du Thanks a lot for the feedback! I do think you are correct. I remember setting the 1 for some reason (probably some annoying tokenizer mismatch problem). It wasn't causing trouble when training the first version of longchat, so I left it as is.

You are very right. Actually, I am running into some potential bugs in data processing while upgrading longchat to llama-2 and need to debug more. Let me know if you find the correct way to preprocess it.

@Richar-Du
Author

@DachengLi1 Thanks for your reply; I'm very glad to give some feedback :) Honestly, I am still trying to debug it because I'm not very familiar with the code. I added target[:cur_len] = IGNORE_TOKEN_ID to change the 1 to -100, but the training result is still abnormal. I am going to compare the FastChat and longchat code and try to solve it.

@Richar-Du
Author

@DachengLi1 I find that the only difference between fastchat and longchat is that longchat uses condensed RoPE, so if I add replace_llama_with_condense(ratio=8) to train_mem.py in fastchat, will fastchat be the same as longchat?

As far as I have tried so far, adding replace_llama_with_condense(ratio=8) does let fastchat support long-context fine-tuning.
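In case it helps, here is a hypothetical sketch of what that change could look like at the top of FastChat's train_mem.py; the longchat import path below is an assumption based on the repo layout and may need adjusting:

```python
# Apply longchat's RoPE-condensing monkey patch before any LLaMA model is
# constructed, so fine-tuning uses interpolated rotary position embeddings
# (ratio=8 roughly stretches a 2048-token model to a 16384-token context).
from longchat.train.monkey_patch.llama_condense_monkey_patch import (
    replace_llama_with_condense,
)

replace_llama_with_condense(ratio=8)

from fastchat.train.train import train  # noqa: E402  (patch must run first)

if __name__ == "__main__":
    train()
```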
