Maybe a bug in the preprocess? #26
@Richar-Du Thanks a lot for the feedback! I do think you are correct. I remember I set the 1 for some reason (it should be some annoying tokenizer-mismatch problem). It wasn't causing trouble when training the first version of LongChat, so I left it there. You are very right; actually I am running into a potential bug in data processing while upgrading LongChat to Llama-2, and I need to debug more. Let me know if you find the correct way to preprocess it.
@DachengLi1 Thanks for your reply, and I'm very glad to give some feedback :) Honestly speaking, I am still trying to debug it because I'm not quite familiar with it. I add
@DachengLi1 I find that the only difference between FastChat and LongChat is that LongChat uses RoPE, so if I add. As far as I have tried now, adding
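(For context, the long-context trick in LongChat is to "condense" the rotary position embeddings: position indices are divided by an interpolation ratio so that long positions reuse angles the model saw during pre-training. Below is a minimal, self-contained sketch of that idea; the function name, dimensions, and ratio are my own illustration, not the repo's actual monkey patch.)

```python
import torch

def rotary_angles(positions: torch.Tensor, dim: int, base: float = 10000.0,
                  ratio: float = 1.0) -> torch.Tensor:
    # Standard RoPE frequencies for a head dimension `dim`.
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    # "Condensing": divide position indices by `ratio`, so e.g. position 8192
    # with ratio=8 gets the same angles that position 1024 had in pre-training.
    scaled = positions.float() / ratio
    return torch.outer(scaled, inv_freq)  # shape: (seq_len, dim // 2)

positions = torch.arange(8192)
angles = rotary_angles(positions, dim=128, ratio=8.0)
print(angles.shape)  # torch.Size([8192, 64])
```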
Thanks for your awesome work so that the community can train LLMs on very long contexts! However, I find that in the preprocess function, line 125 and line 137 of LongChat/longchat/train/fine_tune/train.py (at commit a824bda) will set target to [1, -100, -100, ...], with the first element not ignored. I think FastChat gives the correct code, which first sets target[:cur_len] = IGNORE_TOKEN_ID, so the target will be [-100, -100, -100, ...]. Am I right? @DachengLi1
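To make the point concrete, here is a minimal sketch of the masking step under discussion, written as a hypothetical helper rather than the repo's actual preprocess code; it applies the FastChat-style fix of masking everything up to cur_len, so the leading BOS token is ignored as well.

```python
import torch

IGNORE_TOKEN_ID = -100  # label value ignored by PyTorch's cross-entropy loss

def mask_targets(input_ids: torch.Tensor, turn_lens) -> torch.Tensor:
    # Hypothetical helper: `turn_lens` is a list of
    # (instruction_len, response_len) pairs, one per conversation turn.
    target = input_ids.clone()
    cur_len = 1  # position 0 holds the BOS token (id 1 for LLaMA)
    # The fix discussed above: mask everything before cur_len as well,
    # so target starts [-100, -100, ...] rather than [1, -100, ...].
    target[:cur_len] = IGNORE_TOKEN_ID
    for instruction_len, response_len in turn_lens:
        # Instruction tokens contribute no loss; response tokens stay supervised.
        target[cur_len:cur_len + instruction_len] = IGNORE_TOKEN_ID
        cur_len += instruction_len + response_len
    target[cur_len:] = IGNORE_TOKEN_ID  # mask any trailing padding
    return target
```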