Maybe a bug in the preprocess? #26

Open
Richar-Du opened this issue Jul 22, 2023 · 3 comments

Comments

@Richar-Du

Thanks for your awesome work, which lets the community train LLMs on very long contexts! However, I find that in the preprocess function, the line


and the line:
target[cur_len : cur_len + instruction_len] = IGNORE_TOKEN_ID

will set target to [1, -100, -100, ...], where the first element is not ignored. I think FastChat has the correct code: it first sets target[:cur_len] = IGNORE_TOKEN_ID, so the target becomes [-100, -100, -100, ...]. Am I right?
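For reference, here is a minimal, hypothetical sketch of the masking order being discussed, in the spirit of FastChat-style preprocessing; the function name `mask_targets` and the `turn_spans` argument are illustrative simplifications, not the actual FastChat or longchat source:

```python
import torch

IGNORE_TOKEN_ID = -100  # sentinel for positions excluded from the loss


def mask_targets(input_ids: torch.Tensor, turn_spans) -> torch.Tensor:
    """Mask everything except the assistant replies.

    `turn_spans` is a list of (turn_len, instruction_len) pairs, a
    simplification used only to illustrate the masking order at issue.
    """
    target = input_ids.clone()
    cur_len = 1                          # position 0 is the BOS token (id 1)
    target[:cur_len] = IGNORE_TOKEN_ID   # the line this issue says is missing
    for turn_len, instruction_len in turn_spans:
        # ignore the instruction/prompt part of each turn in the loss
        target[cur_len : cur_len + instruction_len] = IGNORE_TOKEN_ID
        cur_len += turn_len
    target[cur_len:] = IGNORE_TOKEN_ID   # ignore any padding after the last turn
    return target
```

Without the `target[:cur_len] = IGNORE_TOKEN_ID` line, position 0 keeps the BOS token id (1) and contributes to the loss, which is the [1, -100, -100, ...] pattern described above.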
@DachengLi1
Owner

@Richar-Du Thanks a lot for the feedback! I do think you are correct. I remember setting the 1 for some reason (probably some annoying tokenizer mismatch problem). It wasn't causing trouble when training the first version of longchat, so I left it as is.

You are very right. Actually, I am running into some potential bugs in data processing while upgrading longchat to llama-2 and need to debug more. Let me know if you find the correct way to preprocess it.

@Richar-Du
Author

@DachengLi1 Thanks for your reply; I'm very glad to give some feedback :) Honestly, I am still trying to debug it because I'm not very familiar with the code. I added target[:cur_len] = IGNORE_TOKEN_ID to change the 1 to -100, but the training result is still abnormal. I am going to compare the FastChat and longchat code and try to solve it.

@Richar-Du
Author

@DachengLi1 I find that the only difference between fastchat and longchat is that longchat uses condensed RoPE, so if I add replace_llama_with_condense(ratio=8) to train_mem.py in fastchat, will fastchat be the same as longchat?

As far as I have tried so far, adding replace_llama_with_condense(ratio=8) does let fastchat support long-context fine-tuning.
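In case it helps, here is a hypothetical sketch of what that change could look like at the top of FastChat's train_mem.py; the longchat import path below is an assumption based on the repo layout and may need adjusting:

```python
# Apply longchat's RoPE-condensing monkey patch before any LLaMA model is
# constructed, so fine-tuning uses interpolated rotary position embeddings
# (ratio=8 roughly stretches a 2048-token model to a 16384-token context).
from longchat.train.monkey_patch.llama_condense_monkey_patch import (
    replace_llama_with_condense,
)

replace_llama_with_condense(ratio=8)

from fastchat.train.train import train  # noqa: E402  (patch must run first)

if __name__ == "__main__":
    train()
```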
