First, I ran the commands as follows:
The variable `cu_q_lens` before `flash_attn_varlen_qkvpacked_func` is as follows. It seems OK: the group size is 8192, and the heads in the second half are shifted by 4096 (13339 - 9243).
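For reference, here is a minimal sketch of the grouping I understand to be intended, assuming the scheme above (group size G, second-half heads shifted by G // 2). `shifted_cu_seqlens` is a hypothetical helper for illustration, not code from this repo:

```python
def shifted_cu_seqlens(seq_len, group_size):
    # Hypothetical helper (not from this repo): group boundaries for
    # the shifted second-half heads of ONE sequence. The plain
    # first-half heads would use [0, G, 2G, ..., L]; the shifted
    # heads offset every interior boundary by G // 2.
    half = group_size // 2
    return [0] + list(range(half, seq_len, group_size)) + [seq_len]

# One 16384-token sequence with group size 8192: the interior
# boundary 8192 becomes the pair 4096, 12288.
print(shifted_cu_seqlens(16384, 8192))  # [0, 4096, 12288, 16384]
```

With batch size 1 this matches the observed behavior, since the 4096 offset shows up between consecutive boundaries.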
However, when I set `per_device_train_batch_size=2` and run the command as follows, the variable `cu_q_lens` after the function `unpad_input` is as follows. In the end, the final `cu_q_lens` before `flash_attn_varlen_qkvpacked_func` is [0, 8192, 9243, 13339, 21531, 25439, 33631, 34682, 38778, 46970, 50878].

It is easy to see that the second-half heads (25439, 33631, 34682, 38778, 46970, 50878) share exactly the same intervals as the first-half heads (0, 8192, 9243, 13339, 21531, 25439): each boundary in the second half equals the corresponding first-half boundary plus 25439. In other words, the grouping for the second-half heads is not shifted by half the group size. Could you please fix this bug?
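For what it's worth, here is a hedged sketch of what I would expect the batched construction to look like: the half-group shift applied inside each sequence of the unpadded batch, with every sequence's boundaries offset by its start position in the packed tensor. `packed_shifted_cu_seqlens` and the lengths below are illustrative, not the repo's actual code or data:

```python
def packed_shifted_cu_seqlens(seq_lens, group_size):
    # Illustrative sketch (not the repo's code): build the shifted
    # boundaries per sequence, then offset each sequence's boundaries
    # by its start in the packed (unpadded) tensor, so the
    # second-half heads are shifted by group_size // 2 inside EVERY
    # sequence, not only the first one.
    half = group_size // 2
    bounds = [0]
    offset = 0
    for length in seq_lens:
        per_seq = list(range(half, length, group_size)) + [length]
        bounds.extend(offset + b for b in per_seq)
        offset += length
    return bounds

# Two 16384-token sequences with group size 8192: the second
# sequence's boundaries (20480, 28672) are 4096 and 12288 past its
# start at 16384, i.e. the shift is applied within each sequence.
print(packed_shifted_cu_seqlens([16384, 16384], 8192))
# [0, 4096, 12288, 16384, 20480, 28672, 32768]
```

Under this construction the second-half boundaries would no longer be a plain copy of the first-half boundaries offset by the total length, which is the symptom above.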