This discussion was converted from issue #948 on September 25, 2024 11:34.
Has this been raised before?
Description
Background: this occurs while training a multimodal model with internvl as the codebase, and it slows down pretrain training.

Some details and observations:
1) It only happens during pretrain; step time is stable during sft (which is what makes it puzzling), and the training parameters of the two stages are essentially identical (note: only the adapter is trainable).
2) From observation, it appears unrelated to the seq length of the current sample.
3) Fine-grained timing prints show that when the abnormal latency occurs it originates in the decoder layers: per-layer decoderlayer timings follow a fast -> slow -> extremely slow (1000x) -> slow pattern, and the abnormal time is contributed mainly by the extremely slow layer (see the timing sketch after this list).
4) The time is spent almost entirely in self.attn, and this holds for both sdpa and flashattn.
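
For context, here is a minimal sketch of the kind of per-layer timing print described in (3), assuming HuggingFace-style decoder layers that return a tuple with the hidden states first; `time_decoder_layers` and its call convention are illustrative, not taken from the internvl codebase. One caveat when reading such timings: CUDA kernels launch asynchronously, so without the explicit `torch.cuda.synchronize()` calls, the cost of kernels queued by earlier layers can get attributed to whichever layer happens to block.

```python
import time
import torch

def time_decoder_layers(layers, hidden_states, **layer_kwargs):
    """Run each decoder layer in sequence and record its wall-clock time.

    Illustrative helper, not part of internvl. Assumes each layer returns
    a tuple whose first element is the updated hidden states.
    """
    timings = []
    for i, layer in enumerate(layers):
        torch.cuda.synchronize()   # flush kernels queued by earlier layers
        start = time.perf_counter()
        hidden_states = layer(hidden_states, **layer_kwargs)[0]
        torch.cuda.synchronize()   # wait for this layer's kernels to finish
        timings.append((i, time.perf_counter() - start))
    return hidden_states, timings
```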
I am currently unsure of the cause and of how to investigate or fix it next. If anyone familiar with this could share ideas or an explanation, I'd appreciate it, thanks~