Using ColossalAI's native fp16 hangs in the clip_grad_norm_fp32 function #2253
Unanswered
yhcc asked this question in Community | Q&A
The following function hangs when it is called by FP16Optimizer (training runs for a while before the hang, but the iteration at which it hangs is the same on every run). I am using pipeline parallelism + tensor parallelism. Does anyone have any guesses about what could cause this, so that I can try to debug it?
ColossalAI/colossalai/utils/common.py, line 279 at commit 8897b8f
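For readers hitting a similar symptom: a hang at a reproducible iteration inside a gradient-norm clip often points to a blocking collective that not every rank entered (for example, if an fp16 loss scaler detects overflow on one rank and skips the step while other ranks proceed to the collective). The sketch below is a generic torch.distributed illustration of that failure mode, not ColossalAI's actual clip_grad_norm_fp32 implementation; the function name clip_grad_norm_distributed and the overflow-skip scenario are hypothetical.

```python
# Minimal sketch of how a distributed gradient-norm clip can deadlock.
# NOT ColossalAI's implementation; assumes a generic torch.distributed
# setup where the process group has already been initialized.
import torch
import torch.distributed as dist

def clip_grad_norm_distributed(parameters, max_norm, group=None):
    """Compute the global grad norm across ranks, then clip in place.

    Every rank in `group` MUST reach this function, because it contains
    a blocking collective (all_reduce). If one rank bails out early
    (e.g. a hypothetical overflow-skip path in an fp16 optimizer) while
    the others enter the collective, the all_reduce never completes and
    training hangs -- typically at the same iteration on every run.
    """
    grads = [p.grad for p in parameters if p.grad is not None]
    # Local sum of squared gradient norms (a real tensor-parallel
    # implementation must also avoid double-counting parameters that
    # are replicated across ranks).
    local_sq = torch.stack([g.norm(2) ** 2 for g in grads]).sum()
    # Blocking collective: waits for every rank in the group.
    dist.all_reduce(local_sq, op=dist.ReduceOp.SUM, group=group)
    total_norm = local_sq.sqrt()
    clip_coef = max_norm / (total_norm + 1e-6)
    if clip_coef < 1:
        for g in grads:
            g.mul_(clip_coef)
    return total_norm
```

Under this assumption, a useful debugging step is to check whether every rank reaches the clipping call at the iteration where the hang occurs, e.g. by logging the iteration number on each rank immediately before the collective.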
Replies: 1 comment

-
It is probably caused by this bug: #2255