Fix FusedRMSLinear backward compute #11095
PR types
Bug fixes
PR changes
Models
Description
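Fixes a computation logic error in the FusedRMSLinear backward pass.
The `out` argument of `compute_fp8_linear` does not have accumulation semantics: the result is written directly into `out`, overwriting its previous contents. Therefore `h_grad` must not be passed in as the output buffer in place; the result should be computed separately and then accumulated into `h_grad` outside the call.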
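To illustrate why the in-place call loses gradient, here is a minimal sketch in plain Python. It assumes only what the description states: the kernel overwrites `out` rather than adding to it. `matmul_into`, `backward_buggy` and `backward_fixed` are hypothetical names used for illustration; `matmul_into` is an ordinary float matmul standing in for `compute_fp8_linear`, whose real signature and FP8 behavior are not shown here.

```python
from typing import List

Matrix = List[List[float]]


def matmul_into(out: Matrix, a: Matrix, b: Matrix) -> Matrix:
    """Hypothetical stand-in for a kernel with overwrite semantics:
    writes a @ b into `out`, discarding whatever `out` held before."""
    rows, inner, cols = len(a), len(b), len(b[0])
    for i in range(rows):
        for j in range(cols):
            out[i][j] = sum(a[i][k] * b[k][j] for k in range(inner))
    return out


def backward_buggy(h_grad: Matrix, y_grad: Matrix, w_t: Matrix) -> Matrix:
    # Passing the existing h_grad as the output buffer: the gradient
    # already stored in h_grad is overwritten instead of accumulated.
    return matmul_into(h_grad, y_grad, w_t)


def backward_fixed(h_grad: Matrix, y_grad: Matrix, w_t: Matrix) -> Matrix:
    # Compute into a fresh buffer, then accumulate into h_grad explicitly.
    tmp = [[0.0] * len(w_t[0]) for _ in y_grad]
    matmul_into(tmp, y_grad, w_t)
    return [[h + t for h, t in zip(h_row, t_row)]
            for h_row, t_row in zip(h_grad, tmp)]
```

With `h_grad = [[1.0, 1.0]]`, `y_grad = [[2.0]]` and `w_t = [[3.0, 4.0]]`, the new contribution is `[[6.0, 8.0]]`. The buggy version returns exactly that and drops the existing `[[1.0, 1.0]]`, while the fixed version returns `[[7.0, 9.0]]`.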
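Convergence verification
Validated with a single-machine cold start, tracking the loss and the global gradient variance (used to check for gradient explosion; it should generally stay below 10) over the first 200 steps.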
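In practice the fix makes little visible difference. The loss curves are nearly identical. The global gradient variance is slightly higher after the fix, because the part of the gradient that was previously dropped is now added back, but this does not affect loss convergence.
In short, the overwritten part of `h_grad` appears to matter little in practice, but the fix is still correct in principle and should be applied.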