Hi,
I'm trying to use DP-SGD with MultiSteps to train BART.
Normally, I can only fit a batch size of 8 for DP-SGD on an A100 80 GB, so gradient accumulation would be a good choice.
I followed the MultiSteps tutorial, and it works with SGD but not with DP-SGD.
Here is part of my stack trace:
For example, in `final_logits_bias`, my grad is `ShapedArray(float32[8,1,250027])` and `multi_state_when_skip` is `ShapedArray(float32[1,250027])`.
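To illustrate the mismatch, here is a minimal sketch in plain JAX. It is not my training code: the toy `loss_fn`, the shrunken vocabulary size of 5, and the assumption that the accumulator is initialised with the same shapes as the parameters are all illustrative. It only shows that per-example gradients carry a leading batch axis that a parameter-shaped accumulator does not have.

```python
import jax
import jax.numpy as jnp

# Hypothetical stand-in for final_logits_bias, with the vocabulary shrunk to 5.
params = {"final_logits_bias": jnp.zeros((1, 5))}


def loss_fn(params, x):
    # Toy per-example loss (an assumption, not the BART loss); x has shape (5,).
    return jnp.sum((params["final_logits_bias"][0] + x) ** 2)


batch = jnp.ones((8, 5))  # batch size 8, as in my setup

# DP-SGD clips each example's gradient separately, so it needs one gradient
# per example: vmap over the batch axis adds a leading dimension of 8.
per_example_grads = jax.vmap(jax.grad(loss_fn), in_axes=(None, 0))(params, batch)

# Assumed accumulator layout: zeros with the same shapes as the parameters,
# so there is no batch axis.
accumulator = jax.tree_util.tree_map(jnp.zeros_like, params)

grad_shape = per_example_grads["final_logits_bias"].shape
acc_shape = accumulator["final_logits_bias"].shape
print(grad_shape, acc_shape)
```

The two shapes differ only by the leading batch axis, which matches the `[8,1,250027]` versus `[1,250027]` shapes above.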
Thanks.