Hello Mr. Liao (廖老师), I read the following in your paper:

The training procedure is quite unstable because the batch size is only 2 per GPU, and there are many outliers in the training set. Gradient clipping is therefore used in a later stage of training, i.e. if the norm of the gradient vector is larger than one, it would be normalized to one.

My understanding is that gradient clipping is applied in the later stage of training, i.e. if the norm of the gradient vector exceeds 1, it is rescaled to 1.
However, the only place I could find clip_grad_norm(model.parameters(), 1) is in traincal_classifier, and that line is commented out. Should I uncomment it myself once training reaches the later stage (and, by the way, when exactly does the "later stage" begin?), or was the line commented out by mistake?
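In case it helps, here is a minimal sketch of what I think the intended behavior looks like. The toy model, the loop, and the later_stage_epoch threshold are my own guesses to make the example runnable, not something taken from your code:

```python
import torch
import torch.nn as nn

# Toy setup (my own assumption, only so the sketch runs end to end).
model = nn.Linear(8, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

num_epochs = 20
later_stage_epoch = 10  # hypothetical point where the "later stage" begins

for epoch in range(num_epochs):
    # Dummy batch of size 2, mirroring the batch size of 2 per GPU in the paper.
    inputs = torch.randn(2, 8)
    targets = torch.randint(0, 2, (2,))

    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()

    # Gradient clipping only in the later stage of training:
    # rescale gradients so their global L2 norm is at most 1.
    if epoch >= later_stage_epoch:
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

    optimizer.step()
```

Is conditionally enabling the clipping like this (rather than keeping the line commented out) the behavior you intended?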