Hello Mr. Liao (廖老师), I read the following in your paper:

The training procedure is quite unstable because the batch size is only 2 per GPU, and there are many outliers in the training set. Gradient clipping is therefore used in a later stage of training, i.e. if the norm of the gradient vector is larger than one, it would be normalized to one.

My understanding is that gradient clipping is applied in the later stage of training, i.e. if the norm of the gradient vector exceeds 1, it is rescaled to 1.
However, the only place I could find clip_grad_norm(model.parameters(), 1) is in traincal_classifier, and that line is commented out. Should I uncomment it myself once training reaches the later stage (and, by the way, when exactly does the "later stage" begin?), or was the line commented out by mistake?
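In case it helps, here is a minimal sketch of what I think the intended behavior looks like. The toy model, the loop, and the later_stage_epoch threshold are my own guesses to make the example runnable, not something taken from your code:

```python
import torch
import torch.nn as nn

# Toy setup (my own assumption, only so the sketch runs end to end).
model = nn.Linear(8, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

num_epochs = 20
later_stage_epoch = 10  # hypothetical point where the "later stage" begins

for epoch in range(num_epochs):
    # Dummy batch of size 2, mirroring the batch size of 2 per GPU in the paper.
    inputs = torch.randn(2, 8)
    targets = torch.randint(0, 2, (2,))

    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()

    # Gradient clipping only in the later stage of training:
    # rescale gradients so their global L2 norm is at most 1.
    if epoch >= later_stage_epoch:
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

    optimizer.step()
```

Is conditionally enabling the clipping like this (rather than keeping the line commented out) the behavior you intended?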