
Question about clip_grad_norm in traincal_classifier #119

Open
FancccyRay opened this issue Sep 25, 2019 · 1 comment

Comments

@FancccyRay

Hello Professor Liao, I read the following in your paper:
The training procedure is quite unstable because the batch size is only 2 per GPU, and there are many outliers in the training set. Gradient clipping is therefore used in a later stage of training, i.e. if the norm of the gradient vector is larger than one, it would be normalized to one.
My understanding is that gradient clipping is applied in a later stage of training, i.e. if the norm of the gradient vector is larger than 1, it is normalized to 1.

However, the only call I can find in traincal_classifier is clip_grad_norm(model.parameters(), 1), and that line is commented out. Am I supposed to uncomment it myself once training reaches the later stage (and, by the way, what counts as the "later stage")? Or was the line commented out by mistake?
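For reference, here is a minimal sketch of how the commented-out call is typically re-enabled in a later stage of training. This is not the repository's code: the toy model, optimizer, and the `start_clip_epoch` threshold are all assumptions for illustration, since the paper does not state when the "later stage" begins.

```python
import torch
import torch.nn as nn

# Hypothetical setup, standing in for the case classifier and its optimizer.
model = nn.Linear(16, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()

start_clip_epoch = 100   # assumption: clipping switched on after this epoch
num_epochs = 150

for epoch in range(num_epochs):
    x = torch.randn(2, 16)   # batch size 2 per GPU, as described in the paper
    y = torch.randn(2, 1)

    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()

    if epoch >= start_clip_epoch:
        # If the norm of the gradient vector exceeds 1, rescale it to 1.
        # (clip_grad_norm_ is the current PyTorch name; older versions
        # used clip_grad_norm, as in the commented-out line.)
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

    optimizer.step()
```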

@lmz123321

@FancccyRay @lfz Have you solved this problem? I noticed a related answer in #21. When should we use clip_grad_norm?
