Thanks for your great work. When I try to train the classifier from scratch, or train the explainer with the weight file you provided, I find that whenever I use the base model the loss is NaN, or becomes NaN during training; this does not happen with the tiny model. Could you explain why this happens, or how I can work around it? I have made no changes to your code.
Thank you for your interest in our work. I have also encountered NaN loss during training on rare occasions, especially when using fp16. I think the issue comes down to engineering details such as numerical stability or exploding gradients, and it is tricky to debug. I'd recommend a few tricks known to help with this kind of issue: using fp32 instead of fp16, enabling gradient clipping, and lowering the learning rate. It can even depend on the CUDA and PyTorch versions you are using; the latest versions may be more robust to such problems.
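To illustrate two of the suggested mitigations, here is a minimal dependency-free sketch of gradient-norm clipping and NaN-loss skipping. This is not the repository's code; the function names and the `max_norm` value are illustrative. In an actual PyTorch training loop you would use `torch.nn.utils.clip_grad_norm_` and check the loss with `torch.isfinite` before calling `optimizer.step()`.

```python
import math

def clip_grad_norm(grads, max_norm):
    """Rescale gradients so their global L2 norm does not exceed max_norm.

    Same idea as torch.nn.utils.clip_grad_norm_: compute the total norm
    over all gradients and, if it is too large, scale every gradient down
    by the same factor so the update direction is preserved.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / (total_norm + 1e-6)  # small eps avoids division issues
        grads = [g * scale for g in grads]
    return grads, total_norm

def loss_is_finite(loss):
    """Return False for NaN/inf losses, so the optimizer step can be skipped."""
    return math.isfinite(loss)

# Example: a gradient vector with norm 5.0 gets clipped to (roughly) norm 1.0.
clipped, norm_before = clip_grad_norm([3.0, 4.0], max_norm=1.0)

# Example: a NaN loss is detected and the step would be skipped.
skip_step = not loss_is_finite(float("nan"))
```

Skipping the step on a non-finite loss keeps a single bad batch from corrupting the model weights, while clipping bounds the update size when gradients explode; together they often make fp16 training survivable, though switching to fp32 remains the most reliable fix.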