Thanks for your great work. When I try to train the classifier from scratch, or train the explainer with the weight file you provided, I find that whenever I use the base model the loss is NaN, or becomes NaN during training; this does not happen with the tiny model. Could you explain why this happens, or how I can work around it? I have made no changes to your code.
Thank you for your interest in our work. I have also encountered NaN loss during training on rare occasions, especially when using fp16. I think the issue comes down to engineering details such as numerical stability or exploding gradients, and it is tricky to debug. I'd recommend a few tricks known to help with this kind of issue: using fp32 instead of fp16, enabling gradient clipping, and lowering the learning rate. It can even depend on the CUDA and PyTorch versions you are using; the latest versions may be more robust to such problems.
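To illustrate two of the suggested mitigations, here is a minimal dependency-free sketch of gradient-norm clipping and NaN-loss skipping. This is not the repository's code; the function names and the `max_norm` value are illustrative. In an actual PyTorch training loop you would use `torch.nn.utils.clip_grad_norm_` and check the loss with `torch.isfinite` before calling `optimizer.step()`.

```python
import math

def clip_grad_norm(grads, max_norm):
    """Rescale gradients so their global L2 norm does not exceed max_norm.

    Same idea as torch.nn.utils.clip_grad_norm_: compute the total norm
    over all gradients and, if it is too large, scale every gradient down
    by the same factor so the update direction is preserved.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / (total_norm + 1e-6)  # small eps avoids division issues
        grads = [g * scale for g in grads]
    return grads, total_norm

def loss_is_finite(loss):
    """Return False for NaN/inf losses, so the optimizer step can be skipped."""
    return math.isfinite(loss)

# Example: a gradient vector with norm 5.0 gets clipped to (roughly) norm 1.0.
clipped, norm_before = clip_grad_norm([3.0, 4.0], max_norm=1.0)

# Example: a NaN loss is detected and the step would be skipped.
skip_step = not loss_is_finite(float("nan"))
```

Skipping the step on a non-finite loss keeps a single bad batch from corrupting the model weights, while clipping bounds the update size when gradients explode; together they often make fp16 training survivable, though switching to fp32 remains the most reliable fix.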