
loss=None #10

Open
Kris20xx opened this issue Jun 21, 2024 · 2 comments
Kris20xx commented Jun 21, 2024

Thanks for your great work. When I try to train the classifier from scratch, or to train the explainer with the weight file you provided, I find that whenever I use a model of type base, the loss is None from the start or becomes None during training; this does not happen with a model of type tiny. Could you explain why this happens, or how I can proceed? I have made no changes to your code.
[attached screenshot: loss-none]

Kris20xx (Author) commented:

Looking forward to your reply, thanks!

chanwkimlab (Collaborator) commented:

Thank you for your interest in our work. I also encountered NaN loss during training on rare occasions, especially when using fp16. I think the issue comes down to engineering details such as numerical stability or exploding gradients, which are tricky to debug. I'd recommend several tricks known to help with this kind of problem: using fp32 instead of fp16, enabling gradient clipping, and lowering the learning rate. It may even depend on the CUDA and PyTorch versions you are using; the latest versions might be more robust to such issues.
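The tricks above can be sketched in a minimal PyTorch training loop. Note this is a hypothetical illustration, not the repository's actual training code: the model, data, and hyperparameters are placeholders, and the finite-loss guard is an extra defensive measure beyond what the reply lists.

```python
import torch
import torch.nn as nn

# Placeholder model in fp32 (instead of fp16/half precision).
model = nn.Linear(16, 2).float()
# Lowered learning rate (1e-5 is an arbitrary illustrative value).
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
criterion = nn.CrossEntropyLoss()

# Dummy batch standing in for real training data.
x = torch.randn(8, 16)
y = torch.randint(0, 2, (8,))

for step in range(3):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    # Skip non-finite losses rather than letting NaN/Inf corrupt the weights.
    if not torch.isfinite(loss):
        continue
    loss.backward()
    # Gradient clipping: cap the global gradient norm at 1.0.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
```

If fp16 is still desired for speed, `torch.cuda.amp.autocast` with `GradScaler` is the usual middle ground, since the scaler skips steps whose gradients overflow.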
