loss is nan during training HD-CNN #2
Comments
After the whole training phase, the accuracy becomes < 0.01.
Hi, I observed a similar problem and found that cuDNN was to blame; it introduced unknown bugs on the CIFAR-100 dataset. I fixed it by disabling cuDNN, i.e. setting `USE_CUDNN := 0` in Makefile.config. Can you try this?
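For reference, this is roughly what the relevant switch in Caffe's Makefile.config looks like; the exact surrounding comments vary between Caffe versions, so treat this as a sketch rather than the exact file contents:

```makefile
# Makefile.config (Caffe)
# Leave the cuDNN switch commented out (or set it to 0) so Caffe builds
# without cuDNN, which avoids the CIFAR-100 problem described above.
# USE_CUDNN := 1
```

After editing Makefile.config, Caffe has to be rebuilt (e.g. `make clean && make all`) for the change to take effect.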
Thank you for your kind advice. In my setup, USE_CUDNN is 0 (more precisely, it is commented out). By the way, I changed the number of GPUs used for training from 2 to 1, and the problem is now solved (accuracy is 0.6). I think the multi-GPU part may have a slight problem.
It might be a multi-GPU issue. Fortunately, on the CIFAR-100 dataset, the training speed with a single GPU is fine. Thanks.
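For anyone hitting the same problem, a single-GPU run can be selected with Caffe's `-gpu` flag; the solver file name below is only a placeholder for the actual HD-CNN CIFAR-100 solver:

```sh
# Train on a single GPU (device 0) instead of two; the multi-GPU run that
# triggered the NaN loss would instead pass a list such as -gpu 0,1.
./build/tools/caffe train -solver solver.prototxt -gpu 0
```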
Hi,
I saw this page: https://sites.google.com/site/homepagezhichengyan/home/hdcnn/code
and tried training on CIFAR-100, but during training the displayed loss is nan, although the accuracy seems to improve little by little. Could you kindly explain this?