Training Cost NAN #27
Original post by @jiangqy:

Hi, I would like to train AlexNet on ImageNet, but after 20 iterations the training cost becomes NaN.
Here are the details:
Should I set a smaller learning rate? Could you give me some suggestions?
Thank you~

Comments
@jiangqy I have met the same problem; my batch size is 256 and my learning rate is 0.01. Do you have any ideas?
@hma02 My batch size is 256 and learning rate is 0.01, too.
@jiangqy @heipangpang The cost should be around 6.9 initially. An unbounded cost value may be caused by gradient explosion. I got into similar situations when initializing a deep network with arrays of large variance and mean. Too large a learning rate or batch size may result in strong gradient zigzag as well. Also, do check the input images when loading them.
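To make the first two points concrete, here is a small runnable NumPy sketch (illustrative only, not code from this repository; the batch size, layer width, depth, and init values are made up): it checks that a near-uniform 1000-way softmax starts at a cost of -ln(1/1000) ≈ 6.91, and shows how weights drawn with too large a standard deviation make activations explode with depth, which is the precursor to a NaN cost.

```python
import numpy as np

# Expected initial cost for a near-uniform 1000-way softmax:
# -ln(1/1000) = ln(1000) ~= 6.907, matching the "around 6.9" above.
print("expected initial cost: %.3f" % np.log(1000.0))

# Push a batch of activations through a stack of ReLU layers and watch
# how the activation scale depends on the weight-init std.
rng = np.random.RandomState(0)
x = rng.randn(256, 1024)               # hypothetical batch of activations

for std in (0.01, 1.0):                # small vs. large weight init
    h = x
    for _ in range(8):                 # 8 fully connected + ReLU layers
        W = rng.randn(1024, 1024) * std
        h = np.maximum(0.0, h.dot(W))  # linear layer followed by ReLU
    print("init std %-4s -> mean |activation| = %.3e" % (std, np.abs(h).mean()))
```

With std 0.01 the activation scale stays small, while with std 1.0 it grows by orders of magnitude per layer. In practice, lowering the learning rate, initializing with a small standard deviation (the original AlexNet paper used zero-mean Gaussians with std 0.01), or clipping the gradient norm all damp this kind of explosion.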
I had the same problem here. If "para_load" is set to False, I can train normally. But I think one of the great contributions of this work is the parallel loading, right?
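For readers unfamiliar with the idea, here is a toy sketch of the producer-consumer pattern that parallel loading implements: a separate process prefetches minibatches into a bounded queue so that disk I/O and preprocessing overlap with the training computation. This only illustrates the concept; the repository's actual para_load mechanism differs in its details, and the shapes, timings, and sleep-based stand-ins below are made up.

```python
import multiprocessing as mp
import time

import numpy as np

def loader(queue, n_batches):
    # Stand-in for reading and decoding images from disk.
    rng = np.random.RandomState(0)
    for _ in range(n_batches):
        time.sleep(0.05)                     # simulated I/O latency
        queue.put(rng.randn(256, 3, 32, 32).astype("float32"))
    queue.put(None)                          # end-of-data sentinel

if __name__ == "__main__":
    q = mp.Queue(maxsize=4)                  # bounded buffer of ready batches
    p = mp.Process(target=loader, args=(q, 10))
    p.start()
    while True:
        batch = q.get()
        if batch is None:
            break
        time.sleep(0.05)                     # stand-in for a training step,
                                             # which overlaps with the loader
    p.join()
    print("done: loading and training overlapped")
```

If training is stable with para_load off but diverges with it on, comparing a few batches produced by both paths is a quick way to check whether the parallel path is delivering corrupted or mis-scaled images, which the comment above about checking input images also hints at.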