Error with make_parallel function #18
I got the following error while trying to use the make_parallel function:

PS: The code works if the call to make_parallel is removed.
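For context, make_parallel helpers from this era are commonly based on the kuza55 multi-GPU gist; the sketch below is a simplified, hedged reconstruction (single-output models, TF1-era Keras), not necessarily the exact code used in this project. The key detail is the tf.split on the batch axis: every batch must divide evenly by gpu_count, which is why the divisibility workaround discussed in the comments below helps.

```python
# Hedged sketch of the make_parallel pattern (after the kuza55 gist);
# simplified to single-output models for brevity.
import tensorflow as tf
from keras.layers import Lambda, concatenate
from keras.models import Model

def make_parallel(model, gpu_count):
    """Replicate `model` on gpu_count GPUs, splitting each batch among them."""
    towers = []
    for i in range(gpu_count):
        with tf.device('/gpu:%d' % i):
            # Slice the i-th shard of every input batch onto this GPU.
            # tf.split errors out unless the batch divides evenly by gpu_count.
            shards = [Lambda(lambda x, idx=i: tf.split(x, gpu_count, axis=0)[idx])(inp)
                      for inp in model.inputs]
            towers.append(model(shards if len(shards) > 1 else shards[0]))
    with tf.device('/cpu:0'):
        # Stitch the per-GPU outputs back into a single batch on the CPU.
        merged = concatenate(towers, axis=0) if gpu_count > 1 else towers[0]
    return Model(inputs=model.inputs, outputs=merged)
```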
Comments

My two cents on the problem: I've been working on a g2.8xlarge as well, and I came across a similar issue. I managed to work around it by making the total number of samples divisible by the batch size. If you are multiplying your batch size by the number of GPUs, then your sample count must be divisible by that effective batch size. For example, if you have 257000 samples and a per-GPU batch of 16 (128 across 8 GPUs), pass the model a slice of 256000 samples. I'm not sure if this is your case. Let me know how it goes.
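A minimal sketch of that workaround (the array names and shapes are illustrative, not from this project): trim the training set to a multiple of batch size × GPU count before calling fit.

```python
import numpy as np

n_gpus = 8
batch_per_gpu = 16
effective_batch = n_gpus * batch_per_gpu  # 128 samples per step across 8 GPUs

# Stand-in training data: 257000 samples, as in the example above.
x_train = np.random.rand(257000, 32).astype('float32')
y_train = np.random.randint(0, 2, size=(257000, 1))

# Keep only a multiple of the effective batch size (257000 -> 256896 here;
# the comment above rounds down further to 256000, which also divides evenly).
usable = (len(x_train) // effective_batch) * effective_batch
x_train, y_train = x_train[:usable], y_train[:usable]

# model.fit(x_train, y_train, batch_size=effective_batch, ...)  # then train as usual
```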
My initial conclusion was wrong. I had been running different configurations on the g2.8xlarge and p2.8xlarge so that the model could fit on the smaller K520 cards. But strangely, the issue seems to be somehow related to batch normalisation: the model works only when I use batch normalisation. I haven't figured out exactly how the two are related yet.