
Understanding gradient flow #25

Open

grishchenko-ivan opened this issue Sep 27, 2017 · 0 comments
Hi!

Could you please explain whether the weights of the model are the same on all GPUs after the first batch, or not?
As I understand it, we make a copy of the model on each GPU (at what line does that happen exactly?) and then compute gradients on each GPU separately, each on a different slice of the batch. That would mean each GPU updates its model weights in a different way, unless all copies of the model across the GPUs are synced somehow. Does that syncing happen somewhere inside Keras?
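To make the question concrete, here is a minimal sketch of the pattern I have in mind (my own assumption of how the multi-GPU wrapper works, with made-up device names and sizes, not a quote of this repo's code):

```python
import tensorflow as tf
from keras.layers import Input, Dense, Lambda, concatenate
from keras.models import Model

N_GPUS = 2  # made-up device count

def slice_batch(x, n_gpus, part):
    """Return the `part`-th contiguous slice of the batch dimension."""
    size = tf.shape(x)[0] // n_gpus          # assumes batch size % n_gpus == 0
    return x[part * size:(part + 1) * size]

# the model is built exactly once, so there is only one set of weight tensors
inp = Input(shape=(32,))
serial_model = Model(inp, Dense(10, activation='softmax')(inp))

big_input = Input(shape=(32,))
towers = []
for g in range(N_GPUS):
    with tf.device('/gpu:%d' % g):
        sliced = Lambda(slice_batch, output_shape=(32,),
                        arguments={'n_gpus': N_GPUS, 'part': g})(big_input)
        # calling the SAME serial_model re-applies the SAME weight variables;
        # tf.device only controls where the forward/backward ops are placed
        towers.append(serial_model(sliced))

with tf.device('/cpu:0'):
    merged = concatenate(towers, axis=0)

parallel_model = Model(big_input, merged)
```

If this is roughly what happens, then the Dense weights exist only once and every tower points at the same variables, so there would be nothing to get out of sync — is that correct?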

Looking at the TensorFlow example with multiple GPUs, they compute gradients on each GPU separately (one set per slice), then average them, and then update the shared weights. From the variable scopes and how they are used, we can see that the model weights are shared across GPUs. But in the case of Keras this is not obvious.

Link to the TF example: https://github.com/tensorflow/models/blob/master/tutorials/image/cifar10/cifar10_multi_gpu_train.py
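For comparison, this is roughly that scheme condensed (my paraphrase in TF 1.x style with a toy linear model, not the actual CIFAR-10 code):

```python
import tensorflow as tf

NUM_GPUS = 2  # made-up device count

def tower_loss(x, y):
    # toy linear model; get_variable + reuse means every tower sees the SAME w, b
    w = tf.get_variable('w', shape=[32, 10])
    b = tf.get_variable('b', shape=[10], initializer=tf.zeros_initializer())
    return tf.reduce_mean(tf.square(tf.matmul(x, w) + b - y))

def average_gradients(tower_grads):
    """tower_grads: one list of (grad, var) pairs per GPU; returns their mean."""
    averaged = []
    for grads_and_vars in zip(*tower_grads):          # group by variable
        grads = [g for g, _ in grads_and_vars]
        mean_grad = tf.reduce_mean(tf.stack(grads, axis=0), axis=0)
        averaged.append((mean_grad, grads_and_vars[0][1]))
    return averaged

x = tf.placeholder(tf.float32, [None, 32])
y = tf.placeholder(tf.float32, [None, 10])
x_slices = tf.split(x, NUM_GPUS, axis=0)              # assumes batch size % NUM_GPUS == 0
y_slices = tf.split(y, NUM_GPUS, axis=0)

opt = tf.train.GradientDescentOptimizer(0.01)
tower_grads = []
for i in range(NUM_GPUS):
    with tf.device('/gpu:%d' % i):
        with tf.variable_scope('model', reuse=(i > 0)):   # share the variables
            loss = tower_loss(x_slices[i], y_slices[i])
            tower_grads.append(opt.compute_gradients(loss))

# a single update applied to the single, shared copy of the weights
train_op = opt.apply_gradients(average_gradients(tower_grads))
```

In that version the averaging step makes the synchronization explicit, which is exactly what I cannot locate in the Keras version.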
