Hi!
Can you please explain whether the weights of the model are the same on all GPUs after the first batch or not?
I mean that we make a copy of the model on each GPU (at what line exactly does that happen?) and then compute gradients on all GPUs separately, each with a different slice of the batch. That would mean that each GPU updates its model weights in a different way, unless all copies of the model across the GPUs are synced somehow. Does that happen somewhere inside Keras?
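As a side note, here is a minimal sketch (assuming Keras 2 with the TensorFlow backend; the shapes and the layer are made up for illustration) showing that calling the same layer object on several inputs does not duplicate weights: one layer object owns one set of variables, no matter how many "towers" call it.

```python
from keras.layers import Input, Dense

inp_a = Input(shape=(32,))   # input for the first "tower"
inp_b = Input(shape=(32,))   # input for the second "tower"
shared = Dense(10)           # one layer object == one set of weights

out_a = shared(inp_a)        # first call creates the kernel and bias
out_b = shared(inp_b)        # second call reuses the SAME kernel and bias

# Both outputs are computed from the very same variables:
print(shared.weights)        # a single kernel and a single bias, not two copies
```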
Looking at the TensorFlow multi-GPU example, gradients are calculated on each GPU separately (one per slice), then averaged, and then the shared weights are updated. From the variable scopes and their usage we can see that the model weights are shared across GPUs. But in the case of Keras this is not obvious.
Link to the TF example: https://github.com/tensorflow/models/blob/master/tutorials/image/cifar10/cifar10_multi_gpu_train.py
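For reference, the core of that pattern, reduced to a minimal sketch (assuming TF 1.x graph mode; `NUM_GPUS`, `tower_loss`, and the toy one-layer model are made up for illustration, and the real example additionally pins the variables to the CPU):

```python
import tensorflow as tf

NUM_GPUS = 2  # hypothetical; set to the number of available GPUs

def tower_loss(x, y):
    # Toy one-layer model. Using tf.get_variable (instead of tf.Variable)
    # is what makes sharing variables across towers possible.
    w = tf.get_variable('w', shape=[784, 10])
    b = tf.get_variable('b', shape=[10])
    logits = tf.matmul(x, w) + b
    return tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits))

x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.int64, [None])
opt = tf.train.GradientsDescentOptimizer = tf.train.GradientDescentOptimizer(0.01)

# Split the batch into one slice per GPU and build one "tower" per slice.
x_slices = tf.split(x, NUM_GPUS)
y_slices = tf.split(y, NUM_GPUS)
tower_grads = []
with tf.variable_scope(tf.get_variable_scope()):
    for i in range(NUM_GPUS):
        with tf.device('/gpu:%d' % i):
            loss = tower_loss(x_slices[i], y_slices[i])
            # After the first tower creates the variables, every later
            # tower reuses them -- there is only ONE set of weights.
            tf.get_variable_scope().reuse_variables()
            tower_grads.append(opt.compute_gradients(loss))

# Average the per-tower gradients variable by variable, then apply
# a single update to the shared weights.
avg_grads = []
for grads_and_vars in zip(*tower_grads):
    grads = [g for g, _ in grads_and_vars]
    var = grads_and_vars[0][1]
    avg_grads.append((tf.reduce_mean(tf.stack(grads), axis=0), var))
train_op = opt.apply_gradients(avg_grads)
```

The key point is that `tf.get_variable` plus `reuse_variables()` makes every tower read the same underlying weights, and `apply_gradients` performs a single update with the averaged gradients, so the towers cannot drift apart by construction.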