Hi!
Can you please explain whether the weights of the model are the same on all GPUs after the first batch or not?
I mean that we make a copy of the model on each GPU (at what line exactly does that happen?) and then compute gradients on all GPUs separately, each with a different slice of the batch. That would mean that each GPU updates its model weights in a different way, unless all copies of the model across the GPUs are synced somehow. Does that happen somewhere inside Keras?
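As a side note, here is a minimal sketch (assuming Keras 2 with the TensorFlow backend; the shapes and the layer are made up for illustration) showing that calling the same layer object on several inputs does not duplicate weights: one layer object owns one set of variables, no matter how many "towers" call it.

```python
from keras.layers import Input, Dense

inp_a = Input(shape=(32,))   # input for the first "tower"
inp_b = Input(shape=(32,))   # input for the second "tower"
shared = Dense(10)           # one layer object == one set of weights

out_a = shared(inp_a)        # first call creates the kernel and bias
out_b = shared(inp_b)        # second call reuses the SAME kernel and bias

# Both outputs are computed from the very same variables:
print(shared.weights)        # a single kernel and a single bias, not two copies
```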
Looking at the TensorFlow multi-GPU example, gradients are calculated on each GPU separately (one per slice), then averaged, and then the shared weights are updated. From the variable scopes and their usage we can see that the model weights are shared across GPUs. But in the case of Keras this is not obvious.
Link to the TF example: https://github.com/tensorflow/models/blob/master/tutorials/image/cifar10/cifar10_multi_gpu_train.py
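For reference, the core of that pattern, reduced to a minimal sketch (assuming TF 1.x graph mode; `NUM_GPUS`, `tower_loss`, and the toy one-layer model are made up for illustration, and the real example additionally pins the variables to the CPU):

```python
import tensorflow as tf

NUM_GPUS = 2  # hypothetical; set to the number of available GPUs

def tower_loss(x, y):
    # Toy one-layer model. Using tf.get_variable (instead of tf.Variable)
    # is what makes sharing variables across towers possible.
    w = tf.get_variable('w', shape=[784, 10])
    b = tf.get_variable('b', shape=[10])
    logits = tf.matmul(x, w) + b
    return tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits))

x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.int64, [None])
opt = tf.train.GradientsDescentOptimizer = tf.train.GradientDescentOptimizer(0.01)

# Split the batch into one slice per GPU and build one "tower" per slice.
x_slices = tf.split(x, NUM_GPUS)
y_slices = tf.split(y, NUM_GPUS)
tower_grads = []
with tf.variable_scope(tf.get_variable_scope()):
    for i in range(NUM_GPUS):
        with tf.device('/gpu:%d' % i):
            loss = tower_loss(x_slices[i], y_slices[i])
            # After the first tower creates the variables, every later
            # tower reuses them -- there is only ONE set of weights.
            tf.get_variable_scope().reuse_variables()
            tower_grads.append(opt.compute_gradients(loss))

# Average the per-tower gradients variable by variable, then apply
# a single update to the shared weights.
avg_grads = []
for grads_and_vars in zip(*tower_grads):
    grads = [g for g, _ in grads_and_vars]
    var = grads_and_vars[0][1]
    avg_grads.append((tf.reduce_mean(tf.stack(grads), axis=0), var))
train_op = opt.apply_gradients(avg_grads)
```

The key point is that `tf.get_variable` plus `reuse_variables()` makes every tower read the same underlying weights, and `apply_gradients` performs a single update with the averaged gradients, so the towers cannot drift apart by construction.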