Models and optimizers are generally thought of as separate objects, although currently they are executed in the same context.
Separating them might be appropriate as a second step, after forward/backward pass separation.
The critical reasons we want the optimizer separate are:

1. If the updated weights are returned by the same context that returns the training loss (which we want to print), then the weights are also being bussed to and from the GPU at every step.
2. XLA supports Send and Recv operations, which would allow us to compute gradient updates while simultaneously bussing the next model inputs and labels to the GPU.
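As a minimal sketch of the first point, here is the property we want, written in JAX (the model, function names, and plain-SGD update are illustrative assumptions, not this project's API): the weights and the optimizer update stay resident on the device, and only the scalar loss crosses back to the host for printing.

```python
import jax
import jax.numpy as jnp

def loss_fn(params, x, y):
    # A toy linear model standing in for any forward pass.
    pred = x @ params["w"] + params["b"]
    return jnp.mean((pred - y) ** 2)

@jax.jit
def train_step(params, x, y, lr=0.1):
    # Loss and gradients are computed on device...
    loss, grads = jax.value_and_grad(loss_fn)(params, x, y)
    # ...and the weight update is applied there too, so the updated
    # weights never round-trip through the host.
    params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
    return params, loss

params = {"w": jnp.zeros((3, 1)), "b": jnp.zeros(1)}
x, y = jnp.ones((8, 3)), jnp.ones((8, 1))
params, loss = train_step(params, x, y)
# Only the scalar loss is pulled to the host; `params` remains a
# device array and feeds directly into the next step.
print(float(loss))
```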
We should also support SGD, RMSProp, and Adam optimizers. It would make sense to fold batch normalization into this issue as well, since it is effectively its own kind of custom optimizer.
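For reference, the update rules behind these three optimizers, sketched as pure functions of `(param, grad, state)` (the signatures and state layout here are illustrative assumptions, not a proposed API for this project):

```python
import jax.numpy as jnp

def sgd(p, g, state, lr=0.01):
    # Plain gradient descent; carries no state.
    return p - lr * g, state

def rmsprop(p, g, state, lr=0.01, decay=0.9, eps=1e-8):
    # state = {"v": running average of squared gradients}
    v = decay * state["v"] + (1 - decay) * g**2
    return p - lr * g / (jnp.sqrt(v) + eps), {"v": v}

def adam(p, g, state, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # state = {"t": step count, "m": first moment, "v": second moment}
    t = state["t"] + 1
    m = b1 * state["m"] + (1 - b1) * g
    v = b2 * state["v"] + (1 - b2) * g**2
    m_hat = m / (1 - b1**t)  # bias-corrected moment estimates
    v_hat = v / (1 - b2**t)
    return p - lr * m_hat / (jnp.sqrt(v_hat) + eps), {"t": t, "m": m, "v": v}

# Example step with Adam, starting from zero-initialized state:
p, g = jnp.array([1.0, 2.0]), jnp.array([0.1, -0.2])
state = {"t": 0, "m": jnp.zeros_like(p), "v": jnp.zeros_like(p)}
p, state = adam(p, g, state)
```

Batch normalization fits the same shape, which is why it reads as a custom optimizer: its running mean and variance are per-layer state updated on every step outside of gradient descent.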