Description
System information.
- Have I written custom code: Yes
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Colab
- TensorFlow version: 2.6, 2.9
Describe the problem
I have code that works fine, but it gives the following error if I run it inside `strategy.scope()`:
RuntimeError: `merge_call` called while defining a new graph or a tf.function. This can often happen if the function `fn` passed to `strategy.run()` contains a nested `@tf.function`, and the nested `@tf.function` contains a synchronization point, such as aggregating gradients (e.g., optimizer.apply_gradients), or if the function `fn` uses a control flow statement which contains a synchronization point in the body. Such behaviors are not yet supported. Instead, please avoid nested `tf.function`s or control flow statements that may potentially cross a synchronization boundary, for example, wrap the `fn` passed to `strategy.run` or the entire `strategy.run` inside a `tf.function` or move the control flow out of `fn`.
Describe the expected behavior
I think it should also work inside `strategy.scope()`.
- Do you want to contribute a PR? (yes/no): No.
Standalone code to reproduce the issue
The code implements the gradient accumulation technique by overriding the `train_step` method and training with `fit`. As said above, this code works fine without `strategy.scope()`. Now I would like to use it for multi-GPU training, so I wrap the setup in the strategy scope, but I end up with the above-mentioned error.
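To make the report self-contained, here is a minimal sketch of the pattern, not my exact gist: the model, the feature size, and `ACCUM_STEPS` are placeholders. The conditional `apply_gradients` inside `train_step` is the kind of control-flow synchronization point that the error message describes.

```python
import tensorflow as tf

ACCUM_STEPS = 4    # placeholder: apply gradients every 4 batches
FEATURE_DIM = 8    # placeholder feature size


class GAModel(tf.keras.Model):
    """Toy model that accumulates gradients over ACCUM_STEPS batches."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.dense = tf.keras.layers.Dense(1)
        # Build the layer eagerly so matching accumulator variables can be
        # created up front (inside strategy.scope() when instantiated there).
        self.dense.build((None, FEATURE_DIM))
        self.accum_grads = [
            tf.Variable(tf.zeros_like(v), trainable=False)
            for v in self.dense.trainable_variables
        ]
        self.step = tf.Variable(0, dtype=tf.int64, trainable=False)

    def call(self, inputs):
        return self.dense(inputs)

    def train_step(self, data):
        x, y = data
        self.step.assign_add(1)
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)
            loss = self.compiled_loss(y, y_pred)
        grads = tape.gradient(loss, self.trainable_variables)
        for acc, g in zip(self.accum_grads, grads):
            acc.assign_add(g)

        def _apply_and_reset():
            # Synchronization point (optimizer.apply_gradients) inside control
            # flow -- the situation the RuntimeError complains about.
            self.optimizer.apply_gradients(
                zip(self.accum_grads, self.trainable_variables))
            for acc in self.accum_grads:
                acc.assign(tf.zeros_like(acc))
            return tf.constant(True)

        def _noop():
            return tf.constant(True)

        # Apply the accumulated gradients only every ACCUM_STEPS batches.
        tf.cond(tf.equal(self.step % ACCUM_STEPS, 0), _apply_and_reset, _noop)

        self.compiled_metrics.update_state(y, y_pred)
        return {m.name: m.result() for m in self.metrics}


strategy = tf.distribute.MirroredStrategy()
BATCH_SIZE = 32 * strategy.num_replicas_in_sync

with strategy.scope():
    model = GAModel()
    model.compile(optimizer="adam", loss="mse")

x = tf.random.normal((256, FEATURE_DIM))
y = tf.random.normal((256, 1))
model.fit(x, y, batch_size=BATCH_SIZE, epochs=1)  # errors under the strategy scope
```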
Follow-up Questions
- In the above gist, for multi-GPU training, do I need to adjust anything for `BATCH_SIZE = 32 * strategy.num_replicas_in_sync` inside the `train_step` method, or will that be handled automatically?
- In the above gist, I use the mixed precision technique, so I also wrap (as described) the optimizer with `LossScaleOptimizer` and use `optimizer.get_scaled_loss(loss)` and `optimizer.get_unscaled_gradients(gradients)`. But the official documentation only covers the plain `fit` and custom-loop training cases. For a custom loop, it is suggested to wrap the optimizer and scale the loss and gradients, but what about the combination of `fit` and a custom loop (overriding `train_step`)? Does it still need the optimizer wrapping and the loss/gradient scaling, or is that handled by the API? A sketch of what I currently do follows this list.
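For reference, this is a minimal sketch of how I currently combine mixed precision with the overridden `train_step`, following the custom-training-loop documentation (the small functional model and layer sizes below are placeholders, not my gist). My question is whether this manual scaling is still required when `fit` drives the overridden `train_step`.

```python
import tensorflow as tf
from tensorflow.keras import mixed_precision

mixed_precision.set_global_policy("mixed_float16")


class MPModel(tf.keras.Model):
    """Overridden train_step with manual loss/gradient scaling."""

    def train_step(self, data):
        x, y = data
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)
            loss = self.compiled_loss(y, y_pred)
            # Assumes self.optimizer is a mixed_precision.LossScaleOptimizer
            # (wrapped below), as the custom-loop docs suggest.
            scaled_loss = self.optimizer.get_scaled_loss(loss)
        scaled_grads = tape.gradient(scaled_loss, self.trainable_variables)
        grads = self.optimizer.get_unscaled_gradients(scaled_grads)
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        self.compiled_metrics.update_state(y, y_pred)
        return {m.name: m.result() for m in self.metrics}


# Placeholder model: the last layer keeps float32 outputs under mixed precision.
inputs = tf.keras.Input(shape=(8,))
hidden = tf.keras.layers.Dense(16, activation="relu")(inputs)
outputs = tf.keras.layers.Dense(1, dtype="float32")(hidden)
model = MPModel(inputs, outputs)

optimizer = mixed_precision.LossScaleOptimizer(tf.keras.optimizers.Adam())
model.compile(optimizer=optimizer, loss="mse")
```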
Others: #107 cc @chenmoneygithub @nikitamaia @bhack