
RuntimeError: merge_call called while defining a new graph or a tf.function.  #301

Closed
@innat

Description


System information.

  • Have I written custom code: Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Colab
  • TensorFlow version: 2.6, 2.9

Describe the problem

I have code that works fine on its own, but it raises the following error when I use it under with strategy.scope():

RuntimeError: merge_call called while defining a new graph or a tf.function. This can often happen if the function fn passed to strategy.run() contains a nested @tf.function, and the nested @tf.function contains a synchronization point, such as aggregating gradients (e.g, optimizer.apply_gradients), or if the function fn uses a control flow statement which contains a synchronization point in the body. Such behaviors are not yet supported. Instead, please avoid nested tf.functions or control flow statements that may potentially cross a synchronization boundary, for example, wrap the fn passed to strategy.run or the entire strategy.run inside a tf.function or move the control flow out of fn

Describe the expected behavior

I think it should work.

  • Do you want to contribute a PR? (yes/no): No.

Standalone code to reproduce the issue

The code implements a gradient accumulation technique by overriding train_step and training with fit. As said above, the code works fine without with strategy.scope(). Now I would like to use it for the multi-GPU case, so I build the model under the strategy scope, but I end up with the error mentioned above. A minimal sketch of the pattern follows the gist link below.

Gist.
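The gist has the full reproduction. For reference, here is a minimal sketch of the same pattern (the class name GAModel, the n_gradients argument, and the toy model are illustrative, not copied from the gist). Building the model under strategy.scope() and calling fit is what triggers the error:

```python
import tensorflow as tf

class GAModel(tf.keras.Model):
    """Accumulates gradients over n_gradients steps before applying them."""

    def __init__(self, n_gradients=4, **kwargs):
        # Built functionally via inputs=/outputs= kwargs, so trainable
        # variables already exist when the accumulators are created.
        super().__init__(**kwargs)
        self.n_gradients = tf.constant(n_gradients, dtype=tf.int32)
        self.accum_steps = tf.Variable(0, dtype=tf.int32, trainable=False)
        self.accum_grads = [
            tf.Variable(tf.zeros_like(v), trainable=False)
            for v in self.trainable_variables
        ]

    def train_step(self, data):
        x, y = data
        self.accum_steps.assign_add(1)
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)
            loss = self.compiled_loss(y, y_pred, regularization_losses=self.losses)
        grads = tape.gradient(loss, self.trainable_variables)
        for acc, g in zip(self.accum_grads, grads):
            acc.assign_add(g)
        # Conditionally apply the accumulated gradients every n_gradients
        # steps; under MirroredStrategy this conditional apply is the
        # synchronization point the merge_call error complains about.
        tf.cond(
            tf.equal(self.accum_steps, self.n_gradients),
            self._apply_accumulated_gradients,
            lambda: None,
        )
        self.compiled_metrics.update_state(y, y_pred)
        return {m.name: m.result() for m in self.metrics}

    def _apply_accumulated_gradients(self):
        self.optimizer.apply_gradients(
            zip(self.accum_grads, self.trainable_variables)
        )
        self.accum_steps.assign(0)
        for acc in self.accum_grads:
            acc.assign(tf.zeros_like(acc))


strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    inputs = tf.keras.Input(shape=(8,))
    outputs = tf.keras.layers.Dense(1)(inputs)
    model = GAModel(n_gradients=4, inputs=inputs, outputs=outputs)
    model.compile(optimizer="adam", loss="mse")

x = tf.random.normal((256, 8))
y = tf.random.normal((256, 1))
# Raises the merge_call RuntimeError with more than one replica.
model.fit(x, y, batch_size=32, epochs=1)
```

The tf.cond around optimizer.apply_gradients inside train_step appears to be the synchronization point the error message refers to.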

Follow-up Questions

  1. In the above gist, for multi-GPU training, do I need to adjust anything for BATCH_SIZE = 32 * strategy.num_replicas_in_sync inside the train_step method, or will that be handled automatically? (See the first sketch after this list.)
  2. In the above gist, I use the mixed precision technique, so I also wrap the optimizer with LossScaleOptimizer (as described in the docs) and use optimizer.get_scaled_loss(loss) and optimizer.get_unscaled_gradients(gradients).
    But the official documentation only covers the plain fit case and the custom-training-loop case. For a custom loop, it suggests wrapping the optimizer and scaling the loss and gradients, but what about the combination of fit and a custom loop (overriding train_step)? Does it still need the optimizer wrapped and the loss and gradients scaled, or is that handled by the API? (See the second sketch after this list.)
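For context on question 1, this is the kind of global-batch setup I mean (dummy data, illustrative only); my understanding is that fit batches the dataset with the global batch size and the strategy splits each batch across replicas, so no per-replica adjustment happens inside train_step itself, but I would like confirmation:

```python
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()

# Global batch size: fit + tf.distribute split each global batch across
# replicas, so train_step itself sees only a per-replica slice.
PER_REPLICA_BATCH_SIZE = 32
GLOBAL_BATCH_SIZE = PER_REPLICA_BATCH_SIZE * strategy.num_replicas_in_sync

# Dummy dataset, illustrative only.
train_ds = (
    tf.data.Dataset.from_tensor_slices(
        (tf.random.normal((1024, 8)), tf.random.normal((1024, 1)))
    )
    .shuffle(1024)
    .batch(GLOBAL_BATCH_SIZE)
)
```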

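For question 2, this is the pattern I mean: the custom-loop-style loss scaling applied inside an overridden train_step. It is a sketch only; it assumes the optimizer is wrapped with LossScaleOptimizer explicitly before compile, and whether any of this is handled automatically when only train_step is overridden is exactly what I am asking.

```python
import tensorflow as tf

tf.keras.mixed_precision.set_global_policy("mixed_float16")

class MPModel(tf.keras.Model):
    """Overridden train_step with manual loss scaling, as in the custom-loop guide."""

    def train_step(self, data):
        x, y = data
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)
            loss = self.compiled_loss(y, y_pred)
            # Scale the loss so float16 gradients do not underflow
            # (assumes self.optimizer is a LossScaleOptimizer).
            scaled_loss = self.optimizer.get_scaled_loss(loss)
        scaled_grads = tape.gradient(scaled_loss, self.trainable_variables)
        grads = self.optimizer.get_unscaled_gradients(scaled_grads)
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        self.compiled_metrics.update_state(y, y_pred)
        return {m.name: m.result() for m in self.metrics}


inputs = tf.keras.Input(shape=(8,))
# Keep the output layer in float32, as the mixed precision guide recommends.
outputs = tf.keras.layers.Dense(1, dtype="float32")(inputs)
model = MPModel(inputs=inputs, outputs=outputs)
model.compile(
    optimizer=tf.keras.mixed_precision.LossScaleOptimizer(tf.keras.optimizers.Adam()),
    loss="mse",
)
model.fit(tf.random.normal((256, 8)), tf.random.normal((256, 1)), batch_size=32)
```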
Others: #107 cc @chenmoneygithub @nikitamaia @bhack
