
Optimizer initialization should set rescale_grad appropriately #210

Open
kalyc opened this issue Dec 18, 2018 · 0 comments


kalyc commented Dec 18, 2018

This may ultimately need to be addressed at the Module API level, but the problem is most visible in the Keras integration.

When creating an instance of mx.optimizer.Optimizer, if rescale_grad is not specified it defaults to 1.0, which has a significant impact on training. In fact, MXNet points this out with a warning in the logs when the optimizer is initialized:

```
/usr/local/lib/python2.7/site-packages/mxnet/module/bucketing_module.py:408: UserWarning: Optimizer created manually outside Module but rescale_grad is not normalized to 1.0/batch_size/num_workers (1.0 vs. 0.0078125). Is this intended?
  force_init=force_init)
```
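In the warning above, 0.0078125 is 1.0/128, i.e. the reciprocal of a batch size of 128 with a single worker. A minimal sketch of the setup the warning is asking for when the optimizer is constructed manually (batch_size and num_workers here are placeholder values, not taken from this report):

```python
import mxnet as mx

# Placeholder values; 1.0 / (128 * 1) = 0.0078125, matching the warning above.
batch_size = 128
num_workers = 1

# rescale_grad is applied to the accumulated per-batch gradients before the
# parameter update, so it should normally be 1.0/batch_size/num_workers.
opt = mx.optimizer.SGD(
    learning_rate=0.01,
    rescale_grad=1.0 / batch_size / num_workers,
)
```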

Since the MXNet implementations of the Keras optimizers essentially delegate to the Module versions, this parameter should likely be set to the normalized value automatically, as the need for it is not obvious from the Keras API. It is possible to pass rescale_grad as an additional argument (a sketch of that workaround follows), but that requires the user to know implementation details of both frameworks.
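For illustration, assuming the Keras-side optimizer wrappers accept rescale_grad and forward it to the underlying MXNet optimizer (the report above says this extra argument is possible, but the exact keyword plumbing shown here is an assumption), the workaround would look roughly like this:

```python
from keras.models import Sequential
from keras.layers import Dense
from keras import optimizers

batch_size = 128  # placeholder value

model = Sequential([Dense(10, activation='softmax', input_shape=(784,))])
model.compile(
    loss='categorical_crossentropy',
    # rescale_grad is assumed to be forwarded to the MXNet optimizer.
    optimizer=optimizers.SGD(lr=0.01, rescale_grad=1.0 / batch_size),
)
# model.fit(x_train, y_train, batch_size=batch_size)  # training data omitted
```

The point of the issue is that the integration already knows the batch size at fit time, so it could compute this normalization itself instead of requiring the user to supply it.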
