
Optimizer initialization should set rescale_grad appropriately #210

Open
kalyc opened this issue Dec 18, 2018 · 0 comments


kalyc commented Dec 18, 2018

This may ultimately need to be addressed at the Module API level, but the problem is most visible in the Keras integration.

When creating an instance of mx.optimizer.Optimizer, if rescale_grad is not specified it defaults to 1.0, which has a significant impact on training. In fact, MXNet points this out with a warning in the logs when the optimizer is initialized:

```
/usr/local/lib/python2.7/site-packages/mxnet/module/bucketing_module.py:408: UserWarning: Optimizer created manually outside Module but rescale_grad is not normalized to 1.0/batch_size/num_workers (1.0 vs. 0.0078125). Is this intended?
  force_init=force_init)
```
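In the warning above, 0.0078125 is 1.0/128, i.e. the reciprocal of a batch size of 128 with a single worker. A minimal sketch of the setup the warning is asking for when the optimizer is constructed manually (batch_size and num_workers here are placeholder values, not taken from this report):

```python
import mxnet as mx

# Placeholder values; 1.0 / (128 * 1) = 0.0078125, matching the warning above.
batch_size = 128
num_workers = 1

# rescale_grad is applied to the accumulated per-batch gradients before the
# parameter update, so it should normally be 1.0/batch_size/num_workers.
opt = mx.optimizer.SGD(
    learning_rate=0.01,
    rescale_grad=1.0 / batch_size / num_workers,
)
```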

Since the MXNet implementations of the Keras optimizers essentially delegate to the Module versions, this parameter should likely be set to the normalized value automatically, as the need for it is not obvious from the Keras API. It is possible to pass rescale_grad as an additional argument (a sketch of that workaround follows), but that requires the user to know implementation details of both frameworks.
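For illustration, assuming the Keras-side optimizer wrappers accept rescale_grad and forward it to the underlying MXNet optimizer (the report above says this extra argument is possible, but the exact keyword plumbing shown here is an assumption), the workaround would look roughly like this:

```python
from keras.models import Sequential
from keras.layers import Dense
from keras import optimizers

batch_size = 128  # placeholder value

model = Sequential([Dense(10, activation='softmax', input_shape=(784,))])
model.compile(
    loss='categorical_crossentropy',
    # rescale_grad is assumed to be forwarded to the MXNet optimizer.
    optimizer=optimizers.SGD(lr=0.01, rescale_grad=1.0 / batch_size),
)
# model.fit(x_train, y_train, batch_size=batch_size)  # training data omitted
```

The point of the issue is that the integration already knows the batch size at fit time, so it could compute this normalization itself instead of requiring the user to supply it.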
