🚀 Feature
Auto scale learning rate based on batch size
Motivation
Changing the number of workers in distributed training requires adjusting hyperparameters. Goyal et al. (https://arxiv.org/abs/1706.02677) proposed a linear scaling rule: when the minibatch size is multiplied by k, multiply the learning rate by k.
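For concreteness, a minimal sketch of the rule (the function name is illustrative; 256 is the reference batch size used in the paper):

```python
def linearly_scaled_lr(base_lr: float, batch_size: int,
                       base_batch_size: int = 256) -> float:
    # Linear scaling rule: when the minibatch size is multiplied by k,
    # multiply the learning rate by k.
    return base_lr * batch_size / base_batch_size

# The paper's ResNet-50 baseline uses lr=0.1 at batch size 256, so a
# batch size of 8192 gives lr = 3.2.
print(linearly_scaled_lr(0.1, 8192))  # 3.2
```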
Pitch
ClassificationTask should have a flag (default True) that rescales the learning rate based on the batch size. The task is a natural place for this logic, since we don't want every parameter scheduler to reimplement it. We could put the same logic in the optimizer instead, but I suspect that would require more boilerplate.
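A hypothetical sketch of what the flag could look like (the constructor signature and attribute names below are made up for illustration, not ClassificationTask's actual API):

```python
class ClassificationTask:
    def __init__(self, base_lr: float, batch_size: int,
                 auto_scale_lr: bool = True, base_batch_size: int = 256):
        self.batch_size = batch_size
        # Rescale once at task construction, so that no individual
        # parameter scheduler has to reimplement the scaling logic.
        self.lr = (base_lr * batch_size / base_batch_size
                   if auto_scale_lr else base_lr)
```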
Alternatives
Hydra (http://hydra.cc) would enable a different solution to this problem: the config file could have a "rescale" parameter for the learning rate, and we could use the interpolation feature to rescale by "1/{batch_size}", where batch_size is defined elsewhere in the config.
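A rough sketch of what that could look like with OmegaConf (Hydra's config backend). Plain interpolation can't do arithmetic on its own, so this assumes a custom resolver; the resolver name "scaled_lr" and the reference batch size of 256 are made up for illustration, and register_new_resolver requires OmegaConf >= 2.1:

```python
from omegaconf import OmegaConf

# Hypothetical resolver implementing the linear scaling rule, since
# plain ${...} interpolation cannot do arithmetic by itself.
OmegaConf.register_new_resolver(
    "scaled_lr", lambda base_lr, batch_size: base_lr * batch_size / 256
)

cfg = OmegaConf.create("""
batch_size: 1024
optimizer:
  base_lr: 0.1
  # Resolved lazily, so changing batch_size automatically rescales lr.
  lr: ${scaled_lr:${optimizer.base_lr},${batch_size}}
""")
print(cfg.optimizer.lr)  # 0.4
```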