🚀 Feature
Auto scale learning rate based on batch size
Motivation
Changing the number of workers in distributed training requires adjusting hyperparameters. Goyal et al. (https://arxiv.org/abs/1706.02677) proposed a linear scaling rule: when the minibatch size is multiplied by k, multiply the learning rate by k.
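For concreteness, a minimal sketch of the rule (the function name is illustrative; 256 is the reference batch size used in the paper):

```python
def linearly_scaled_lr(base_lr: float, batch_size: int,
                       base_batch_size: int = 256) -> float:
    # Linear scaling rule: when the minibatch size is multiplied by k,
    # multiply the learning rate by k.
    return base_lr * batch_size / base_batch_size

# The paper's ResNet-50 baseline uses lr=0.1 at batch size 256, so a
# batch size of 8192 gives lr = 3.2.
print(linearly_scaled_lr(0.1, 8192))  # 3.2
```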
Pitch
ClassificationTask should have a flag (default True) that rescales the learning rate based on the batch size. The task is a natural place for this logic, since we don't want every parameter scheduler to reimplement it. We could put the same logic in the optimizer instead, but I suspect that would require more boilerplate.
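A hypothetical sketch of what the flag could look like (the constructor signature and attribute names below are made up for illustration, not ClassificationTask's actual API):

```python
class ClassificationTask:
    def __init__(self, base_lr: float, batch_size: int,
                 auto_scale_lr: bool = True, base_batch_size: int = 256):
        self.batch_size = batch_size
        # Rescale once at task construction, so that no individual
        # parameter scheduler has to reimplement the scaling logic.
        self.lr = (base_lr * batch_size / base_batch_size
                   if auto_scale_lr else base_lr)
```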
Alternatives
Hydra (http://hydra.cc) would enable a different solution to this problem: the config file could have a "rescale" parameter for the learning rate, and we could use the interpolation feature to rescale by "1/{batch_size}", where batch_size is defined elsewhere in the config.
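A rough sketch of what that could look like with OmegaConf (Hydra's config backend). Plain interpolation can't do arithmetic on its own, so this assumes a custom resolver; the resolver name "scaled_lr" and the reference batch size of 256 are made up for illustration, and register_new_resolver requires OmegaConf >= 2.1:

```python
from omegaconf import OmegaConf

# Hypothetical resolver implementing the linear scaling rule, since
# plain ${...} interpolation cannot do arithmetic by itself.
OmegaConf.register_new_resolver(
    "scaled_lr", lambda base_lr, batch_size: base_lr * batch_size / 256
)

cfg = OmegaConf.create("""
batch_size: 1024
optimizer:
  base_lr: 0.1
  # Resolved lazily, so changing batch_size automatically rescales lr.
  lr: ${scaled_lr:${optimizer.base_lr},${batch_size}}
""")
print(cfg.optimizer.lr)  # 0.4
```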