Replies: 1 comment 1 reply
These are exposed via |
In addition to tuning the learning rate for the optimizer, it can sometimes be helpful to adjust other parameters, such as the weight decay, to improve generalization, reduce overfitting, or to allow a more aggressive learning rate that speeds up training. Both the torch and bitsandbytes AdamW optimizers support custom beta1, beta2, and weight_decay parameters.
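For reference, here is a minimal sketch of what passing these hyperparameters looks like when constructing the optimizers directly; the values shown are simply the defaults, not a recommendation, and the bitsandbytes variant is commented out since it requires the bitsandbytes package and a CUDA-capable setup.

```python
import torch

# A tiny model purely for illustration.
model = torch.nn.Linear(128, 64)

# torch's AdamW: beta1/beta2 and weight_decay are ordinary keyword arguments.
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=2e-4,
    betas=(0.9, 0.999),   # (beta1, beta2)
    weight_decay=0.01,    # torch's default
)

# bitsandbytes' 8-bit AdamW exposes the same hyperparameters:
# import bitsandbytes as bnb
# optimizer = bnb.optim.AdamW8bit(
#     model.parameters(),
#     lr=2e-4,
#     betas=(0.9, 0.999),
#     weight_decay=0.01,
# )
```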
The short description of weight decay is that during training, the larger the magnitude of a weight, the more strongly it is penalized.
For example, in the LoRA paper (https://arxiv.org/abs/2106.09685), weight decay is varied between 0.01 (the optimizer default) and 0.1.
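To make that penalty concrete, AdamW uses decoupled weight decay (https://arxiv.org/abs/1711.05101): each update subtracts a term proportional to the weight itself, so larger weights shrink more per step:

$$
\theta_{t+1} = \theta_t - \eta \left( \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} + \lambda\, \theta_t \right)
$$

where $\lambda$ is the weight_decay parameter, $\eta$ is the learning rate, and $\hat{m}_t$, $\hat{v}_t$ are the bias-corrected first- and second-moment estimates whose averaging is controlled by beta1 and beta2.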
While the default parameters should perform adequately for most situations, there may be datasets that benefit from tuning these other optimizer parameters.
The trade-off of exposing more configuration parameters is additional complexity for the end user. One possible option would be to process them at their default values unless specified otherwise (a rough sketch of this is below). Including these parameters would be simple to implement, but at the same time I admit that there may not be many users interested in having access to them, so I wanted to open this topic up as a discussion.
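As a minimal sketch of the "defaults unless specified otherwise" option: the OptimizerConfig dataclass and the field names adamw_beta1, adamw_beta2, and weight_decay below are hypothetical, not existing config keys, and only illustrate that configs which never mention these fields would keep their current behaviour.

```python
from dataclasses import dataclass
from typing import Optional

import torch


@dataclass
class OptimizerConfig:
    # None means "defer to the optimizer's own default", so configs that
    # never set these fields behave exactly as they do today.
    lr: float = 2e-4
    adamw_beta1: Optional[float] = None
    adamw_beta2: Optional[float] = None
    weight_decay: Optional[float] = None


def build_adamw(params, cfg: OptimizerConfig) -> torch.optim.AdamW:
    kwargs = {"lr": cfg.lr}
    if cfg.adamw_beta1 is not None or cfg.adamw_beta2 is not None:
        beta1 = cfg.adamw_beta1 if cfg.adamw_beta1 is not None else 0.9
        beta2 = cfg.adamw_beta2 if cfg.adamw_beta2 is not None else 0.999
        kwargs["betas"] = (beta1, beta2)
    if cfg.weight_decay is not None:
        kwargs["weight_decay"] = cfg.weight_decay
    return torch.optim.AdamW(params, **kwargs)


# Only users who care about these knobs ever have to set them:
model = torch.nn.Linear(128, 64)
opt_default = build_adamw(model.parameters(), OptimizerConfig())
opt_tuned = build_adamw(model.parameters(), OptimizerConfig(weight_decay=0.1))
```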