Add LearningRateMultiplier wrapper for optimizers #396
Conversation
It seems there are some pep8 errors and the code isn't compatible with Python 2 because of super(): in Python 2, super() takes two arguments, usually the class and the instance.
You can find out more about the errors by looking at the Travis logs.
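For reference, a form that works on both Python 2 and 3 would look something like this (illustrated with __init__; the class name is the one from this PR):

```python
# Python 3 only:
#     super().__init__(**kwargs)
# Python 2 and 3 compatible: name the class and the instance explicitly.
super(LearningRateMultiplier, self).__init__(**kwargs)
```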
Thanks a lot for working on that. Many people asked for this feature, so it's very welcome. Since your optimizer is quite special (an optimizer inside an optimizer), we'll make sure that we minimize the amount of hackiness so that it works in as many cases as possible. See my comments. If you have any questions/problems, feel free to ask for help.
learning rate of the optimizer.

Note: This is a wrapper and does not implement any
optimization algorithm.
What about two examples?
- One where you specify the learning rates manually by using strings as keys, e.g. {'conv_1/kernel':0.5, 'conv_1/bias':0.1}.
- One where you programmatically set the learning rates by iterating through the layers of the model (for big models this is useful). I suppose that it should be possible with a for loop, using layer.name as the key of the dictionary; see the sketch after this list.
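Something like this for the second one (using the class-based interface currently in the PR; the scaling rule and values are only illustrative):

```python
from keras.optimizers import SGD
from keras_contrib.optimizers import LearningRateMultiplier

# `model` is any already-built Keras model; give earlier layers smaller
# learning rates than later ones.
multipliers = {}
n_layers = len(model.layers)
for i, layer in enumerate(model.layers):
    multipliers[layer.name] = (i + 1) / float(n_layers)

opt = LearningRateMultiplier(
    SGD, lr_multipliers=multipliers, lr=0.001, momentum=0.9)
```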
# Arguments
    optimizer: An optimizer class to be wrapped.
    lr_multipliers: Dictionary of the per layer factors. For
        example `optimizer={'conv_1/kernel':0.5, 'conv_1/bias':0.1}`.
Typo: the keyword is lr_multipliers.
optimization algorithm.

# Arguments
    optimizer: An optimizer class to be wrapped.
I think optimizer should be an optimizer instance, not an optimizer class. Let's minimize the hackiness.
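For example (a sketch of the suggested interface next to the current one; the values are illustrative):

```python
from keras.optimizers import SGD
from keras_contrib.optimizers import LearningRateMultiplier

# Current PR: the optimizer class is passed and constructed inside the wrapper.
opt = LearningRateMultiplier(
    SGD, lr_multipliers={'dense_1': 0.5}, lr=0.001, momentum=0.9)

# Suggested: pass an already-configured optimizer instance instead.
opt = LearningRateMultiplier(
    SGD(lr=0.001, momentum=0.9), lr_multipliers={'dense_1': 0.5})
```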
class.
"""
def __init__(self, optimizer, lr_multipliers=None, **kwargs):
    self._class = optimizer
I don't think underscores are needed.
optimizers._test_optimizer(opt1, target=0.95)

mult = {'dense': 10}
opt2 = LearningRateMultiplier(SGD, lr_multipliers=mult,
Can you make a second function test_lr_multiplier_layerwise for this?
mult = {'dense': 10}
opt2 = LearningRateMultiplier(SGD, lr_multipliers=mult,
                              lr=0.001, momentum=0.9, nesterov=True)
optimizers._test_optimizer(opt2, target=0.95)
We'll also need a third test, test_lr_multiplier_weightwise, where you use the format {'layer_name/weight_name': lr} to ensure that all configurations work. And a fourth test with a more complex optimizer (Adam would be a good fit).
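Roughly something like this (mirroring the existing test above and using the imports already at the top of the test file; the multiplier values and targets are placeholders):

```python
def test_lr_multiplier_weightwise():
    # Keys use the 'layer_name/weight_name' format.
    mult = {'dense/kernel': 10, 'dense/bias': 5}
    opt = LearningRateMultiplier(
        SGD, lr_multipliers=mult, lr=0.001, momentum=0.9, nesterov=True)
    optimizers._test_optimizer(opt, target=0.95)


def test_lr_multiplier_adam():
    # The same wrapper around a more complex optimizer.
    mult = {'dense': 10}
    opt = LearningRateMultiplier(Adam, lr_multipliers=mult, lr=0.001)
    optimizers._test_optimizer(opt, target=0.95)
```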
from keras_contrib.tests import optimizers
from keras_contrib.optimizers import LearningRateMultiplier
from keras.optimizers import SGD, Adam
from keras.callbacks import LearningRateScheduler
Unused import
if name.startswith('_'):
    super(LearningRateMultiplier, self).__setattr__(name, value)
else:
    self._optimizer.__setattr__(name, value)
I don't think __setattr__ and __getattr__ are needed. By calling the right super() functions at the right places, everything should work. Ask me if you have any issues while removing them.
You'll likely have to have a lr parameter which will be the same as self.optimizer.lr, since many callbacks expect a lr attribute.
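One possible shape for that part, assuming the instance-based interface suggested above (just a sketch, untested):

```python
from keras.optimizers import Optimizer


class LearningRateMultiplier(Optimizer):
    def __init__(self, optimizer, lr_multipliers=None, **kwargs):
        self.optimizer = optimizer
        self.lr_multipliers = lr_multipliers or {}
        # The base class only processes clipnorm/clipvalue here and sets up
        # self.updates and self.weights.
        super(LearningRateMultiplier, self).__init__(**kwargs)

    @property
    def lr(self):
        # Callbacks such as ReduceLROnPlateau and LearningRateScheduler read
        # and write `optimizer.lr`, so forward it to the wrapped instance.
        return self.optimizer.lr

    @lr.setter
    def lr(self, value):
        self.optimizer.lr = value
```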
self._class = optimizer
self._optimizer = optimizer(**kwargs)
self._lr_multipliers = lr_multipliers or {}
You should call super() at the end of the __init__ function. You can take a look at the source code of the Keras optimizers to see what happens.
return updates

def get_config(self):
    config = {'optimizer': self._class,
Since optimizer will be an instance of an optimizer class, you should use the function serialize_keras_object, which will serialize the optimizer for you.
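For example (a sketch of the two methods inside the wrapper, assuming the instance-based interface; serialize_keras_object and keras.optimizers.deserialize are the standard Keras helpers):

```python
from keras import optimizers
from keras.utils.generic_utils import serialize_keras_object


class LearningRateMultiplier(optimizers.Optimizer):
    # ... __init__ and get_updates as sketched above ...

    def get_config(self):
        config = {'optimizer': serialize_keras_object(self.optimizer),
                  'lr_multipliers': self.lr_multipliers}
        base_config = super(LearningRateMultiplier, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))

    @classmethod
    def from_config(cls, config):
        # Rebuild the wrapped optimizer instance from its serialized form.
        optimizer = optimizers.deserialize(config.pop('optimizer'))
        return cls(optimizer, **config)
```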
Will there be updates on this? If not, can I make a new PR that adds this class to keras-contrib? @gabrieldemarmiesse @stante, it would enable DiscriminativeLearningRate in general rather than only a learning rate multiplier. I propose three settings: automatic (cosine) learning rate decay from the base learning rate of the wrapped optimizer by layer, automatic (cosine) learning rate decay from the base learning rate of the wrapped optimizer by convolutional blocks/groups, and this learning rate multiplier.
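For the first of those settings, the per-layer multipliers could presumably be generated along these lines and fed to the same wrapper (purely illustrative; the helper name is made up):

```python
import math


def cosine_layer_multipliers(model, min_mult=0.1):
    """Illustrative only: multipliers rise along a half-cosine from
    `min_mult` for the first layer to 1.0 for the last layer."""
    n = max(len(model.layers) - 1, 1)
    return {
        layer.name: min_mult + (1.0 - min_mult)
        * 0.5 * (1.0 - math.cos(math.pi * i / n))
        for i, layer in enumerate(model.layers)
    }

# The resulting dict can then be passed as lr_multipliers to the wrapper,
# exactly as in the examples above.
```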
Keras-contrib is currently deprecated. Please redirect the PRs to tensorflow/addons. It would be really nice if you could add that, @Dicksonchin93, a lot of people are asking for this feature :)
@gabrieldemarmiesse is there a reason why we shouldn't add this into Keras directly?
This was proposed a while back and rejected. The reason is that not enough people use it to justify an API change of Keras. It's also not clear that it's a best practice. TensorFlow Addons was made exactly for this kind of feature.
Summary
Optimizers have a model-global learning rate. This PR adds a wrapper which can be used with existing optimizers to provide a facility for specifying different learning rates per layer in a network. The per-layer learning rate is specified as a factor which is multiplied with the learning rate of the wrapped optimizer. This wrapper can be used as in the following sketch (multiplier values are illustrative):
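```python
from keras.optimizers import SGD
from keras_contrib.optimizers import LearningRateMultiplier

# Layers whose names contain these strings get the given factors.
multipliers = {'dense_1': 0.5, 'dense_2': 0.4}
opt = LearningRateMultiplier(
    SGD, lr_multipliers=multipliers, lr=0.001, momentum=0.9)

# `model` is any already-built Keras model.
model.compile(optimizer=opt, loss='categorical_crossentropy')
```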
The example wraps SGD and specifies lr and momentum for it. The layer whose name contains the string 'dense_1' has a multiplier of 0.5 and the layer whose name contains the string 'dense_2' has a multiplier of 0.4. Different multipliers for kernel and bias can be specified with keys of the form 'layer_name/weight_name', as in the following sketch:
Related Issues
There are issues regarding this topic in Keras: keras-team/keras#11934, keras-team/keras#7912, and partially keras-team/keras#5920.