Motivation and description

A common practice in machine learning is to take a pre-trained model and fine-tune it on a particular dataset. This typically involves freezing the weights in some layers while fitting the output layer(s) on the new data.
Unfortunately, this functionality appears to be incompatible with the current implementation of the ToDevice callback.
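Roughly, the epoch-begin handler for ToDevice behaves like the sketch below. This is a paraphrase rather than the verbatim FluxTraining.jl source, and the helper names (`movemodelfn`, `setupoptimstate`) are approximations:

```julia
using FluxTraining

# Paraphrased sketch of ToDevice's EpochBegin handler -- not the verbatim
# FluxTraining.jl source; field and helper names are approximate.
function FluxTraining.on(::FluxTraining.Events.EpochBegin, phase,
                         cb::FluxTraining.ToDevice, learner)
    # Move the full model to the target device (e.g. `gpu`) ...
    learner.model = cb.movemodelfn(learner.model)
    # ... and rebuild the learner's parameter/optimizer state from it,
    # discarding any freezing that was applied earlier.
    learner.params = setupoptimstate(learner.model, learner.optimizer)
end
```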
This essentially means that learner.params is set to the parameters of the full model at the start of each epoch. Thus, even if we try to freeze the layers manually with Flux.freeze!(learner.params.layers[1:end-1]), this will be undone by ToDevice.
Possible Implementation
One solution that would work with Flux's new explicit optimizers would be to create a callback that re-freezes layers after ToDevice is executed.
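A minimal sketch of such a callback is given below. It relies on the stateaccess and runafter parts of the FluxTraining.jl callback interface, assumes the model is a Chain whose last layer is the one being fine-tuned, and the callback name `RefreezeLayers` is purely illustrative:

```julia
using Flux, FluxTraining
import FluxTraining: Callback, Read, Write

# Illustrative callback that re-applies freezing after ToDevice has rebuilt
# `learner.params` at the start of an epoch.
struct RefreezeLayers <: Callback end

# Grant the callback access to the learner's model and params, and schedule it
# to run after ToDevice so the freeze is applied to the freshly rebuilt state.
FluxTraining.stateaccess(::RefreezeLayers) = (model = Read(), params = Write())
FluxTraining.runafter(::RefreezeLayers) = (FluxTraining.ToDevice,)

function FluxTraining.on(::FluxTraining.Events.EpochBegin, phase,
                         ::RefreezeLayers, learner)
    # With Flux's explicit (Optimisers.jl) optimizers, `learner.params` holds
    # the optimizer state tree, so freezing everything except the last layer
    # keeps the backbone fixed while the new head continues to train.
    Flux.freeze!(learner.params.layers[1:end-1])
end
```

Passing an instance of this callback to the Learner alongside ToGPU() should then re-freeze the backbone every epoch, even though ToDevice resets the state.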
However, perhaps we should consider whether it's necessary for ToDevice to move the model to the GPU at the start of every epoch. Maybe we could extend the Callback interface to allow for some one-time setup code to run before the first epoch is executed?
I think this issue may also be related to #148. In particular, the memory leak appears to be caused by ToDevice resetting the optimizer in each epoch. We could potentially kill two birds with one stone by changing this behaviour.
Any update on this? Also, I'd really appreciate it if the potential implementation above could be turned into a complete example to build upon (for users like me who know nothing about the internals of FluxTraining.jl). Thanks!