Attempt the classic Lottery Ticket method: train a large model, prune it, then retrain the subnet forward from a previous state.
Try to train the subnet from a random initialization but with an "ideal" order of training (found by brute force).
The current thinking around tickets is that the initialization of a given subnet is its defining feature, with the best known way of training the subnet being IMP with the rewinding technique.
Perhaps an optimal order of training would minimize the effect of the subnet's initialization.
Let's see whether the gap in accuracy between (random initialization of the subnet + perfect order) and IMP with rewinding is reduced.
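The "brute force the order" idea above can be sketched on a toy problem. This is only an illustrative sketch, not the experiment itself: `sgd_epoch` and `best_ordering` are hypothetical helper names, the model is plain linear regression in numpy, and real networks would need far more than one epoch and a non-exhaustive search over orderings.

```python
from itertools import permutations
import numpy as np

def sgd_epoch(w, batches, lr=0.1):
    """One SGD pass over the batches in the given order (toy linear regression)."""
    for X, y in batches:
        grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
        w = w - lr * grad
    return w

def best_ordering(w0, batches, X_val, y_val):
    """Brute-force every batch presentation order; keep the lowest validation loss."""
    best = None
    for order in permutations(range(len(batches))):
        w = sgd_epoch(w0.copy(), [batches[i] for i in order])
        loss = float(np.mean((X_val @ w - y_val) ** 2))
        if best is None or loss < best[0]:
            best = (loss, order)
    return best  # (val_loss, best_order)
```

With b batches this searches b! orderings, so it is only feasible for very small b; the point is just to measure how much the best ordering can compensate for a random initialization.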
In their 2019 paper, Frankle et al. presented a new technique called Iterative Magnitude Pruning (IMP) with rewinding.
Instead of looking at the weights at initialization (rewinding to iteration zero), it looks at their values after several training iterations, i.e., rewinding to iteration k.
The rationale behind the technique is that in many architectures some weights are already at a "winning" value at initialization, while others only reach a "winning" value after some training.
The same rationale applies to using a perfect ordering: it should increase a given ticket's chance of reaching a "winning" state after some training.
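The IMP-with-rewinding loop described above can be sketched as follows. This is a hedged sketch over flattened weight vectors: `train_fn` is a hypothetical trainer that returns both the weight snapshot at iteration k and the fully trained weights, and the per-round pruning fraction is an assumed 20%.

```python
import numpy as np

def imp_with_rewinding(train_fn, w_init, k, rounds, prune_frac=0.2):
    """Iterative Magnitude Pruning with rewinding to iteration k (sketch).

    train_fn(w, mask, k) -> (w_at_k, w_final): hypothetical trainer returning
    the weight snapshot at iteration k and the weights after full training.
    """
    mask = np.ones_like(w_init)
    w_k = None          # snapshot from the first (dense) run, used for rewinding
    w = w_init
    for _ in range(rounds):
        w_at_k, w_final = train_fn(w, mask, k)
        if w_k is None:
            w_k = w_at_k
        # Prune the smallest-magnitude weights that are still alive.
        alive = np.flatnonzero(mask)
        n_prune = int(len(alive) * prune_frac)
        idx = alive[np.argsort(np.abs(w_final[alive]))[:n_prune]]
        mask[idx] = 0.0
        # Rewind the surviving weights to their iteration-k values.
        w = w_k * mask
    return mask, w
```

Setting k = 0 recovers the original lottery-ticket procedure (rewind to initialization); the paper's observation is that a small positive k works better on larger networks.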