Home
Welcome to the rethinking-sparse-learning wiki!
Nov 23rd to Nov 30th:
- Hyperparam tuning
  - [x] Alpha, Delta T
  - [x] Optuna, use 15 trials, 3 jobs in parallel
  - [x] Maximise val_accuracy
  - [x] Use single DB, different study names
  - [x] Plot should be of test
  - [x] Learning Rate
  - [x] Plot for each sparsity across the 4 $(\alpha, \Delta T)$ settings
  - [x] $(\alpha, \Delta T) = (0.3, 100), (0.4, 200), (0.4, 500), (0.5, 750)$
- CIFAR 10 Reporting
  - Script W&B metrics
  - Plot
  - Table
  - Longer 2x runs
  - RigL, RigL ERK
  - SET (Nov 28, 2020)
  - SNFS
  - Static
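Once the W&B metrics are scripted out, collapsing per-seed test accuracies into the reporting table could look like the sketch below. The method names match the runs above, but every number is a placeholder, not a result; in practice the dict would be filled from the exported W&B metrics:

```python
from statistics import mean, stdev

# Placeholder per-seed test accuracies (NOT real results), keyed by method;
# in practice these come from the scripted W&B metrics export.
runs = {
    "RigL":   [93.1, 93.4, 92.9],
    "SET":    [92.5, 92.8, 92.6],
    "SNFS":   [92.0, 92.3, 92.1],
    "Static": [91.0, 91.3, 90.8],
}

# One table row per method: mean +/- sample std over seeds.
for method, accs in runs.items():
    print(f"{method:<8} {mean(accs):.2f} +/- {stdev(accs):.2f}")
```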
- FLOP Counting
  - Adapt https://github.com/google-research/rigl/blob/master/rigl/imagenet_resnet/colabs/Resnet_50_Param_Flops_Counting.ipynb for our code + Wide ResNet
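At the level of a single conv layer, sparse FLOP counting reduces to scaling the dense multiply-add count by the layer's weight density. A minimal sketch (the layer shapes below are illustrative, not the exact Wide ResNet configuration):

```python
def conv_flops(c_in, c_out, k, h_out, w_out, density=1.0):
    """Multiply-adds for one k x k conv; sparse FLOPs scale with weight density."""
    return 2 * c_in * c_out * k * k * h_out * w_out * density

# Illustrative (c_in, c_out, k, H_out, W_out) tuples for a few conv layers.
layers = [(16, 160, 3, 32, 32), (160, 160, 3, 32, 32), (160, 320, 3, 16, 16)]

dense = sum(conv_flops(*l) for l in layers)
sparse = sum(conv_flops(*l, density=0.1) for l in layers)
print(f"dense: {dense:.3e} FLOPs, 10% density: {sparse:.3e} FLOPs")
```

Summing this over every conv and linear layer, with each layer's actual density, reproduces the per-model counts the linked colab computes.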
Dec 1st to 14th:

- CIFAR10
  - Plots
  - Hyperparam plots
- Mini-ImageNet
  - Dataloader
  - Which runs?
  - Dense
  - Do we need linear warmup & fancy tricks?
Extensions:
- Distributions: evaluate ERK vs Uniform in terms of computation (FLOPs)
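For the ERK vs Uniform comparison, per-layer ERK densities can be sketched as below. The conv shapes are hypothetical, and the clip-at-1 step of the full ERK recipe is omitted for brevity:

```python
def erk_densities(shapes, global_density):
    """Per-layer densities proportional to (c_out + c_in + kh + kw) / n_params,
    rescaled so the total kept weights match the global density budget."""
    n_params = [o * i * kh * kw for (o, i, kh, kw) in shapes]
    raw = [(o + i + kh + kw) / n for (o, i, kh, kw), n in zip(shapes, n_params)]
    scale = global_density * sum(n_params) / sum(r * n for r, n in zip(raw, n_params))
    return [scale * r for r in raw]

# Hypothetical conv shapes (c_out, c_in, kh, kw). Uniform gives every layer
# the same density; ERK keeps small layers denser and large layers sparser.
shapes = [(16, 3, 3, 3), (32, 16, 3, 3), (64, 32, 3, 3)]
print(erk_densities(shapes, global_density=0.1))
```

Because ERK keeps the thin early layers denser, it typically costs more FLOPs than Uniform at the same parameter budget — which is the computation gap this item asks to quantify.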
- Dynamic Structured Sparsity
- Effect of gradient accumulation
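One way to probe the gradient-accumulation question: accumulate |grad| over the ΔT steps between mask updates and grow connections by the accumulated score rather than the instantaneous one. A numpy sketch on a toy flat parameter vector, with random "gradients" standing in for real ones:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, delta_t = 20, 2, 50      # params, drop/grow count, steps between mask updates
w = rng.normal(size=n)
mask = np.zeros(n, dtype=bool)
mask[:10] = True               # 50% density
w[~mask] = 0.0                 # inactive weights are zero
acc = np.zeros(n)              # accumulated |grad| since the last mask update

for step in range(100):
    grad = rng.normal(size=n)  # stand-in for a real dense gradient
    acc += np.abs(grad)
    if (step + 1) % delta_t == 0:
        active = np.flatnonzero(mask)
        inactive = np.flatnonzero(~mask)
        drop = active[np.argsort(np.abs(w[active]))[:k]]   # smallest-|w| active
        grow = inactive[np.argsort(acc[inactive])[-k:]]    # largest accumulated |grad|
        mask[drop] = False
        w[drop] = 0.0
        mask[grow] = True      # grown weights start at zero
        acc[:] = 0.0           # reset the accumulator each cycle

print(mask.sum())              # density is preserved across updates
```

Comparing this against growth by the single-step gradient at the update step isolates the effect this item is after.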
- Effect of redistribution
  - Can ERK be a proxy? i.e., avoid redistribution and use ERK instead.
  - Need to show no gains for ERK from redistribution
  - And some gains for random
Experiments:

- RigL Random
- RigL Random with gradient redistribution
- RigL Random with momentum redistribution
- RigL Random with the final static distribution found above
- RigL ERK
- RigL ERK with redistribution
\vscomment{Question: Is the effect of redistribution to find a better power-law distribution? Question: Is the found distribution even power-law?}
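For the momentum-redistribution arms above, the reallocation step can be sketched as below: each layer's share of the active-weight budget is set proportional to the mean |momentum| of its currently active weights (swap in |grad| for the gradient-redistribution arm). Shapes and values are made up for illustration:

```python
import numpy as np

def redistribute(momenta, masks, total_active):
    """New per-layer active-weight counts, proportional to the mean
    |momentum| over each layer's currently active weights."""
    scores = [float(np.abs(m[mask]).mean()) for m, mask in zip(momenta, masks)]
    total = sum(scores)
    return [round(total_active * s / total) for s in scores]

# Two toy layers: layer 0 has larger momentum magnitudes, so it is handed
# a larger share of the 30-weight budget.
momenta = [np.full(100, 2.0), np.full(100, 1.0)]
masks = [np.ones(100, dtype=bool), np.ones(100, dtype=bool)]
print(redistribute(momenta, masks, total_active=30))  # -> [20, 10]
```

Logging these shares over training is one way to answer the power-law question in the comment above: fit the final shares against layer size and see whether they follow a power law.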
- Ablation-CAM: how do sparse nets see?
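Ablation-CAM weights each feature map by the relative drop in the class score when that map is zeroed out, then ReLUs the weighted sum. A toy numpy sketch with a hypothetical linear scoring head standing in for a real (sparse) network:

```python
import numpy as np

rng = np.random.default_rng(0)
fmaps = rng.random((4, 8, 8))            # C=4 feature maps from the last conv
head_w = np.array([0.9, 0.1, 0.5, 0.2])  # hypothetical class weights over pooled maps

def class_score(maps):
    # global-average-pool each map, then a linear class score
    return float(head_w @ maps.mean(axis=(1, 2)))

s_full = class_score(fmaps)
weights = []
for c in range(len(fmaps)):
    ablated = fmaps.copy()
    ablated[c] = 0.0                     # ablate one feature map at a time
    weights.append((s_full - class_score(ablated)) / s_full)

cam = np.maximum(np.tensordot(weights, fmaps, axes=1), 0.0)  # weighted sum + ReLU
print(cam.shape)
```

Running the same procedure on dense vs sparse checkpoints of the repo's models would give the "how do sparse nets see" comparison.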