Enhancing utility-based Perturbed Gradient Descent with Adaptive Noise Injection

Deep Learning Project

This is the repository for our ETH Deep Learning course project, based on the paper "Addressing Loss of Plasticity and Catastrophic Forgetting in Continual Learning". We propose an improvement to the Utility-based Perturbed Gradient Descent (UPGD) method by injecting adaptive noise.

Details of our approach and results can be found in our Report (TODO: link final report)

Here we describe how to reproduce the results.

Installation

1. Create an environment with Python 3.7

git clone --recursive git@github.com:yumikim381/upgd-dl-project.git
python3.7 -m venv .upgd
source .upgd/bin/activate

2. Install Dependencies

python -m pip install --upgrade pip
pip install -r requirements.txt 
pip install HesScale/.
pip install .

Reproduce our results

Run Baseline

We used Algorithm 1 from the original UPGD paper as our baseline. It can be run as follows:

python3 core/run/run_stats.py \
  --task label_permuted_cifar10_stats \
  --learner baseline \
  --seed 19 \
  --lr 0.01 \
  --beta_utility 0.999 \
  --sigma 0.001 \
  --weight_decay 0.0 \
  --network convolutional_network_relu_with_hooks \
  --n_samples 1000000
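
For reference, below is a minimal sketch (not the repository code) of a weight-wise UPGD-style update as described in the original paper, showing where the lr, beta_utility, and sigma flags above enter. The function and variable names are illustrative, and details such as the exact utility scaling may differ from the implementation invoked by core/run/run_stats.py.

import torch

def upgd_step(params, grads, utility_traces, lr=0.01, beta_utility=0.999, sigma=0.001):
    # Hypothetical helper: params, grads, and utility_traces are matching lists of tensors.
    for p, g, u_trace in zip(params, grads, utility_traces):
        # First-order utility: approximate increase in loss if the weight were set to zero.
        utility = -g * p
        # Running average of the utility, controlled by --beta_utility.
        u_trace.mul_(beta_utility).add_((1 - beta_utility) * utility)
        # Squash the trace into (0, 1); the paper uses a global scaling step, simplified here.
        gate = 1.0 - torch.sigmoid(u_trace)
        # Gaussian perturbation with std --sigma; low-utility weights receive larger, noisier
        # updates, while high-utility weights are protected (gate close to 0).
        noise = sigma * torch.randn_like(p)
        p.add_(-lr * (g + noise) * gate)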

Run Visualizations

Use the notebook notebooks/visualize_kernels.ipynb to run through the visualizations. Each visualization has its own cell, and the notebook should run with the same set of requirements as the rest of the code, with the possible exception of ipython and jupyterlab.

Run the best method with Adaptive Noise Injection

python3 core/run/run_stats.py \
  --task label_permuted_cifar10_stats \
  --learner ratio_norm \
  --seed 19 \
  --lr 0.01 \
  --beta_utility 0.999 \
  --sigma 0.001 \
  --weight_decay 0.0 \
  --network convolutional_network_relu_with_hooks \
  --n_samples 1000000

The --learner argument (ratio_norm in the command above) can be replaced with the following options to run the other variations we tried; a sketch of the noise-scaling idea follows the list:

  1. Layer-wise Noise Scaling
    • weight_norm for scaling by the norm of the weights
    • grad_norm for scaling by the norm of the gradients
    • ratio_norm for scaling by the ratio of the gradient norm to the weight norm
  2. Kernel Utility
    • entire_kernel for entire-kernel evaluation
    • column_kernel for column-wise kernel evaluation
    • KernelConvexCombi for a convex combination of neuron-wise and kernel-wise evaluation
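
As a rough illustration of the layer-wise noise-scaling options, here is a minimal sketch (not the repository code) of how the base noise standard deviation sigma could be rescaled per layer before being injected into the UPGD update. The function scaled_sigma and its arguments are hypothetical; the actual learner implementations in the repository may differ.

import torch

def scaled_sigma(weight, grad, sigma=0.001, mode="ratio_norm", eps=1e-8):
    # Hypothetical helper: rescale the base noise std for one layer.
    w_norm = weight.norm()
    g_norm = grad.norm()
    if mode == "weight_norm":        # scale by the norm of the layer's weights
        scale = w_norm
    elif mode == "grad_norm":        # scale by the norm of the layer's gradients
        scale = g_norm
    elif mode == "ratio_norm":       # scale by gradient norm / weight norm
        scale = g_norm / (w_norm + eps)
    else:                            # fall back to the fixed sigma of plain UPGD
        scale = torch.tensor(1.0)
    return sigma * scale

# Example: layer-wise noise that would replace the fixed-sigma noise in the UPGD step.
layer_w = torch.randn(64, 128)
layer_g = torch.randn(64, 128)
noise = scaled_sigma(layer_w, layer_g, mode="ratio_norm") * torch.randn_like(layer_w)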

Get results for all methods we have implemented

Use the notebook notebooks/get_results.ipynb to get the evaluations of the two experiments. The metrics are printed in table format, and graphical visualizations of the performance are provided.
