GitHub - federicoparra/WeightWatcher: The WeightWatcher tool for predicting the accuracy of Deep Neural Networks

Weight Watcher

Current Version / Release: 0.5.1

WeightWatcher (WW) is an open-source diagnostic tool for analyzing Deep Neural Networks (DNNs), without needing access to training or even test data. It can be used to:

analyze pre-trained or trained PyTorch and Keras DNN models (Conv2D and Dense layers)
monitor models, and the model layers, to see if they are over-trained or over-parameterized
predict test accuracies across different models, with or without training data
detect potential problems when compressing or fine-tuning pretrained models
flag layers with warning labels: over-trained, under-trained

as well as several new experimental model transformations, including:

SVDSmoothing: builds a smoothed model whose training accuracy can be used to predict test accuracy, so only the training data is needed.
SVDSharpness: removes Correlation Traps, which arise from sub-optimal regularization of pre-trained models.

Experimental / Most Recent version 0.5.x

You may install the latest (trunk) version from TestPyPI:

python3 -m pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple weightwatcher

The TestPyPI version usually has the most recent updates, including experimental methods and bug fixes.

From Research to Production

WeightWatcher is based on theoretical research (done jointly with UC Berkeley) into Why Deep Learning Works, built on our Theory of Heavy Tailed Self-Regularization (HT-SR). It uses ideas from Random Matrix Theory (RMT), Statistical Mechanics, and Strongly Correlated Systems.

More details and demos can be found on the Calculated Content Blog

Installation

pip install weightwatcher

Usage

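import weightwatcher as ww
import torchvision.models as models

# Load a pretrained model; no training or test data is needed
model = models.vgg19_bn(pretrained=True)
watcher = ww.WeightWatcher(model=model)
details = watcher.analyze()             # per-layer metrics (pandas dataframe)
summary = watcher.get_summary(details)  # averaged metrics (dict)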

It is easy to run, and it generates a pandas dataframe with details (and plots) for each layer

and a summary dict of generalization metrics:

    {'log_norm': 2.11,
     'alpha': 3.06,
     'alpha_weighted': 2.78,
     'log_alpha_norm': 3.21,
     'log_spectral_norm': 0.89,
     'stable_rank': 20.90,
     'mp_softrank': 0.52}

Layer Details:

WW computes several Scale and Shape metrics for each layer weight matrix W, as described in our papers (see below).

These are reported in a details dataframe, including:

Scale Metrics

log Frobenius norm: $\log_{10}\Vert\mathbf{W}\Vert^{2}_{F}$
log Spectral norm: $\log_{10}\lambda_{max}=\log_{10}\Vert\mathbf{W}\Vert^{2}_{2}$
Stable Rank: $R_{stable}=\Vert\mathbf{W}\Vert^{2}_{F}/\Vert\mathbf{W}\Vert^{2}_{2}$
MP Soft Rank: $R_{MP}=\lambda_{MP}/\lambda_{max}$, where $\lambda_{MP}$ is the edge of the fitted MP bulk

Shape Metrics

PL exponent alpha: $\alpha$

Scale-adjusted Shape Metrics

weighted alpha: $\hat{\alpha}=\alpha\log_{10}\lambda_{max}$
log alpha norm (Schatten norm): $\log_{10}\Vert\mathbf{X}\Vert^{\alpha}_{\alpha}$, where $\mathbf{X}=\mathbf{W}^{T}\mathbf{W}$

Misc Details

N, M: Matrix or Tensor Slice Dimensions
D: Quality of the (Truncated) Power law fit (D is the Kolmogorov-Smirnov distance metric)
num_spikes: number of spikes outside the bulk region of the ESD, when fit to an MP distribution
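The following NumPy sketch illustrates the Scale and Scale-adjusted metrics listed above. It computes them directly from one weight matrix W, using the eigenvalues of X = W^T W and a PL exponent alpha that is assumed to be already fitted. The helper name `layer_metrics` is hypothetical and is not part of the WeightWatcher API. WW may normalize W or X differently internally, so these numbers need not match the details dataframe exactly.

```python
import numpy as np


def layer_metrics(W: np.ndarray, alpha: float) -> dict:
    """Hypothetical helper: scale and scale-adjusted metrics for one weight matrix.

    Assumes X = W^T W with no extra normalization, and a PL exponent
    `alpha` already fitted to the tail of the ESD.
    """
    # Eigenvalues of X = W^T W are the squared singular values of W.
    evals = np.linalg.svd(W, compute_uv=False) ** 2
    lambda_max = evals.max()  # squared spectral norm ||W||_2^2
    frob_sq = evals.sum()     # ||W||_F^2 = trace(W^T W) = sum of eigenvalues
    return {
        "log_norm": np.log10(frob_sq),
        "log_spectral_norm": np.log10(lambda_max),
        "stable_rank": frob_sq / lambda_max,
        "alpha_weighted": alpha * np.log10(lambda_max),
        # Schatten alpha-norm of X raised to the alpha: sum_i lambda_i^alpha
        "log_alpha_norm": np.log10(np.sum(evals ** alpha)),
    }


W = np.diag([10.0, 1.0])  # eigenvalues of W^T W are 100 and 1
print(layer_metrics(W, alpha=2.0))
```

Working from the eigenvalues of X alone means every metric above follows from a single SVD per layer.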

Summary Statistics:

The layer metrics are averaged in the summary statistics:

Get the average metrics, as a summary (dict), from the given (or current) details dataframe

details = watcher.analyze(model=model)
summary = watcher.get_summary(details)

or just

summary = watcher.get_summary()

The summary statistics can be used to gauge the test error of a series of pre/trained models, without needing access to training or test data.

average alpha can be used to compare one or more DNN models with different hyperparameter settings θ, but of the same depth.
average log spectral norm is useful to compare models of different depths L.
average weighted alpha and log alpha norm are suitable for DNNs of differing hyperparameters θ and depths L simultaneously.
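Averaging layer metrics into a summary amounts to taking column means of a details dataframe. The sketch below does this with pandas on small hand-made dataframes. The numbers are invented for illustration and are not real WW output. It then ranks two hypothetical models of the same depth by average alpha, following the heuristic above that a smaller average alpha tends to indicate better generalization. The column names mirror the summary keys shown earlier. `summarize`, `model_a`, and `model_b` are hypothetical names.

```python
import pandas as pd

SUMMARY_COLS = ["log_norm", "alpha", "alpha_weighted",
                "log_alpha_norm", "log_spectral_norm", "stable_rank"]


def summarize(details: pd.DataFrame) -> dict:
    """Hypothetical helper: average each layer metric over all layers."""
    return details[SUMMARY_COLS].mean().to_dict()


# Toy per-layer details for two models of the same depth (invented numbers).
details_a = pd.DataFrame({
    "log_norm": [2.0, 2.2], "alpha": [3.0, 3.4], "alpha_weighted": [2.6, 3.0],
    "log_alpha_norm": [3.1, 3.3], "log_spectral_norm": [0.9, 0.9],
    "stable_rank": [20.0, 22.0],
})
details_b = details_a.assign(alpha=[4.0, 4.6])  # same model, larger alphas

summaries = {"model_a": summarize(details_a), "model_b": summarize(details_b)}
# Same depth, different hyperparameters: rank by average alpha (smallest first).
ranking = sorted(summaries, key=lambda name: summaries[name]["alpha"])
print(ranking)  # ['model_a', 'model_b']
```

For models of different depths, the same ranking would instead use the `alpha_weighted` or `log_alpha_norm` columns.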

Advanced Usage

The watcher object has several functions and analysis features, described below:

analyze(model=None, layers=[], min_evals=0, max_evals=None,
        plot=True, randomize=True, mp_fit=True, ww2x=False, savefig=True)
...
describe(model=None, layers=[], min_evals=0, max_evals=None,
         plot=True, randomize=True, mp_fit=True, ww2x=False)
...
get_details()
get_summary(details) or get_summary()
get_ESD()
...
distances(model_1, model_2)

Plotting and Fitting the Empirical Spectral Density (ESD)

WW creates plots for each layer weight matrix, to show how well the power law fits work:

details = watcher.analyze(plot=True)

For each layer, WeightWatcher plots the ESD, a histogram of the eigenvalues of the layer correlation matrix $\mathbf{X}=\mathbf{W}^{T}\mathbf{W}$. It then fits the tail of the ESD to a (Truncated) Power Law, and plots these fits on different axes. The metrics (above) characterize the Shape and Scale of each ESD.
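The minimal NumPy sketch below makes the ESD and tail fit concrete. It computes the eigenvalues of X = W^T W and fits the tail with the standard continuous power-law maximum-likelihood estimator, alpha = 1 + n / Σ log(x/xmin). The lower cutoff xmin is chosen to minimize the Kolmogorov-Smirnov distance D, the approach popularized by Clauset, Shalizi and Newman. This sketch fits a plain power law, not a truncated one. The `min_tail` floor is an assumption of this sketch. WW delegates its fit to the powerlaw package, which may differ in details. `esd` and `fit_power_law` are hypothetical names.

```python
import numpy as np


def esd(W: np.ndarray) -> np.ndarray:
    """Eigenvalues of X = W^T W (squared singular values of W), sorted ascending."""
    return np.sort(np.linalg.svd(W, compute_uv=False) ** 2)


def fit_power_law(evals: np.ndarray, min_tail: int = 50):
    """Fit p(x) ~ x^(-alpha) for x >= xmin; return (alpha, xmin, D).

    For each candidate xmin, alpha comes from the continuous MLE; the xmin
    with the smallest KS distance D between the empirical and fitted tail
    CDFs is kept. Tails shorter than `min_tail` points are not considered.
    """
    x = np.sort(np.asarray(evals, dtype=float))
    x = x[x > 0]
    best = (np.nan, np.nan, np.inf)
    for i in range(len(x) - min_tail + 1):
        xmin, tail = x[i], x[i:]
        n = len(tail)
        log_sum = np.sum(np.log(tail / xmin))
        if log_sum <= 0:
            continue
        alpha = 1.0 + n / log_sum
        model_cdf = 1.0 - (tail / xmin) ** (1.0 - alpha)
        upper = np.arange(1, n + 1) / n  # empirical CDF just after each point
        lower = np.arange(0, n) / n      # empirical CDF just before each point
        D = max(np.max(upper - model_cdf), np.max(model_cdf - lower))
        if D < best[2]:
            best = (alpha, xmin, D)
    return best


rng = np.random.default_rng(0)
W = rng.standard_t(df=3, size=(500, 200))  # toy matrix with heavy-tailed entries
alpha, xmin, D = fit_power_law(esd(W))
print(f"alpha={alpha:.2f}, xmin={xmin:.1f}, D={D:.3f}")
```

Scanning every candidate xmin is quadratic in the number of eigenvalues. That is fine for a sketch, but large layers would call for a coarser candidate grid.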

Detecting OverTraining

WeightWatcher can detect the signatures of overtraining in specific layers of pre/trained Deep Neural Networks.

Early stopping

The WeightWatcher alpha metric can be used to detect when to apply early stopping. When the average alpha (summary statistic) drops below 2.0, this indicates that the model may be overtrained and that early stopping is necessary.

Below is an example of this, showing the training loss and test loss curves for a small Transformer model, trained from scratch, along with the average alpha summary statistic.

We can see that as the training and test losses decrease, so does alpha. But when the test loss saturates and then starts to increase, alpha drops below 2.0.
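The early-stopping rule above can be expressed as a small check on the average alpha recorded after each epoch. In this minimal sketch, `should_stop_early` is a hypothetical helper and the 2.0 threshold comes from the text above. The `patience` argument is an added assumption: it sets how many consecutive epochs must fall below the threshold, so the rule does not react to a single noisy value. The commented training loop is illustrative only, and `train_one_epoch` is hypothetical.

```python
from typing import Sequence


def should_stop_early(alpha_history: Sequence[float],
                      threshold: float = 2.0,
                      patience: int = 1) -> bool:
    """Return True once the last `patience` average-alpha values are all below `threshold`.

    Hypothetical helper: alpha_history holds the average alpha summary
    statistic recorded after each epoch, oldest first.
    """
    if patience < 1:
        raise ValueError("patience must be at least 1")
    if len(alpha_history) < patience:
        return False
    return all(a < threshold for a in alpha_history[-patience:])


# Illustrative loop (train_one_epoch is hypothetical):
# for epoch in range(num_epochs):
#     train_one_epoch(model)
#     watcher = ww.WeightWatcher(model=model)
#     watcher.analyze()
#     history.append(watcher.get_summary()["alpha"])
#     if should_stop_early(history):
#         break

history = [4.1, 3.2, 2.6, 2.1, 1.9]  # alpha falls during training, then dips below 2.0
print(should_stop_early(history))              # True
print(should_stop_early(history, patience=2))  # False
```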

Correlation Traps

The randomize option compares the ESD of the layer weight matrix (W) to the ESD of the randomized W matrix. This is a good way to visualize the correlations in the true ESD.

details = watcher.analyze(randomize=True, plot=True)

Fig (a) is well trained; Fig (b) may be over-trained. That orange spike on the far right is the tell-tale clue; it's called a Correlation Trap.

A Correlation Trap is characterized by Fig (b): here the actual (green) and random (red) ESDs look almost identical, except for a small shelf of correlation (just right of 0). For the random (red) ESD, however, the largest eigenvalue (orange) lies far to the right of, and separated from, the bulk of the ESD.
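The sketch below gives one rough way to reproduce the randomized comparison. First, it shuffles all entries of W, which destroys correlations but keeps the distribution of entries. It then takes the largest eigenvalue of the randomized X = W^T W. Finally, it compares that eigenvalue with the Marchenko-Pastur bulk edge λ+ = σ²(√N + √M)² expected for an N×M matrix of i.i.d. entries with variance σ². A randomized λ_max well beyond λ+ indicates a Correlation Trap. The element-wise shuffle, the use of the sample variance for σ², and the 1.2 tolerance factor are all assumptions of this sketch, not WW's exact procedure. `correlation_trap` and `mp_bulk_edge` are hypothetical names.

```python
import numpy as np


def mp_bulk_edge(N: int, M: int, sigma2: float) -> float:
    """Upper MP edge for eigenvalues of W^T W, W being N x M with i.i.d. entries of variance sigma2."""
    return sigma2 * (np.sqrt(N) + np.sqrt(M)) ** 2


def correlation_trap(W: np.ndarray, tol: float = 1.2, seed: int = 0):
    """Hypothetical check: does the randomized ESD have a spike beyond the MP bulk?

    Returns (is_trap, rand_lambda_max, bulk_edge).
    """
    rng = np.random.default_rng(seed)
    # Shuffling all entries removes correlations but keeps the entry distribution.
    W_rand = rng.permutation(W.ravel()).reshape(W.shape)
    # np.linalg.svd returns singular values in descending order.
    rand_lambda_max = np.linalg.svd(W_rand, compute_uv=False)[0] ** 2
    N, M = W.shape
    edge = mp_bulk_edge(N, M, W.var())
    return rand_lambda_max > tol * edge, rand_lambda_max, edge


rng = np.random.default_rng(1)
W = rng.normal(size=(500, 200))
print(correlation_trap(W)[0])  # False: pure noise stays inside the MP bulk
W[0, 0] = 100.0                # one huge weight survives any shuffle
print(correlation_trap(W)[0])  # True: the randomized ESD now has a spike
```

The single large entry stays large after shuffling, so it keeps producing an eigenvalue far outside the bulk. This matches the separated orange spike described above.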

WeightWatcher will analyze your model, layer by layer, and show you where these kinds of problems may be lurking.

Predicting the Generalization Error

WeightWatcher (WW) can be used to compare the test error for a series of models trained on similar datasets, but with different hyperparameters, or even different but related architectures.

Our Theory of HT-SR predicts that models with smaller PL exponents alpha, on average, correspond to models that generalize better.

The WW summary metric alpha (α) can predict the generalization error Δ when varying the model hyperparameters θ (like batch size, learning rate, momentum, etc.):

PL exponent alpha: $\langle\alpha\rangle\sim\Delta(\theta)$

whereas the summary metric weighted alpha can predict the generalization error Δ when varying both the hyperparameters θ and the depth L:

weighted alpha: $\hat{\alpha}\sim\Delta(\theta,L)$

Here is an example of the Weighted Alpha capacity metric for all the current pretrained VGG models.

This can be reproduced with the Demo Notebook.

Notice: we did not peek at the ImageNet test data to build this plot.

SVDSmoothing and SVDSharpness Transforms

As described in our latest paper:

Smoothed models can be used to predict test accuracies, by evaluating the training accuracy on the smoothed model.

smoothed_model = watcher.SVDSmoothing(model=...)

Sharpened models can be used when fine-tuning pre-trained models that have not been fully optimized yet.

sharpened_model = watcher.SVDSharpness(model=...)

Sample notebooks are provided for each new feature
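As background only, the sketch below shows a generic low-rank SVD truncation applied to a single weight matrix. It keeps the k largest singular components and discards the rest, which is one plausible reading of what "smoothing" a layer means. WW's actual transforms are not described here. It is not shown how SVDSmoothing chooses which components to keep, or how SVDSharpness modifies the spectrum. The rank `k` and the helper name `svd_truncate` are assumptions for illustration.

```python
import numpy as np


def svd_truncate(W: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical helper: best rank-k approximation of W (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    k = max(0, min(k, len(s)))
    # Keep the k largest singular values (np.linalg.svd returns them in descending order).
    return (U[:, :k] * s[:k]) @ Vt[:k, :]


rng = np.random.default_rng(0)
W = rng.normal(size=(64, 32))
W_smooth = svd_truncate(W, k=8)
print(np.linalg.matrix_rank(W_smooth))  # 8
```

In a full model, a transform like this would be applied layer by layer. The smoothed model's training accuracy would then serve as the test-accuracy proxy described above.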

Additional Features

filter by layer types

ww.LAYER_TYPE.CONV1D | ww.LAYER_TYPE.CONV2D | ww.LAYER_TYPE.DENSE

as in

details=watcher.analyze(layers=[ww.LAYER_TYPE.CONV2D])

filter by layer ids or names

details=watcher.analyze(layers=[20])

minimum, maximum number of eigenvalues of the layer weight matrix

Sets the minimum and maximum size of the weight matrices analyzed. Setting max_evals is useful for quick debugging.

details = watcher.analyze(min_evals=50, max_evals=500)

fit ESDs to a Marchenko-Pastur (MP) distribution

The mp_fit option tells WW to fit each layer ESD, treated as a random matrix, to a Marchenko-Pastur (MP) distribution, as described in our papers on HT-SR.

details = watcher.analyze(mp_fit=True, plot=True)

The fit reports the

num_spikes, mp_sigma, and mp_softrank

It also works for the randomized ESD, and reports

rand_num_spikes, rand_mp_sigma, and rand_mp_softrank
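The sketch below builds intuition for num_spikes and mp_softrank. It counts the eigenvalues of X = W^T W that lie above the MP bulk edge λ+ = σ²(√N + √M)², and it reports R_MP = λ+/λ_max. WW fits the MP distribution and reports the fitted mp_sigma. This sketch instead plugs in the sample variance of the entries as σ². That shortcut and the name `mp_spikes` are assumptions. Finite-size fluctuations can push a pure-noise λ_max slightly past λ+, so small spike counts on noise-like layers should be read with care.

```python
import numpy as np


def mp_spikes(W: np.ndarray):
    """Hypothetical helper: (num_spikes, mp_softrank, sigma2) for one weight matrix.

    Uses the MP bulk edge lambda_plus = sigma2 * (sqrt(N) + sqrt(M))**2 for the
    eigenvalues of X = W^T W, with sigma2 estimated as the sample variance of
    W's entries instead of being fitted.
    """
    N, M = W.shape
    sigma2 = W.var()
    lambda_plus = sigma2 * (np.sqrt(N) + np.sqrt(M)) ** 2
    evals = np.linalg.svd(W, compute_uv=False) ** 2
    num_spikes = int(np.sum(evals > lambda_plus))
    mp_softrank = lambda_plus / evals.max()
    return num_spikes, mp_softrank, sigma2


rng = np.random.default_rng(0)
N, M = 400, 100
u = rng.normal(size=N)
u /= np.linalg.norm(u)
v = rng.normal(size=M)
v /= np.linalg.norm(v)
# Random noise plus one strong rank-1 "signal" direction.
W = rng.normal(size=(N, M)) + 100.0 * np.outer(u, v)
num_spikes, softrank, sigma2 = mp_spikes(W)
print(num_spikes)  # 1
```

A single strong rank-1 signal yields exactly one spike. Because λ_max is far above the bulk, the soft rank is small.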

get the ESD for a specific layer, for visualization or further analysis

watcher.analyze()
esd = watcher.get_ESD()

describe a model

Describe a model and report the details dataframe, without analyzing it

details = watcher.describe(model=model)

compare 2 models

The new distances method reports the distances between 2 models, such as the norm of the difference between the initial weight matrices and the final, trained weight matrices.

details = watcher.distances(initial_model, trained_model)
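The sketch below illustrates the kind of quantity described above. For each layer, it computes the Frobenius norm of the difference between the initial and trained weight matrices, given as plain lists of NumPy arrays. The exact distance measures that `distances()` computes, and how it matches layers between the two models, may differ. `layer_distances` is a hypothetical helper, not the WW API.

```python
from typing import List, Sequence

import numpy as np


def layer_distances(initial: Sequence[np.ndarray],
                    trained: Sequence[np.ndarray]) -> List[float]:
    """Hypothetical helper: ||W_trained - W_initial||_F for each pair of matching layers."""
    if len(initial) != len(trained):
        raise ValueError("models must have the same number of layers")
    dists = []
    for W0, W1 in zip(initial, trained):
        if W0.shape != W1.shape:
            raise ValueError(f"shape mismatch: {W0.shape} vs {W1.shape}")
        # np.linalg.norm defaults to the Frobenius norm for 2-D arrays.
        dists.append(float(np.linalg.norm(W1 - W0)))
    return dists


initial = [np.zeros((2, 2)), np.ones((3, 1))]
trained = [np.array([[3.0, 0.0], [0.0, 4.0]]), np.ones((3, 1))]
print(layer_distances(initial, trained))  # [5.0, 0.0]
```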

compatibility with version 0.2x

The new 0.4 version of weightwatcher treats each layer as a single, unified set of eigenvalues. In contrast, the 0.2x versions split the Conv2D layers into n slices, 1 for each receptive field. The ww2x option provides results which are backward-compatible with the 0.2x version of weightwatcher, with details provided for each slice of each layer.

details = watcher.analyze(ww2x=True)

Save figures

Saves the layer ESD plots for each layer

watcher.analyze(savefig=True)

generating 4 files per layer

ww.layer#.esd1.png
ww.layer#.esd2.png
ww.layer#.esd3.png
ww.layer#.esd4.png

Frameworks supported

TensorFlow 2.x / Keras
PyTorch
HuggingFace

Layers supported

Dense / Linear / Fully Connected (and Conv1D)
Conv2D

Known issues

rankloss is currently not working, and may always be set to 0
the embedded powerlaw package may show warning messages; you can ignore these:

   /home/xander/anaconda3/envs/my_model/lib/python3.7/site-packages/powerlaw.py:700: RuntimeWarning: divide by zero encountered in true_divide
     (Theoretical_CDF * (1 - Theoretical_CDF))

Demo Notebooks

Basic Usage

Analyzing the VGG series

Using the ww2x option

How to Release

Publishing to the PyPI repository:

# 1. Check in the latest code with the correct revision number (__version__ in __init__.py)
vi weightwatcher/__init__.py # Increase the release number, and remove -dev from the revision number
git commit
# 2. Check out latest version from the repo in a fresh directory
cd ~/temp/
git clone https://github.com/CalculatedContent/WeightWatcher
cd WeightWatcher/
# 3. Use the latest version of the tools
python -m pip install --upgrade setuptools wheel twine
# 4. Create the package
python setup.py sdist bdist_wheel
# 5. Test the package
twine check dist/*
# 6. Upload the package to PyPI
twine upload dist/*
# 7. Tag/Release in github by creating a new release (https://github.com/CalculatedContent/WeightWatcher/releases/new)

License

Apache License 2.0

Academic Presentations and Media Appearances

This tool is based on state-of-the-art research done in collaboration with UC Berkeley:

Latest papers and talks

and has been presented at Stanford, UC Berkeley, etc:

KDD2019 Workshop

KDD 2019 Workshop: Statistical Mechanics Methods for Discovering Knowledge from Production-Scale Neural Networks

KDD 2019 Workshop: Slides

Popular Podcasts and Blogs

and has been the subject of many popular podcasts

2021 Short Presentations

MLC Research Jam March 2021

PyTorch2021 Poster April 2021

Slack Channel

We have a Slack channel for the tool if you need help. For an invite, please send an email to [email protected]

Contributors

Charles H Martin, PhD Calculation Consulting

Serena Peng

Consulting Practice

Calculation Consulting homepage

Calculated Content Blog
