Effective Interplay between Sparsity and Quantization: From Theory to Practice

This repository is the official implementation for the paper "Effective Interplay between Sparsity and Quantization: From Theory to Practice". It contains the code used for all analysis and experiments in the paper.

The paper mathematically investigates the relationship between quantization and sparsity techniques, and how their errors combine when both techniques are used together. The theoretical analysis is validated by experimental results on a wide range of models.

About Our Work

Various forms of quantization and sparsity have emerged as promising approaches to model compression, especially in the era of LLMs. This paper focuses on applying both techniques together, and is part of broader research efforts to reduce the memory footprint of LLMs and make them more accessible. Our mathematical analysis and extensive empirical study with large language models (OPT, LLaMA) and vision transformers (ViT) demonstrate that quantization and sparsity are not orthogonal and that their combined use can adversely affect model accuracy. Our findings provide insights for optimizing the compression of large models while preserving accuracy.
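To make the non-orthogonality claim concrete, here is a minimal, standard-library-only sketch that applies both operations to a random tensor in either order and reports the L2 error against the original values. It is an illustration under stated assumptions, not the code used in the paper: the function names (`magnitude_prune`, `quantize`, `compare_orders`), the choice of unstructured magnitude pruning, the symmetric max-scaled 4-bit quantizer, and the Gaussian toy tensor are all hypothetical choices made for this example.

```python
import math
import random


def magnitude_prune(values, sparsity):
    """Zero out the `sparsity` fraction of entries with the smallest magnitude."""
    k = int(len(values) * sparsity)
    # Indices sorted by ascending magnitude; the first k are pruned.
    order = sorted(range(len(values)), key=lambda i: abs(values[i]))
    pruned = list(values)
    for i in order[:k]:
        pruned[i] = 0.0
    return pruned


def quantize(values, bits):
    """Symmetric uniform quantization with a single max-based scale (assumed scheme)."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)
    scale = max_abs / qmax
    # Round each value to the nearest representable level.
    return [round(v / scale) * scale for v in values]


def l2_error(reference, approx):
    """Euclidean distance between the original and the compressed tensor."""
    return math.sqrt(sum((r - a) ** 2 for r, a in zip(reference, approx)))


def compare_orders(values, sparsity=0.5, bits=4):
    """Return the L2 error of each operation alone and of both compositions."""
    sparse_only = magnitude_prune(values, sparsity)
    quant_only = quantize(values, bits)
    return {
        "sparsity_only": l2_error(values, sparse_only),
        "quantization_only": l2_error(values, quant_only),
        "sparsity_then_quantization": l2_error(values, quantize(sparse_only, bits)),
        "quantization_then_sparsity": l2_error(values, magnitude_prune(quant_only, sparsity)),
    }


# Toy data: a fixed seed keeps the comparison reproducible.
rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in range(1024)]
for name, err in compare_orders(weights).items():
    print(f"{name}: {err:.4f}")
```

Running the script prints the four errors side by side, so the two compositions can be compared with each other and with the individual errors. The numbers depend on the assumed quantizer, sparsity level, and data distribution, so they should be read as a toy illustration rather than as a reproduction of the paper's results.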

To set up the environment, run:

pip install -r requirements_pip.txt

Scripts to run LLaMA, OPT and ViT experiments are provided.

Access scripts for LLaMA and OPT in the following directory:

cd ./examples/pytorch/language-modeling/nips_configs/

Access scripts for ViT in the following directory:

cd ./examples/pytorch/image-classification/

Citation

If you find the analysis and experimental results useful for your own research, please cite our paper:

@article{quant-sparse-interplay:2024,
    title        = {{Effective Interplay between Sparsity and Quantization: From Theory to Practice}},
    author       = {Harma, Simla Burcu and Chakraborty, Ayan and Kostenok, Elizaveta and Mishin, Danila and Ha, Dongho and Falsafi, Babak and Jaggi, Martin and Liu, Ming and Oh, Yunho and Subramanian, Suvinay and Yazdanbakhsh, Amir},
    year         = 2024,
    journal      = {arXiv preprint}
}


parsa-epfl/quantization-sparsity-interplay

Folders and files

Latest commit

History

Repository files navigation

Effective Interplay between Sparsity and Quantization: From Theory to Practice

About Our Work

Citation

About

Resources

License

Code of conduct

Citation

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages