Pretrained Model Reload + SparseGPT Support #31

Satrat · 2024-04-19T20:03:37Z

Adding in helper functions to support reloading a quantized model from config with SparseAutoModel. This has a few steps:

1. Load the model, apply any sparsity decompression, and modify save_pretrained (currently done in SparseML, but this will move here soon!)
2. Apply the quantization config to the model if one exists in config.json. This does NOT initialize any of the scales or zero points; they are empty Parameters after apply_quantization_config() is called.
3. Fill in the scales and zero points from the safetensors file with a new load_pretrained_quantization() function. This loops through the leaf modules and grabs the scale/zp from the safetensors file(s) at model_path.
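Steps 2 and 3 can be illustrated with a toy, dependency-free sketch. It is not the compressed-tensors code: `ToyLeafModule`, `toy_apply_quantization_config` and `toy_load_pretrained_quantization` are hypothetical stand-ins, and the parameter names `weight_scale` / `weight_zero_point` are an assumed naming scheme. It only shows the ordering: step 2 creates empty slots, and step 3 fills them by matching `<module_name>.<param_name>` keys in a flat checkpoint mapping.

```python
from typing import Dict, Optional


class ToyLeafModule:
    """Hypothetical stand-in for a leaf module such as a Linear layer."""

    def __init__(self) -> None:
        self.quant_params: Dict[str, Optional[float]] = {}


def toy_apply_quantization_config(modules: Dict[str, ToyLeafModule]) -> None:
    # Step 2: register scale / zero-point slots on every leaf module,
    # but leave them empty (nothing is initialized yet).
    for module in modules.values():
        module.quant_params["weight_scale"] = None
        module.quant_params["weight_zero_point"] = None


def toy_load_pretrained_quantization(
    modules: Dict[str, ToyLeafModule], checkpoint: Dict[str, float]
) -> None:
    # Step 3: loop over the leaf modules and fill each empty slot from the
    # flat {tensor_name: value} mapping read out of the checkpoint.
    for module_name, module in modules.items():
        for param_name in module.quant_params:
            key = f"{module_name}.{param_name}"
            if key in checkpoint:
                module.quant_params[param_name] = checkpoint[key]


modules = {
    "model.layers.0.mlp.down_proj": ToyLeafModule(),
    "model.layers.1.mlp.down_proj": ToyLeafModule(),
}
toy_apply_quantization_config(modules)
checkpoint = {
    "model.layers.0.mlp.down_proj.weight": 0.5,  # not a quantization param, ignored
    "model.layers.0.mlp.down_proj.weight_scale": 0.01,
    "model.layers.0.mlp.down_proj.weight_zero_point": 0,
}
toy_load_pretrained_quantization(modules, checkpoint)
```

Slots with no matching key in the checkpoint stay `None`, which mirrors the point that apply_quantization_config() alone leaves the Parameters uninitialized.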

Example Usage (would be in SparseAutoModel.from_pretrained)

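```python
# Inside SparseAutoModel.from_pretrained (a classmethod, so `cls` is in scope)
quantization_config = QuantizationConfig.from_model_config(pretrained_model_name_or_path)

# Load the model through the transformers auto-model loader
model = super(AutoModelForCausalLM, cls).from_pretrained(pretrained_model_name_or_path)

# deal with sparsity compression, model modification here...

# Attach quantization params (scales/zero points are created empty)...
apply_quantization_config(model, quantization_config)
# ...then fill them in from the safetensors file(s) at the model path
load_pretrained_quantization(model, pretrained_model_name_or_path)
```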
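For the last call, the scale/zp values have to be located in one or more safetensors files. Below is a minimal stdlib-only sketch of that lookup, assuming a local directory with the standard Hugging Face layout (either a single `model.safetensors`, or shards listed in `model.safetensors.index.json` under `"weight_map"`) and assuming quantization tensors are named with `_scale` / `_zero_point` suffixes. The helper names are hypothetical; this is not how load_pretrained_quantization() is implemented, and it reads only the safetensors JSON header, not tensor data.

```python
import json
import os
import struct
from typing import Dict, List

# Standard Hugging Face checkpoint file names.
SINGLE_FILE = "model.safetensors"
INDEX_FILE = "model.safetensors.index.json"


def safetensors_files(model_path: str) -> List[str]:
    """List the safetensors file(s) in a local checkpoint directory."""
    index_path = os.path.join(model_path, INDEX_FILE)
    if os.path.isfile(index_path):
        # Sharded checkpoint: "weight_map" maps tensor name -> shard file name.
        with open(index_path) as f:
            shards = set(json.load(f)["weight_map"].values())
        return [os.path.join(model_path, shard) for shard in sorted(shards)]
    return [os.path.join(model_path, SINGLE_FILE)]


def tensor_names(path: str) -> List[str]:
    """Read tensor names from a safetensors header without loading any data."""
    with open(path, "rb") as f:
        # Format: 8-byte little-endian header length, then a JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    return [name for name in header if name != "__metadata__"]


def find_quantization_tensors(
    model_path: str, suffixes=("_scale", "_zero_point")
) -> Dict[str, str]:
    """Map each (assumed) scale / zero-point tensor name to the file storing it."""
    found: Dict[str, str] = {}
    for path in safetensors_files(model_path):
        for name in tensor_names(path):
            if name.endswith(suffixes):
                found[name] = path
    return found
```

This only handles a local directory; resolving a Hub model ID to a local path first is out of scope for the sketch.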

@dbogunowicz I know a lot of the UX is going to change with your refactor, but I needed to get something up and running for testing. This just adds the helper functions that your UX will eventually call.

Associated SparseML branch: neuralmagic/sparseml#2246

Quick Note on SparseGPT/OBCQ

In the new fake_quantize implementation, we overwrite the weight parameter in the forward call (forward.py):

```python
self.weight.data = _maybe_calibrate_or_quantize(module, self.weight, "weight", scheme.weights)
```

This didn't happen in the old implementation: we never overwrote the actual parameter, so the original unquantized weight was preserved. The new behavior breaks OBCQ, because OBCQ relies on the error between the unquantized and the quantized weight. As a workaround for now, I'm cloning the original weight and restoring it after the forward pass.
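A toy sketch of the clone-and-restore workaround, in plain Python rather than torch. `ToyFakeQuantLinear`, `forward_keep_original` and `quantization_error` are hypothetical names; the in-place slice assignment stands in for `self.weight.data = ...`, and the unweighted squared error is only a stand-in for the error OBCQ actually uses.

```python
from typing import List, Tuple


class ToyFakeQuantLinear:
    """Hypothetical layer whose forward fake-quantizes its own weight."""

    def __init__(self, weight: List[float], scale: float):
        self.weight = weight
        self.scale = scale

    def forward(self, x: List[float]) -> float:
        # Mimics the new behavior: the weight is overwritten IN PLACE
        # with its fake-quantized (rounded-to-grid) version.
        self.weight[:] = [round(w / self.scale) * self.scale for w in self.weight]
        return sum(w * xi for w, xi in zip(self.weight, x))


def forward_keep_original(
    module: ToyFakeQuantLinear, x: List[float]
) -> Tuple[float, List[float], List[float]]:
    """Run forward, then restore the unquantized weight (clone-and-restore)."""
    original = list(module.weight)  # a copy; a plain reference would be mutated too
    try:
        out = module.forward(x)
        quantized = list(module.weight)
    finally:
        module.weight[:] = original  # put the unquantized values back
    return out, original, quantized


def quantization_error(original: List[float], quantized: List[float]) -> float:
    # Unweighted squared error between unquantized and quantized weights.
    return sum((w - q) ** 2 for w, q in zip(original, quantized))


layer = ToyFakeQuantLinear([0.12, -0.27, 0.5], scale=0.1)
out, original, quantized = forward_keep_original(layer, [1.0, 1.0, 1.0])
err = quantization_error(original, quantized)
```

Copying the values, rather than keeping a reference, is what makes this work: because the forward pass mutates the weight in place, a saved reference would point at the quantized values as well.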

dbogunowicz

This looks good to me. Let's chat today about the design of the new SparseAutoModelForCausalLM, I think it is high time we pieced all elements together.

Sara Adkins added 3 commits April 19, 2024 15:49

model reload working

2ab79f6

fix for sparseGPT

f7e9c01

docstrings

c011a77

Satrat mentioned this pull request Apr 19, 2024

Refactor Quantization Modifer and Reloading neuralmagic/sparseml#2246

Merged

Satrat requested review from dbogunowicz, bfineran, horheynm, dsikka and rahul-tuli and removed request for dbogunowicz and bfineran April 19, 2024 20:10

bfineran approved these changes Apr 22, 2024

dbogunowicz approved these changes Apr 23, 2024

Satrat merged commit 67005d7 into main Apr 23, 2024
2 checks passed

Satrat deleted the sa/model_reload branch April 23, 2024 13:46

Satrat commented Apr 19, 2024 • edited by mgoin
