Support for targets and ignore in Sparsity Compressors #182

Open
rahul-tuli wants to merge 6 commits into main from add-targets-and-ignore-support

Conversation

@rahul-tuli (Member) commented Oct 6, 2024

This PR introduces support for using targets and ignore in sparsity compressors. It has been tested against the llm-compressor repository at commit a47137d8 (on main).

Changes Made

  • Cleaned up several utilities and added corresponding tests.
  • Updated the BaseSparsity.compress(...) methods to accept a new compression_targets argument.
  • Enhanced the ModelCompressor to directly populate the compression_targets argument (a rough sketch of the idea follows this list).
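
For illustration only, a minimal sketch of how a compression_targets argument might be threaded through a compressor's compress(...) call. The class and helper names below (SketchSparseCompressor, _compress_weight) are invented for this example and are not the PR's actual code.

from typing import Dict, Optional, Set

import torch


class SketchSparseCompressor:
    """Minimal stand-in for a sparsity compressor that honors targets."""

    def compress(
        self,
        model_state: Dict[str, torch.Tensor],
        compression_targets: Optional[Set[str]] = None,
    ) -> Dict[str, torch.Tensor]:
        compressed_state = {}
        for name, value in model_state.items():
            # Only parameters resolved from `targets` (minus `ignore`) are
            # compressed; everything else is carried through untouched.
            if compression_targets is not None and name not in compression_targets:
                compressed_state[name] = value
                continue
            compressed_state.update(self._compress_weight(name, value))
        return compressed_state

    def _compress_weight(self, name: str, value: torch.Tensor) -> Dict[str, torch.Tensor]:
        # Placeholder for the real packing logic (bitmask / 2:4 layout).
        return {f"{name}.compressed": value.to_sparse()}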

Verification

The functionality was verified using the following script:

Verification Script
"""
Usage: python verification.py
Tested against llm-compressor commit a47137d8
"""


from transformers import AutoTokenizer, AutoModelForCausalLM
from llmcompressor.transformers.compression.sparsity_config import SparsityConfigMetadata
from llmcompressor.transformers import oneshot
from safetensors import safe_open

MODEL_ID = "nm-testing/llama2.c-stories42M-pruned2.4"

def check_first_layer(save_dir, check_compressed=True):
    with safe_open(f"{save_dir}/model.safetensors", framework="pt", device=0) as f:
        layer_0_keys = [key for key in f.keys() if "model.layers.0" in key]
        if check_compressed:
            assert any("compressed" in key for key in layer_0_keys), "First layer is not compressed as expected."
        else:
            assert not any("compressed" in key for key in layer_0_keys), "First layer is compressed unexpectedly."

def main():
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto", torch_dtype="auto")
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

    # Apply oneshot to wrap save_pretrained 
    oneshot(model=model)

    # Compress and save the model
    sparsity_config = SparsityConfigMetadata.from_pretrained(model, compress=True)
    save_dir_compressed = f"{MODEL_ID.split('/')[1]}-2of4-compressed"
    model.save_pretrained(save_dir_compressed, sparsity_config=sparsity_config)
    tokenizer.save_pretrained(save_dir_compressed)

    # Verify the first layer is compressed
    check_first_layer(save_dir_compressed, check_compressed=True)

    # Ignore the first layer and save the model again
    sparsity_config.ignore.append("re:model.layers.0.*")
    save_dir_ignored = f"{MODEL_ID.split('/')[1]}-2of4-ignored-first-layer"
    model.save_pretrained(save_dir_ignored, sparsity_config=sparsity_config)
    tokenizer.save_pretrained(save_dir_ignored)

    # Verify the first layer is not compressed
    check_first_layer(save_dir_ignored, check_compressed=False)

if __name__ == "__main__":
    main()

The script runs to completion without any assertion errors.
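
One detail worth calling out from the script: ignore entries prefixed with "re:" (such as "re:model.layers.0.*") are treated as regular expressions when matching module names, which is how the whole first decoder layer is excluded. A tiny illustration follows; the matching helper below is a stand-in for the example, not the library's actual resolver.

import re

ignore_entry = "re:model.layers.0.*"


def matches_ignore(entry: str, module_name: str) -> bool:
    # A "re:" prefix marks the rest of the entry as a regex pattern;
    # otherwise the entry is compared as an exact module name.
    if entry.startswith("re:"):
        return re.match(entry[3:], module_name) is not None
    return entry == module_name


print(matches_ignore(ignore_entry, "model.layers.0.mlp.down_proj"))  # True
print(matches_ignore(ignore_entry, "model.layers.1.mlp.down_proj"))  # False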

Script Output
2024-11-27T10:18:45.295223+0000 | one_shot | INFO - *** One Shot ***
2024-11-27T10:18:45.295382+0000 | initialize | INFO - Compression lifecycle initialized for 0 modifiers
2024-11-27T10:18:45.295428+0000 | finalize | INFO - Compression lifecycle finalized for 0 modifiers
Calculating model sparsity: 100%|███████████████████████████████████████████████████████████████████████████████████████████| 75/75 [00:00<00:00, 1068.21it/s]
Checking whether model follows 2:4 sparsity structure: 100%|████████████████████████████████████████████████████████████████| 57/57 [00:00<00:00, 1572.75it/s]
Calculating model sparsity: 100%|███████████████████████████████████████████████████████████████████████████████████████████| 75/75 [00:00<00:00, 1562.54it/s]
Compressing model: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 75/75 [00:00<00:00, 1205.22it/s]
2024-11-27T10:18:46.694477+0000 | get_serialized_recipe | WARNING - Recipe not found in session - it may have been reset
Calculating model sparsity: 100%|███████████████████████████████████████████████████████████████████████████████████████████| 75/75 [00:00<00:00, 1892.06it/s]
Compressing model: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 75/75 [00:00<00:00, 1701.64it/s]
2024-11-27T10:18:47.582677+0000 | get_serialized_recipe | WARNING - Recipe not found in session - it may have been reset

@rahul-tuli marked this pull request as ready for review on October 7, 2024 at 13:59
@kylesayrs (Contributor) left a comment

I'm not sure how/if this is related to #822 (it's listed as a dependency)

  1. Doesn't this list of targets need to be accounted for during decompression?
  2. Don't these changes throw away any weights which are not targeted for sparse compression?

@markurtz self-requested a review on October 14, 2024 at 13:35
@rahul-tuli force-pushed the add-targets-and-ignore-support branch from 400c6c3 to e5bfd8a on October 23, 2024 at 14:50
@rahul-tuli (Member, Author) commented Oct 23, 2024

> I'm not sure how/if this is related to #822 (it's listed as a dependency)
>
>   1. Doesn't this list of targets need to be accounted for during decompression?
>   2. Don't these changes throw away any weights which are not targeted for sparse compression?

Point 1: Decompression handles this via COMPRESSION_PARAM_NAMES, so the targets list is not needed at decompression time (see the sketch below).
Point 2: Fixed.

This PR is listed as a dependency of #822 because without it we cannot enable sparse compression together with quantization compression; these changes are needed for #822 to work correctly.
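
As a hedged illustration of point 1: compressed parameters can be recognized purely from how they are named on disk. The constant name COMPRESSION_PARAM_NAMES comes from the thread above, but the example values and helper below are assumptions for this sketch, not the library's exact code.

# Example values only -- the real list lives on the compressor class.
COMPRESSION_PARAM_NAMES = ["compressed", "bitmask", "shape", "row_offsets"]


def is_compression_param(param_name: str) -> bool:
    """True if the stored parameter belongs to a compressed representation,
    e.g. 'model.layers.1.mlp.down_proj.weight.compressed'."""
    return any(param_name.endswith(f".{suffix}") for suffix in COMPRESSION_PARAM_NAMES)


print(is_compression_param("model.layers.1.mlp.down_proj.weight.compressed"))  # True
print(is_compression_param("model.layers.1.mlp.down_proj.weight"))             # False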

This was referenced Nov 27, 2024

kylesayrs previously approved these changes Nov 27, 2024

Outdated review threads on src/compressed_tensors/utils/safetensors_load.py and tests/test_quantization/lifecycle/test_apply.py were marked resolved.
@kylesayrs (Contributor) left a comment

LGTM!

@@ -97,8 +107,10 @@ def decompress(
         :param device: device to load decompressed weights onto
         :return: iterator for generating decompressed weights
         """
-        weight_mappings = get_nested_weight_mappings(
-            path_to_model_or_tensors, self.COMPRESSION_PARAM_NAMES
+        weight_mappings, other_params = get_nested_weight_mappings(
@dsikka (Contributor) commented Nov 28, 2024

nit: use a more descriptive variable name for other_params

@rahul-tuli (Member, Author) replied:

Done!

src/compressed_tensors/utils/safetensors_load.py (outdated review thread, resolved)
-    model_path: str, params_to_nest: List[str]
-) -> Dict[str, Dict[str, str]]:
+    model_path: str, params_to_nest: List[str], return_other_params: bool = False
+) -> Union[NestedWeightMappingType, Tuple[NestedWeightMappingType, WeightMappingType]]:
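
For readers skimming the thread, a hedged sketch of the two return shapes implied by the signature above. The mapping contents here are invented example data; the real helper derives them from the safetensors index files.

from typing import Dict, Tuple

NestedWeightMappingType = Dict[str, Dict[str, str]]
WeightMappingType = Dict[str, str]

# return_other_params=False (default): dense param name -> {nested param name -> file}
nested_only: NestedWeightMappingType = {
    "model.layers.1.mlp.down_proj.weight": {
        "compressed": "model.safetensors",
        "bitmask": "model.safetensors",
    },
}

# return_other_params=True: additionally returns every parameter that matched
# none of params_to_nest (for example, quantization scales), mapped to its file.
with_other_params: Tuple[NestedWeightMappingType, WeightMappingType] = (
    nested_only,
    {"model.layers.1.mlp.down_proj.weight_scale": "model.safetensors"},
)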
A reviewer (Contributor) commented:

Can you also update this docstring to indicate what the output is expected to look like when return_other_params is True?

@rahul-tuli (Member, Author) replied:

Updated!

-            nested_weight_mappings[dense_param][param_name] = weight_mappings[key]
+            nested_weight_mappings[dense_param][param_name] = file_location
+            matched = True
+        if not matched:
@dsikka (Contributor) commented Nov 28, 2024

In what cases would we not have a match? Why do we need the other_params dictionary if the output from decompress is used to replace weights?

@rahul-tuli (Member, Author) replied:

decompress has to return the uncompressed parameters from the safetensors file as well, because those must also be populated in the original (potentially empty) model. Think of cases where quantization is also in the mix: parameters like weight_scale must be populated too.

We do not have a match for the uncompressed parameters, again things like scales.
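
To illustrate the idea, a rough sketch under assumptions: decompress_weight and load_tensor below are placeholder helpers invented for the example, not the library's real API.

from typing import Dict, Iterator, Tuple

import torch


def decompress_weight(nested: Dict[str, str]) -> torch.Tensor:
    # Placeholder: the real code loads the compressed tensors referenced in
    # `nested` and reconstructs the dense weight from them.
    return torch.empty(0)


def load_tensor(file_location: str, param_name: str) -> torch.Tensor:
    # Placeholder: the real code reads the tensor from the safetensors file.
    return torch.empty(0)


def decompress_all(
    weight_mappings: Dict[str, Dict[str, str]],
    other_params: Dict[str, str],
) -> Iterator[Tuple[str, torch.Tensor]]:
    # 1) Rebuild dense weights from their compressed representations.
    for dense_name, nested in weight_mappings.items():
        yield dense_name, decompress_weight(nested)
    # 2) Pass through everything that was never compressed (weight_scale,
    #    zero points, ...), so an initially empty model can be fully populated.
    for param_name, file_location in other_params.items():
        yield param_name, load_tensor(file_location, param_name)


for name, tensor in decompress_all(
    {"w": {"compressed": "model.safetensors"}}, {"w_scale": "model.safetensors"}
):
    print(name, tuple(tensor.shape))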

src/compressed_tensors/quantization/lifecycle/apply.py (outdated review thread, resolved)
return name.endswith(".weight")

return (
name.endswith(".weight") and name[: -(len(".weight"))] in expanded_targets
A reviewer (Member) commented:

len(".weight") can be hardcoded to its actual length (7). A comment should note that the value comes from len(".weight").
