[Usage] How to manually set calibration_function? #886
The mappings error is an old bug that was fixed in #37, but strangely the fix has not yet been merged into mainline.

Thanks for the report. We need to do a better job of updating the documentation and examples for SQ mappings.

Note that this PR fixes the YAML parsing issue.

@donpromax What is your use case for overloading the calibration function?
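For reference, the SQ mapping format under discussion pairs a list of target-layer regex patterns with the layer whose output activations are smoothed. A minimal sketch of the structure, using the layer names from the GLM example in this thread (passing plain Python lists rather than YAML, which is what the pydantic fix enables):

```python
# SmoothQuant mappings as plain Python lists: each entry is
# [[target_regexes...], smooth_layer_regex]. The "re:" prefix marks a
# regex pattern rather than an exact module name.
mappings = [
    [["re:.*query_key_value"], "re:.*input_layernorm"],
    [["re:.*dense_h_to_4h"], "re:.*post_attention_layernorm"],
]

# Each mapping has exactly two parts: targets and the smoothing layer.
for targets, smooth_layer in mappings:
    assert isinstance(targets, list)
    assert smooth_layer.startswith("re:")
```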
## Purpose ##
* Fix regex targets not being ignored
* Fix pydantic type checking to allow lists to be used instead of tuples
* Add ChatGLM mappings (which are the same as the bloom mappings)

## Issues ##
* Fixes #105
* Fixes (partially) #886
* Related to #1003

## Testing ##
<details><summary>glm.py</summary>

```python3
import requests
from PIL import Image
from io import BytesIO
from transformers import AutoProcessor
from llmcompressor.transformers import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier
from llmcompressor.modifiers.smoothquant.utils import BLOOM_SMOOTHQUANT_MAPPINGS
from datasets import load_dataset
from llmcompressor.transformers.tracing import ChatGLMForConditionalGeneration
from llmcompressor.transformers.utils.data_collator import glm_data_collator

MODEL_ID = "THUDM/glm-4v-9b"
model = ChatGLMForConditionalGeneration.from_pretrained(
    MODEL_ID, device_map="auto", torch_dtype="auto", trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)

NUM_CALIBRATION_SAMPLES = 1  # 512
MAX_SEQUENCE_LENGTH = 2048

ds = load_dataset(
    "Lin-Chen/ShareGPT4V", "ShareGPT4V", split=f"train[:{NUM_CALIBRATION_SAMPLES}]"
)
ds = ds.shuffle(seed=42)

def preprocess(example):
    url_part = "/".join(example["image"].split("/")[1:])
    url = f"http://images.cocodataset.org/{url_part}"
    response = requests.get(url)
    response.raise_for_status()
    image = Image.open(BytesIO(response.content)).convert("RGB")

    return processor.apply_chat_template(
        [
            {
                "role": "user",
                "image": image,
                "content": example["conversations"][0]["value"],
            }
        ],
        add_generation_prompt=True,
        tokenize=True,
        return_tensors="pt",
        return_dict=True,
    )

ds = ds.map(preprocess, remove_columns=ds.column_names)

# Configure the quantization algorithms
recipe = [
    SmoothQuantModifier(
        smoothing_strength=0.8,
        mappings=[
            [["re:.*query_key_value"], "re:.*input_layernorm"],
            [["re:.*dense_h_to_4h"], "re:.*post_attention_layernorm"],
        ],
        ignore=["transformer.output_layer", "re:transformer.vision.*"],
    ),
    # GPTQModifier(
    #     targets="Linear",
    #     scheme="W8A8",
    #     sequential_targets=["GLMBlock"],
    #     ignore=["transformer.output_layer", "re:transformer.vision.*"],
    # ),
]

# Apply quantization
oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    max_seq_length=MAX_SEQUENCE_LENGTH,
    num_calibration_samples=NUM_CALIBRATION_SAMPLES,
    trust_remote_code_model=True,
    data_collator=glm_data_collator,
)
```
</details>

Signed-off-by: Kyle Sayers <[email protected]>
Closing this for now, feel free to reopen if there's a use case that LLM Compressor should support!
I noticed that modifiers like `SmoothQuantModifier` have a parameter `calibration_function` for the forward pass during calibration. However, it's not clear how to set this `calibration_function`. For example:

Additionally, I found that initializing `SmoothQuantModifier` without using YAML for the `mappings` parameter results in an error (#105). This results in the following error:
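The `re:`-prefixed strings used in the mappings above are regex patterns matched against module names. A stdlib-only sketch of how such a pattern could resolve to concrete layers (illustrative only: the actual resolution logic inside llmcompressor may differ, and `resolve_targets` is a hypothetical helper, not a library function):

```python
import re

def resolve_targets(pattern: str, module_names):
    """Resolve a mapping target against a list of module names.

    Patterns prefixed with "re:" are treated as regexes matched from the
    start of the name; anything else must match exactly.
    """
    if pattern.startswith("re:"):
        regex = re.compile(pattern[len("re:"):])
        return [name for name in module_names if regex.match(name)]
    return [name for name in module_names if name == pattern]

names = [
    "transformer.encoder.layers.0.input_layernorm",
    "transformer.encoder.layers.0.self_attention.query_key_value",
    "transformer.output_layer",
]
print(resolve_targets("re:.*query_key_value", names))
# → ['transformer.encoder.layers.0.self_attention.query_key_value']
```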