[Usage] How to manually set calibration_function? #886
The mappings error is an old bug that was fixed in #37, but strangely the fix has not yet been merged into mainline.

Thanks for the report. We need to do a better job of updating the documentation and examples for SQ mappings.

Note that this PR fixes the YAML parsing issue.

@donpromax What is your use case for overloading the calibration function?
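For reference, the SQ mapping format under discussion pairs a list of target-layer regex patterns with the layer whose output activations are smoothed. A minimal sketch of the structure, using the layer names from the GLM example in this thread (passing plain Python lists rather than YAML, which is what the pydantic fix enables):

```python
# SmoothQuant mappings as plain Python lists: each entry is
# [[target_regexes...], smooth_layer_regex]. The "re:" prefix marks a
# regex pattern rather than an exact module name.
mappings = [
    [["re:.*query_key_value"], "re:.*input_layernorm"],
    [["re:.*dense_h_to_4h"], "re:.*post_attention_layernorm"],
]

# Each mapping has exactly two parts: targets and the smoothing layer.
for targets, smooth_layer in mappings:
    assert isinstance(targets, list)
    assert smooth_layer.startswith("re:")
```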
## Purpose ##
* Fix regex targets not being ignored
* Fix pydantic type checking to allow lists to be used instead of tuples
* Add ChatGLM mappings (which are the same as the bloom mappings)

## Issues ##
* Fixes #105
* Fixes (partially) #886
* Related to #1003

## Testing ##
<details><summary>glm.py</summary>

```python3
import requests
from PIL import Image
from io import BytesIO
from transformers import AutoProcessor
from llmcompressor.transformers import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier
from llmcompressor.modifiers.smoothquant.utils import BLOOM_SMOOTHQUANT_MAPPINGS
from datasets import load_dataset
from llmcompressor.transformers.tracing import ChatGLMForConditionalGeneration
from llmcompressor.transformers.utils.data_collator import glm_data_collator

MODEL_ID = "THUDM/glm-4v-9b"
model = ChatGLMForConditionalGeneration.from_pretrained(
    MODEL_ID, device_map="auto", torch_dtype="auto", trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)

NUM_CALIBRATION_SAMPLES = 1  # 512
MAX_SEQUENCE_LENGTH = 2048

ds = load_dataset(
    "Lin-Chen/ShareGPT4V", "ShareGPT4V", split=f"train[:{NUM_CALIBRATION_SAMPLES}]"
)
ds = ds.shuffle(seed=42)

def preprocess(example):
    url_part = "/".join(example["image"].split("/")[1:])
    url = f"http://images.cocodataset.org/{url_part}"
    response = requests.get(url)
    response.raise_for_status()
    image = Image.open(BytesIO(response.content)).convert("RGB")

    return processor.apply_chat_template(
        [
            {
                "role": "user",
                "image": image,
                "content": example["conversations"][0]["value"],
            }
        ],
        add_generation_prompt=True,
        tokenize=True,
        return_tensors="pt",
        return_dict=True,
    )

ds = ds.map(preprocess, remove_columns=ds.column_names)

# Configure the quantization algorithms
recipe = [
    SmoothQuantModifier(
        smoothing_strength=0.8,
        mappings=[
            [["re:.*query_key_value"], "re:.*input_layernorm"],
            [["re:.*dense_h_to_4h"], "re:.*post_attention_layernorm"],
        ],
        ignore=["transformer.output_layer", "re:transformer.vision.*"],
    ),
    # GPTQModifier(
    #     targets="Linear",
    #     scheme="W8A8",
    #     sequential_targets=["GLMBlock"],
    #     ignore=["transformer.output_layer", "re:transformer.vision.*"],
    # ),
]

# Apply quantization
oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    max_seq_length=MAX_SEQUENCE_LENGTH,
    num_calibration_samples=NUM_CALIBRATION_SAMPLES,
    trust_remote_code_model=True,
    data_collator=glm_data_collator,
)
```
</details>

Signed-off-by: Kyle Sayers <[email protected]>
Closing this for now, feel free to reopen if there's a use case that LLM Compressor should support!
I noticed that modifiers like `SmoothQuantModifier` have a parameter `calibration_function` for the forward pass during calibration. However, it's not clear how to set this `calibration_function`. For example:

Additionally, I found that initializing `SmoothQuantModifier` without using YAML for the `mappings` parameter results in an error (#105). This results in the following error:
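The `re:`-prefixed strings used in the mappings above are regex patterns matched against module names. A stdlib-only sketch of how such a pattern could resolve to concrete layers (illustrative only: the actual resolution logic inside llmcompressor may differ, and `resolve_targets` is a hypothetical helper, not a library function):

```python
import re

def resolve_targets(pattern: str, module_names):
    """Resolve a mapping target against a list of module names.

    Patterns prefixed with "re:" are treated as regexes matched from the
    start of the name; anything else must match exactly.
    """
    if pattern.startswith("re:"):
        regex = re.compile(pattern[len("re:"):])
        return [name for name in module_names if regex.match(name)]
    return [name for name in module_names if name == pattern]

names = [
    "transformer.encoder.layers.0.input_layernorm",
    "transformer.encoder.layers.0.self_attention.query_key_value",
    "transformer.output_layer",
]
print(resolve_targets("re:.*query_key_value", names))
# → ['transformer.encoder.layers.0.self_attention.query_key_value']
```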