
Fix smoothquant ignore, Fix typing, Add glm mappings #1015

Merged
dsikka merged 4 commits into main from kylesayrs/smoothquant-ignore-glm on Jan 10, 2025

Conversation

@kylesayrs (Collaborator) commented on Dec 27, 2024

Purpose

  • Fix regex targets not being ignored (see the sketch below)
  • Fix pydantic type checking so that lists can be used in place of tuples
  • Add ChatGLM mappings (which are the same as the BLOOM mappings)
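
A minimal sketch of the ignore semantics being fixed (an illustration under assumptions, not the library's actual implementation): entries prefixed with `re:` should be treated as regular expressions when deciding whether a layer is skipped.

```python3
import re

def is_ignored(layer_name: str, ignore: list[str]) -> bool:
    """Return True if layer_name matches any ignore entry.

    Entries prefixed with "re:" are interpreted as regex patterns;
    all other entries must match the layer name exactly.
    """
    for entry in ignore:
        if entry.startswith("re:"):
            if re.match(entry[len("re:"):], layer_name):
                return True
        elif layer_name == entry:
            return True
    return False

# Vision-tower layers match the regex entry and are skipped
assert is_ignored("transformer.vision.blocks.0", ["re:transformer.vision.*"])
assert not is_ignored("transformer.encoder.layers.0", ["re:transformer.vision.*"])
```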

Issues

  • Fixes #105
  • Fixes (partially) #886
  • Related to #1003
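Background on the YAML parsing failure (inferred from the fix, so treat this as a hedged illustration): YAML has no tuple type, so a recipe loaded from YAML hands pydantic each mapping as a list of lists, and tuple-only type checking rejected otherwise valid custom mappings.

```python3
import yaml  # PyYAML

RECIPE_YAML = """
smoothing_strength: 0.8
mappings:
- - ["re:.*query_key_value"]
  - "re:.*input_layernorm"
"""

parsed = yaml.safe_load(RECIPE_YAML)
# YAML sequences always deserialize to Python lists, never tuples, so a
# validator that accepts only tuples rejects this otherwise valid mapping.
assert isinstance(parsed["mappings"][0], list)
assert not isinstance(parsed["mappings"][0], tuple)
```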
Testing

glm.py

```python3
import requests
from PIL import Image
from io import BytesIO

from transformers import AutoProcessor
from llmcompressor.transformers import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier
from llmcompressor.modifiers.smoothquant.utils import BLOOM_SMOOTHQUANT_MAPPINGS
from datasets import load_dataset
from llmcompressor.transformers.tracing import ChatGLMForConditionalGeneration

from llmcompressor.transformers.utils.data_collator import glm_data_collator

MODEL_ID = "THUDM/glm-4v-9b"
model = ChatGLMForConditionalGeneration.from_pretrained(
    MODEL_ID, device_map="auto", torch_dtype="auto", trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)

NUM_CALIBRATION_SAMPLES = 1  # reduced from 512 for a quick smoke test
MAX_SEQUENCE_LENGTH = 2048

ds = load_dataset(
    "Lin-Chen/ShareGPT4V", "ShareGPT4V", split=f"train[:{NUM_CALIBRATION_SAMPLES}]"
)
ds = ds.shuffle(seed=42)


def preprocess(example):
    # Resolve the ShareGPT4V image path to its COCO URL and download the image
    url_part = "/".join(example["image"].split("/")[1:])
    url = f"http://images.cocodataset.org/{url_part}"
    response = requests.get(url)
    response.raise_for_status()
    image = Image.open(BytesIO(response.content)).convert("RGB")

    return processor.apply_chat_template(
        [
            {
                "role": "user",
                "image": image,
                "content": example["conversations"][0]["value"],
            }
        ],
        add_generation_prompt=True,
        tokenize=True,
        return_tensors="pt",
        return_dict=True,
    )


ds = ds.map(preprocess, remove_columns=ds.column_names)

# Configure the quantization algorithms
recipe = [
    SmoothQuantModifier(
        smoothing_strength=0.8,
        # Explicit GLM mappings; these follow the same layout as the
        # BLOOM_SMOOTHQUANT_MAPPINGS imported above
        mappings=[
            [["re:.*query_key_value"], "re:.*input_layernorm"],
            [["re:.*dense_h_to_4h"], "re:.*post_attention_layernorm"],
        ],
        ignore=["transformer.output_layer", "re:transformer.vision.*"],
    ),
    # GPTQModifier(
    #     targets="Linear",
    #     scheme="W8A8",
    #     sequential_targets=["GLMBlock"],
    #     ignore=["transformer.output_layer", "re:transformer.vision.*"],
    # ),
]

# Apply quantization
oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    max_seq_length=MAX_SEQUENCE_LENGTH,
    num_calibration_samples=NUM_CALIBRATION_SAMPLES,
    trust_remote_code_model=True,
    data_collator=glm_data_collator,
)
```
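
Since the new GLM mappings mirror the BLOOM layout (per the purpose above), the explicit `mappings` list in the recipe could presumably be replaced with the imported constant; a hedged equivalent, assuming the constant carries the same two mappings:

```python3
recipe = [
    SmoothQuantModifier(
        smoothing_strength=0.8,
        mappings=BLOOM_SMOOTHQUANT_MAPPINGS,  # assumed equivalent to the explicit list
        ignore=["transformer.output_layer", "re:transformer.vision.*"],
    ),
]
```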


👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

@kylesayrs kylesayrs requested a review from rahul-tuli January 1, 2025 19:27
@rahul-tuli (Collaborator) left a comment:

Thx!

@kylesayrs kylesayrs self-assigned this Jan 4, 2025
@dsikka dsikka merged commit 4d06685 into main Jan 10, 2025
6 of 7 checks passed
@dsikka dsikka deleted the kylesayrs/smoothquant-ignore-glm branch January 10, 2025 20:27
kylesayrs added a commit that referenced this pull request on Jan 15, 2025; its message repeats the PR description above.

Signed-off-by: Kyle Sayers <[email protected]>
Successfully merging this pull request may close these issues:

  • Yaml parsing fails with a custom mapping provided to SmoothQuantModifier recipe