Switch backend to use llm-compressor #33

mgoin · 2024-07-18T20:56:41Z

This currently works only for static quantization.

from datasets import load_dataset
from transformers import AutoTokenizer

from auto_fp8 import AutoFP8ForCausalLM, BaseQuantizeConfig

pretrained_model_dir = "facebook/opt-125m"
quantized_model_dir = "opt-125m-FP8"

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir, use_fast=True)
tokenizer.pad_token = tokenizer.eos_token

# Calibration data: the first 512 chat samples, rendered with the chat template and tokenized.
ds = load_dataset("mgoin/ultrachat_2k", split="train_sft").select(range(512))


def preprocess(example):
    text = tokenizer.apply_chat_template(example["messages"], tokenize=False)
    return tokenizer(text, max_length=2048, truncation=True, add_special_tokens=False)


ds = ds.map(preprocess, remove_columns=ds.column_names)

# Static activation scheme: the calibration dataset is passed to quantize() below.
quantize_config = BaseQuantizeConfig(quant_method="fp8", activation_scheme="static")

model = AutoFP8ForCausalLM.from_pretrained(pretrained_model_dir, quantize_config)
model.quantize(ds)
model.save_quantized(quantized_model_dir)
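
For readers unfamiliar with the `activation_scheme` setting, here is a minimal pure-Python sketch of the general idea behind static versus dynamic per-tensor activation scales. It is an illustration only and is not taken from llm-compressor or AutoFP8. The helper names (`static_scale`, `dynamic_scale`, `scale_and_clamp`) are hypothetical, and the actual rounding to FP8 is omitted. The only hard fact used is that the largest finite value in the FP8 E4M3 format is 448.

```python
FP8_E4M3_MAX = 448.0  # largest finite value representable in float8 e4m3fn


def static_scale(calibration_batches):
    """Hypothetical helper: one fixed scale from the max |x| seen in calibration."""
    amax = max(abs(x) for batch in calibration_batches for x in batch)
    return max(amax, 1e-12) / FP8_E4M3_MAX  # floor avoids a zero scale


def dynamic_scale(batch):
    """Hypothetical helper: a scale recomputed from the current batch at runtime."""
    amax = max(abs(x) for x in batch)
    return max(amax, 1e-12) / FP8_E4M3_MAX


def scale_and_clamp(batch, scale):
    """Divide by the scale and clamp into the FP8 range (FP8 rounding omitted)."""
    return [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, x / scale)) for x in batch]


# Static: the scale is fixed ahead of time from calibration data.
calibration = [[0.5, -2.0, 1.0], [4.48, -0.1, 3.0]]
s_static = static_scale(calibration)

# A runtime value larger than anything seen in calibration saturates at the clamp.
runtime_batch = [8.96, -1.0]
clamped_static = scale_and_clamp(runtime_batch, s_static)

# Dynamic: the scale follows the current batch, so nothing saturates.
s_dynamic = dynamic_scale(runtime_batch)
clamped_dynamic = scale_and_clamp(runtime_batch, s_dynamic)
```

In this sketch the static scale needs calibration data up front and no extra work at inference time, while the dynamic scale needs no calibration but has to be recomputed for every batch.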

mgoin changed the base branch from support-kv-cache-scales to main July 18, 2024 21:05

mgoin force-pushed the use-llm-compressor branch 2 times, most recently from c15e352 to b428604 July 18, 2024 21:10

Switch backend with llm-compressor

e286fa9

mgoin force-pushed the use-llm-compressor branch from b428604 to e286fa9 July 18, 2024 21:11

mgoin added 8 commits July 18, 2024 17:12

Remove quantize

6d508ae

Fix test

bbf352f

Add to requirements

ab3dad3

Update example

be6eef2

Fix requirement

b4f830d

Fix test

af8f5a0

Test

3063398

Add support for dynamic activation

3f683f8
