[Feature] ADD Support for DeepSeek-V2-Chat #32

Closed
Xu-Chen opened this issue Jul 18, 2024 · 1 comment
Comments

Xu-Chen commented Jul 18, 2024

OOM occurs when quantizing the DeepSeek model on 8×A800 GPUs.
The code used comes from #29.

from datasets import load_dataset
from transformers import AutoTokenizer
from auto_fp8 import AutoFP8ForCausalLM, BaseQuantizeConfig

pretrained_model_dir = "/path-to-models/DeepSeek-Coder-V2-Lite-Instruct"
quantized_model_dir = "/path-to-models/DeepSeek-Coder-V2-Lite-Instruct-FP8"

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir, use_fast=True)
tokenizer.pad_token = tokenizer.eos_token

# Load and tokenize 512 dataset samples for calibration of activation scales
ds = load_dataset("mgoin/ultrachat_2k", split="train_sft").select(range(512))
examples = [tokenizer.apply_chat_template(batch["messages"], tokenize=False) for batch in ds]
examples = tokenizer(examples, padding=True, truncation=True, return_tensors="pt").to("cuda")

# Define quantization config with static activation scales
quantize_config = BaseQuantizeConfig(
    quant_method="fp8", 
    activation_scheme="static",
    # skip the lm head and expert gate
    ignore_patterns=["re:.*lm_head", "re:.*gate.weight"])

# Load the model, quantize, and save checkpoint
model = AutoFP8ForCausalLM.from_pretrained(pretrained_model_dir, quantize_config)
model.quantize(examples)
model.save_quantized(quantized_model_dir)

Is there any way to quantize such a large model?

Xu-Chen commented Jul 18, 2024

Try the following code; it worked for me:

import torch

quantize_config = BaseQuantizeConfig(
    quant_method="fp8",
    activation_scheme="dynamic",
    # skip the lm head and expert gate
    ignore_patterns=["re:.*lm_head", "re:.*gate.weight"])

# Cap per-GPU usage so the model shards sequentially across all 8 GPUs
max_memory = {i: "75GB" for i in range(8)}
model = AutoFP8ForCausalLM.from_pretrained(
    pretrained_model_dir,
    quantize_config,
    device_map="sequential",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    max_memory=max_memory,
    attn_implementation="flash_attention_2",
)

# The dynamic activation scheme needs no calibration samples
model.quantize([])
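
For reference, the saved FP8 checkpoint can then be served with vLLM. A minimal sketch, assuming a vLLM build with FP8 support is installed; the prompt, sampling parameters, and tensor_parallel_size=8 are illustrative, and the model path reuses the placeholder from the snippet above:

from vllm import LLM, SamplingParams

# Illustrative serving example: load the FP8 checkpoint produced above
llm = LLM(
    model="/path-to-models/DeepSeek-Coder-V2-Lite-Instruct-FP8",
    quantization="fp8",
    tensor_parallel_size=8,
    trust_remote_code=True,
)

outputs = llm.generate(
    ["Write a quicksort function in Python."],
    SamplingParams(temperature=0.0, max_tokens=256),
)
print(outputs[0].outputs[0].text)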

Xu-Chen closed this as completed Jul 18, 2024