`load_in_8bit_fp32_cpu_offload=True #39

thibaudart · 2023-04-18T16:09:44Z

Any idea how to solve this:

Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit
the quantized model. If you want to dispatch the model on the CPU or the disk while keeping
these modules in 32-bit, you need to set load_in_8bit_fp32_cpu_offload=True and pass a custom
device_map to from_pretrained. Check
https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu
for more details.

I have 48gb of vram the GPU RAM must be enough!

The text was updated successfully, but these errors were encountered:

TsuTikgiau · 2023-04-18T19:51:32Z

48GPU ram should be enough for the demo without the 8bit. Can you set the low_resource to False in eval_configs/minigpt4_eval.yaml and check whether you still have this issue?

vrunm · 2023-05-02T05:20:50Z

I have followed the code given in the huggingface docs:

device_map = {
    "transformer.word_embeddings": 0,
    "transformer.word_embeddings_layernorm": 0,
    "lm_head": "cpu",
    "transformer.h": 0,
    "transformer.ln_f": 0,
}

quantization_config = BitsAndBytesConfig(llm_int8_enable_fp32_cpu_offload=True)

model = AutoModelForCausalLM.from_pretrained("AlekseyKorshuk/vicuna-7b",device_map='auto', quantization_config=quantization_config)
tokenizer = AutoTokenizer.from_pretrained("AlekseyKorshuk/vicuna-7b")

Getting this error

TypeError: __init__() got an unexpected keyword argument 
'load_in_8bit_fp32_cpu_offload'

ryzn0518 · 2023-05-29T03:29:57Z

try this:

model = AutoModelForCausalLM.from_pretrained("AlekseyKorshuk/vicuna-7b",device_map=device_map, quantization_config=quantization_config)

mirajdeepbhandari · 2024-03-10T16:45:40Z

i solve that error like this you can do it same for your model

Load model and tokenizer

quantization_config = BitsAndBytesConfig(load_in_8bit_fp32_cpu_offload=True)

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1", quantization_config=quantization_config)
model = PeftModel.from_pretrained(model, "mirajbhandari/mistral-7b-chat-finetune", device_map="auto")

tokenizer = AutoTokenizer.from_pretrained("mirajbhandari/mistral-7b-chat-finetune")

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`load_in_8bit_fp32_cpu_offload=True #39

`load_in_8bit_fp32_cpu_offload=True #39

thibaudart commented Apr 18, 2023

TsuTikgiau commented Apr 18, 2023

vrunm commented May 2, 2023

ryzn0518 commented May 29, 2023

mirajdeepbhandari commented Mar 10, 2024 •

edited

Loading

`load_in_8bit_fp32_cpu_offload=True #39

`load_in_8bit_fp32_cpu_offload=True #39

Comments

thibaudart commented Apr 18, 2023

TsuTikgiau commented Apr 18, 2023

vrunm commented May 2, 2023

ryzn0518 commented May 29, 2023

mirajdeepbhandari commented Mar 10, 2024 • edited Loading

Load model and tokenizer

mirajdeepbhandari commented Mar 10, 2024 •

edited

Loading