
failed finetune qwen32b_awq_int4 using lora with llama-factory #1314

Open
Daya-Jin opened this issue Nov 21, 2024 · 5 comments

Comments

Daya-Jin commented Nov 21, 2024

I want to LoRA-finetune the Qwen2.5-32B-Instruct-AWQ model (already quantized to 4-bit) through llama-factory, but I ran into an error.

```
[INFO|configuration_utils.py:677] 2024-11-21 19:44:25,957 >> loading configuration file /home/jovyan/models/Qwen2.5-32B-Instruct-AWQ/config.json
[INFO|configuration_utils.py:746] 2024-11-21 19:44:25,960 >> Model config Qwen2Config {
  "_name_or_path": "/home/jovyan/models/Qwen2.5-32B-Instruct-AWQ",
  "architectures": [ "Qwen2ForCausalLM" ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 5120,
  "initializer_range": 0.02,
  "intermediate_size": 27648,
  "max_position_embeddings": 32768,
  "max_window_layers": 70,
  "model_type": "qwen2",
  "num_attention_heads": 40,
  "num_hidden_layers": 64,
  "num_key_value_heads": 8,
  "quantization_config": {
    "bits": 4,
    "group_size": 128,
    "modules_to_not_convert": null,
    "quant_method": "awq",
    "version": "gemm",
    "zero_point": true
  },
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.46.1",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 152064
}
[INFO|tokenization_utils_base.py:2209] 2024-11-21 19:44:26,772 >> loading file vocab.json
[INFO|tokenization_utils_base.py:2209] 2024-11-21 19:44:26,772 >> loading file merges.txt
[INFO|tokenization_utils_base.py:2209] 2024-11-21 19:44:26,772 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2209] 2024-11-21 19:44:26,772 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2209] 2024-11-21 19:44:26,772 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2209] 2024-11-21 19:44:26,772 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2475] 2024-11-21 19:44:26,973 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Converting format of dataset (num_proc=8): 100%|██████████| 50/50 [00:00<00:00, 112.68 examples/s]
Running tokenizer on dataset (num_proc=8): 100%|██████████| 50/50 [00:01<00:00, 30.23 examples/s]
[INFO|configuration_utils.py:677] 2024-11-21 19:44:30,365 >> loading configuration file /home/jovyan/models/Qwen2.5-32B-Instruct-AWQ/config.json
[INFO|configuration_utils.py:746] 2024-11-21 19:44:30,366 >> Model config Qwen2Config {
[INFO|configuration_utils.py:677] 2024-11-21 19:44:36,337 >> loading configuration file /home/jovyan/models/Qwen2.5-32B-Instruct-AWQ/config.json
[INFO|configuration_utils.py:746] 2024-11-21 19:44:36,338 >> Model config Qwen2Config {
[INFO|configuration_utils.py:677] 2024-11-21 19:44:47,798 >> loading configuration file /home/jovyan/models/Qwen2.5-32B-Instruct-AWQ/config.json
[INFO|configuration_utils.py:746] 2024-11-21 19:44:47,801 >> Model config Qwen2Config {
[WARNING|logging.py:168] 2024-11-21 19:44:47,803 >> Unsloth: /home/jovyan/models/Qwen2.5-32B-Instruct-AWQ can only handle sequence lengths of at most 32768. But with kaiokendev's RoPE scaling of 2.0, it can be magically be extended to 65535!
[INFO|configuration_utils.py:677] 2024-11-21 19:44:47,875 >> loading configuration file /home/jovyan/models/Qwen2.5-32B-Instruct-AWQ/config.json
[INFO|configuration_utils.py:746] 2024-11-21 19:44:47,877 >> Model config Qwen2Config {
  "max_position_embeddings": 65535,
  "rope_scaling": {
    "factor": 1.999969482421875,
    "type": "linear"
[INFO|modeling_utils.py:3934] 2024-11-21 19:44:48,668 >> loading weights file /home/jovyan/models/Qwen2.5-32B-Instruct-AWQ/model.safetensors.index.json
[INFO|modeling_utils.py:1670] 2024-11-21 19:44:48,693 >> Instantiating Qwen2ForCausalLM model under default dtype torch.float16.
[INFO|configuration_utils.py:1096] 2024-11-21 19:44:48,696 >> Generate config GenerationConfig {
Loading checkpoint shards: 100%|██████████| 5/5 [01:55<00:00, 23.19s/it]
[INFO|modeling_utils.py:4800] 2024-11-21 19:46:51,741 >> All model checkpoint weights were used when initializing Qwen2ForCausalLM.
[INFO|modeling_utils.py:4808] 2024-11-21 19:46:51,741 >> All the weights of Qwen2ForCausalLM were initialized from the model checkpoint at /home/jovyan/models/Qwen2.5-32B-Instruct-AWQ. If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2ForCausalLM for predictions without further training.
[INFO|configuration_utils.py:1049] 2024-11-21 19:46:51,776 >> loading configuration file /home/jovyan/models/Qwen2.5-32B-Instruct-AWQ/generation_config.json
[INFO|configuration_utils.py:1096] 2024-11-21 19:46:51,776 >> Generate config GenerationConfig {
  "do_sample": true,
  "eos_token_id": [ 151645, 151643 ],
  "pad_token_id": 151643,
  "repetition_penalty": 1.05,
  "temperature": 0.7,
  "top_k": 20,
  "top_p": 0.8
}
[WARNING|logging.py:168] 2024-11-21 19:47:12,702 >> Unsloth 2024.10.7 patched 64 layers with 0 QKV layers, 64 O layers and 64 MLP layers.
/home/jovyan/xxx/LLaMA-Factory-main/src/llamafactory/train/sft/trainer.py:54: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `CustomSeq2SeqTrainer.__init__`. Use `processing_class` instead.
  super().__init__(**kwargs)
Detected kernel version 3.10.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
[INFO|trainer.py:698] 2024-11-21 19:47:14,443 >> Using auto half precision backend
[WARNING|<string>:208] 2024-11-21 19:47:19,827 >>
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 50 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 1 | Gradient Accumulation steps = 2
\        /    Total batch size = 2 | Total steps = 25
 "-____-"     Number of trainable parameters = 134,217,728
[rank0]:     loss = super().compute_loss(model, inputs, return_outputs, **kwargs)
[rank0]:   File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/unsloth/models/_utils.py", line 1183, in _unsloth_pre_compute_loss
[rank0]:     return self._old_compute_loss(model, inputs, *args, **kwargs)
[rank0]:   File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/transformers/trainer.py", line 3625, in compute_loss
[rank0]:     outputs = model(**inputs)
[rank0]:   File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1643, in forward
[rank0]:     else self._run_ddp_forward(*inputs, **kwargs)
[rank0]:   File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1459, in _run_ddp_forward
[rank0]:     return self.module(*inputs, **kwargs)  # type: ignore[index]
[rank0]:   File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/accelerate/utils/operations.py", line 820, in forward
[rank0]:     return model_forward(*args, **kwargs)
[rank0]:   File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/accelerate/utils/operations.py", line 808, in __call__
[rank0]:     return convert_to_fp32(self.model_forward(*args, **kwargs))
[rank0]:   File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/torch/_compile.py", line 32, in inner
[rank0]:     return disable_fn(*args, **kwargs)
[rank0]:   File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 632, in _fn
[rank0]:     return fn(*args, **kwargs)
[rank0]:   File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/unsloth/models/llama.py", line 1044, in PeftModelForCausalLM_fast_forward
[rank0]:     return self.base_model(
[rank0]:   File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 188, in forward
[rank0]:     return self.model.forward(*args, **kwargs)
[rank0]:   File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/unsloth/models/llama.py", line 942, in _CausalLM_fast_forward
[rank0]:     outputs = self.model(
[rank0]:   File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/unsloth/models/llama.py", line 776, in LlamaModel_fast_forward
[rank0]:     hidden_states = Unsloth_Offloaded_Gradient_Checkpointer.apply(
[rank0]:   File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/torch/autograd/function.py", line 575, in apply
[rank0]:     return super().apply(*args, **kwargs)  # type: ignore[misc]
[rank0]:   File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 465, in decorate_fwd
[rank0]:     return fwd(*args, **kwargs)
[rank0]:   File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/unsloth/models/_utils.py", line 807, in forward
[rank0]:     output = forward_function(hidden_states, *args)
[rank0]:   File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/unsloth/models/llama.py", line 491, in LlamaDecoderLayer_fast_forward
[rank0]:     hidden_states, self_attn_weights, present_key_value = self.self_attn(
[rank0]:   File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/unsloth/models/llama.py", line 436, in LlamaAttention_fast_forward
[rank0]:     attn_output = self.apply_o(self, attn_output)
[rank0]:   File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/unsloth/kernels/fast_lora.py", line 409, in apply_lora_o
[rank0]:     OW, OW_quant, OA, OB, OS = get_lora_parameters(self.o_proj)
[rank0]:   File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/unsloth/kernels/utils.py", line 78, in get_lora_parameters
[rank0]:     W = base_layer.weight
[rank0]:   File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1931, in __getattr__
[rank0]:     raise AttributeError(
[rank0]: AttributeError: 'WQLinear_GEMM' object has no attribute 'weight'. Did you mean: 'qweight'?
```

I tried various combinations of transformers + torch + unsloth versions. What's the problem here? Or is LoRA finetuning of qwen2_awq_int4 actually not supported? Thanks!
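For reference, the failing call in the traceback is Unsloth's `get_lora_parameters`, which reads `base_layer.weight`; AutoAWQ's `WQLinear_GEMM` layers only expose packed `qweight`/`qzeros`/`scales` tensors, so there is no plain `weight` to read. A minimal inspection sketch (my own code, not part of llama-factory; module path and attribute names are taken from the traceback, and it assumes the AWQ checkpoint loads on a GPU):

```python
# Sketch only: show why Unsloth's fast-LoRA path trips over an AWQ checkpoint.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "/home/jovyan/models/Qwen2.5-32B-Instruct-AWQ",  # local path from the log above
    device_map="auto",   # AWQ kernels need a CUDA device to load
)

o_proj = model.model.layers[0].self_attn.o_proj
print(type(o_proj).__name__)        # WQLinear_GEMM (from autoawq)
print(hasattr(o_proj, "weight"))    # False -> the AttributeError in the traceback
print(hasattr(o_proj, "qweight"))   # True: packed 4-bit weights
print(hasattr(o_proj, "scales"))    # True: per-group dequantization scales
```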

Daya-Jin (Author) commented:

My most recent conda env:
torch 2.5.0
transformers 4.46.1
unsloth 2024.10.7

Daya-Jin changed the title from "failed finetune qwen32b_awq_int4 using lora" to "failed finetune qwen32b_awq_int4 using lora with llama-factory" on Nov 21, 2024
Daya-Jin (Author) commented:

It's weird! I found it runs normally after I add the argument --lora_dropout 0.05, apart from a performance warning.

```
[WARNING|logging.py:168] 2024-11-21 20:15:52,531 >> Unsloth: Dropout = 0 is supported for fast patching. You are using dropout = 0.05. Unsloth will patch all other layers, except LoRA matrices, causing a performance hit.
[WARNING|logging.py:168] 2024-11-21 20:16:12,201 >> Unsloth 2024.10.7 patched 64 layers with 0 QKV layers, 0 O layers and 0 MLP layers.
```

Note that it patched nothing, yet it did run and exit normally.
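If I read the two warnings right, that would explain it: with lora_dropout = 0 Unsloth patches the layers with its fused LoRA kernels, and those kernels are what read `base_layer.weight` and crash on the AWQ modules; with a non-zero dropout it patches nothing ("0 QKV / 0 O / 0 MLP layers"), so the forward pass stays on PEFT's default path, which just calls the AWQ layer normally. Roughly, as an illustrative sketch (not Unsloth's actual source):

```python
# Illustrative sketch (not Unsloth's real code): how a non-zero lora_dropout
# side-steps the failing fast path seen in the traceback above.
def pick_forward_path(lora_dropout: float) -> str:
    if lora_dropout == 0.0:
        # Fast patched path: fused kernels read base_layer.weight directly,
        # which raises AttributeError on AWQ's WQLinear_GEMM (no .weight).
        return "unsloth_fast_lora"
    # Dropout > 0: the layer is left unpatched, so PEFT's regular forward
    # calls base_layer(x), which the AWQ module does implement.
    return "peft_default"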

Daya-Jin (Author) commented:

Alright, I reproduced this error in a notebook. So I suppose this feature is just not supported yet.
(screenshot: the same AttributeError reproduced in a notebook)

Erland366 (Contributor) commented:

I don't think you're supposed to further finetune a model that has already been quantized with AWQ, since AWQ packs and unpacks its weights -> it now has a somewhat different architecture than the original Qwen.

But you can use Unsloth's version instead -> https://huggingface.co/unsloth/Qwen2.5-32B-bnb-4bit
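For example, a LoRA setup on that checkpoint could look roughly like this (standard Unsloth `FastLanguageModel` API; the rank/alpha/target-module values are placeholders, not a recommendation):

```python
# Sketch: LoRA on the bitsandbytes-quantized checkpoint instead of the AWQ one.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-32B-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,        # bitsandbytes NF4, the pathway Unsloth supports
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,         # 0 keeps Unsloth's fast patching enabled
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```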

danielhanchen (Contributor) commented:

Oh yep, for now use the original 16-bit weights or bitsandbytes - AWQ has a different quantization pathway.
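A rough sketch of the bitsandbytes route in plain transformers (standard HF API; in LLaMA-Factory the rough equivalent is setting the `quantization_bit: 4` option and pointing at the original, non-AWQ checkpoint):

```python
# Sketch: quantize the original (non-AWQ) checkpoint on the fly with bitsandbytes,
# which is the 4-bit pathway that QLoRA-style finetuning supports.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-32B-Instruct",   # original fp16 weights, not the AWQ repo
    quantization_config=bnb_config,
    device_map="auto",
)
```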
