failed finetune qwen32b_awq_int4 using lora with llama-factory #1314
Comments
My last conda env:
It's weird! I found it can run normally after I add the argument
noticed it patched nothing, but it ran and exited normally indeed
I don't think you're supposed to further finetune a model that has already been quantized with AWQ, since AWQ packs and unpacks its weights -> now it kind of has a different architecture than the original Qwen. But you can use Unsloth's one instead -> https://huggingface.co/unsloth/Qwen2.5-32B-bnb-4bit
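As an illustration of that point (a minimal sketch, not from the thread; it assumes autoawq is installed, a GPU is available, and reuses the local checkpoint path from the log below), loading the AWQ checkpoint and inspecting one projection layer shows the packed module that later trips up Unsloth's LoRA kernels:

```python
# Minimal sketch: probe an AWQ-quantized Qwen2 checkpoint.
# Assumes transformers + autoawq are installed and a GPU is available;
# the path is the one that appears in the logs below.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "/home/jovyan/models/Qwen2.5-32B-Instruct-AWQ",
    device_map="auto",
)

o_proj = model.model.layers[0].self_attn.o_proj
print(type(o_proj).__name__)        # WQLinear_GEMM, not nn.Linear
print(hasattr(o_proj, "weight"))    # False -> the AttributeError in the traceback
print(hasattr(o_proj, "qweight"))   # True  -> packed int4 weights plus scales/zeros
```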
Oh yep, for now use the original 16-bit weights or bitsandbytes - AWQ has a different quantization pathway.
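A minimal sketch of that suggestion (not from the thread): load the bnb-4bit repo linked above with Unsloth and attach LoRA adapters. The sequence length and LoRA hyperparameters below are placeholders, not values taken from this issue.

```python
from unsloth import FastLanguageModel

# Load the bitsandbytes 4-bit variant instead of the AWQ checkpoint.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-32B-bnb-4bit",
    max_seq_length=2048,   # placeholder
    load_in_4bit=True,
)

# Attach LoRA adapters; bitsandbytes layers still expose a .weight that
# Unsloth's fast LoRA kernels can read, unlike AWQ's WQLinear_GEMM.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                  # placeholder rank
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```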
I want to LoRA finetune the Qwen2.5-32B-Instruct-AWQ model (already quantized to 4-bit) through llama-factory, but an error occurred.
[INFO|configuration_utils.py:677] 2024-11-21 19:44:25,957 >> loading configuration file /home/jovyan/models/Qwen2.5-32B-Instruct-AWQ/config.json
[INFO|configuration_utils.py:746] 2024-11-21 19:44:25,960 >> Model config Qwen2Config {
  "_name_or_path": "/home/jovyan/models/Qwen2.5-32B-Instruct-AWQ",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 5120,
  "initializer_range": 0.02,
  "intermediate_size": 27648,
  "max_position_embeddings": 32768,
  "max_window_layers": 70,
  "model_type": "qwen2",
  "num_attention_heads": 40,
  "num_hidden_layers": 64,
  "num_key_value_heads": 8,
  "quantization_config": {
    "bits": 4,
    "group_size": 128,
    "modules_to_not_convert": null,
    "quant_method": "awq",
    "version": "gemm",
    "zero_point": true
  },
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.46.1",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 152064
}
[INFO|tokenization_utils_base.py:2209] 2024-11-21 19:44:26,772 >> loading file vocab.json
[INFO|tokenization_utils_base.py:2209] 2024-11-21 19:44:26,772 >> loading file merges.txt
[INFO|tokenization_utils_base.py:2209] 2024-11-21 19:44:26,772 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2209] 2024-11-21 19:44:26,772 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2209] 2024-11-21 19:44:26,772 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2209] 2024-11-21 19:44:26,772 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2475] 2024-11-21 19:44:26,973 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Converting format of dataset (num_proc=8): 100%|██████████| 50/50 [00:00<00:00, 112.68 examples/s]
Running tokenizer on dataset (num_proc=8): 100%|██████████| 50/50 [00:01<00:00, 30.23 examples/s]
[INFO|configuration_utils.py:677] 2024-11-21 19:44:30,365 >> loading configuration file /home/jovyan/models/Qwen2.5-32B-Instruct-AWQ/config.json
[INFO|configuration_utils.py:746] 2024-11-21 19:44:30,366 >> Model config Qwen2Config {
[INFO|configuration_utils.py:677] 2024-11-21 19:44:36,337 >> loading configuration file /home/jovyan/models/Qwen2.5-32B-Instruct-AWQ/config.json
[INFO|configuration_utils.py:746] 2024-11-21 19:44:36,338 >> Model config Qwen2Config {
[INFO|configuration_utils.py:677] 2024-11-21 19:44:47,798 >> loading configuration file /home/jovyan/models/Qwen2.5-32B-Instruct-AWQ/config.json
[INFO|configuration_utils.py:746] 2024-11-21 19:44:47,801 >> Model config Qwen2Config {
[WARNING|logging.py:168] 2024-11-21 19:44:47,803 >> Unsloth: /home/jovyan/models/Qwen2.5-32B-Instruct-AWQ can only handle sequence lengths of at most 32768. But with kaiokendev's RoPE scaling of 2.0, it can be magically be extended to 65535!
[INFO|configuration_utils.py:677] 2024-11-21 19:44:47,875 >> loading configuration file /home/jovyan/models/Qwen2.5-32B-Instruct-AWQ/config.json
[INFO|configuration_utils.py:746] 2024-11-21 19:44:47,877 >> Model config Qwen2Config {
  "max_position_embeddings": 65535,
  "rope_scaling": {
    "factor": 1.999969482421875,
    "type": "linear"
[INFO|modeling_utils.py:3934] 2024-11-21 19:44:48,668 >> loading weights file /home/jovyan/models/Qwen2.5-32B-Instruct-AWQ/model.safetensors.index.json
[INFO|modeling_utils.py:1670] 2024-11-21 19:44:48,693 >> Instantiating Qwen2ForCausalLM model under default dtype torch.float16.
[INFO|configuration_utils.py:1096] 2024-11-21 19:44:48,696 >> Generate config GenerationConfig {
Loading checkpoint shards: 100%|██████████| 5/5 [01:55<00:00, 23.19s/it]
[INFO|modeling_utils.py:4800] 2024-11-21 19:46:51,741 >> All model checkpoint weights were used when initializing Qwen2ForCausalLM.
[INFO|modeling_utils.py:4808] 2024-11-21 19:46:51,741 >> All the weights of Qwen2ForCausalLM were initialized from the model checkpoint at /home/jovyan/models/Qwen2.5-32B-Instruct-AWQ. If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2ForCausalLM for predictions without further training.
[INFO|configuration_utils.py:1049] 2024-11-21 19:46:51,776 >> loading configuration file /home/jovyan/models/Qwen2.5-32B-Instruct-AWQ/generation_config.json
[INFO|configuration_utils.py:1096] 2024-11-21 19:46:51,776 >> Generate config GenerationConfig {
  "do_sample": true,
  "eos_token_id": [
    151645,
    151643
  "pad_token_id": 151643,
  "repetition_penalty": 1.05,
  "temperature": 0.7,
  "top_k": 20,
  "top_p": 0.8
[WARNING|logging.py:168] 2024-11-21 19:47:12,702 >> Unsloth 2024.10.7 patched 64 layers with 0 QKV layers, 64 O layers and 64 MLP layers.
/home/jovyan/xxx/LLaMA-Factory-main/src/llamafactory/train/sft/trainer.py:54: FutureWarning:
`tokenizer` is deprecated and will be removed in version 5.0.0 for `CustomSeq2SeqTrainer.__init__`. Use `processing_class` instead.
  super().__init__(**kwargs)
Detected kernel version 3.10.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
[INFO|trainer.py:698] 2024-11-21 19:47:14,443 >> Using auto half precision backend
[WARNING|<string>:208] 2024-11-21 19:47:19,827 >>
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 50 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 1 | Gradient Accumulation steps = 2
\        /    Total batch size = 2 | Total steps = 25
 "-____-"     Number of trainable parameters = 134,217,728
[rank0]: loss = super().compute_loss(model, inputs, return_outputs, **kwargs)
[rank0]: File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/unsloth/models/_utils.py", line 1183, in _unsloth_pre_compute_loss
[rank0]: return self._old_compute_loss(model, inputs, *args, **kwargs)
[rank0]: File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/transformers/trainer.py", line 3625, in compute_loss
[rank0]: outputs = model(**inputs)
[rank0]: File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1643, in forward
[rank0]: else self._run_ddp_forward(*inputs, **kwargs)
[rank0]: File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1459, in _run_ddp_forward
[rank0]: return self.module(*inputs, **kwargs) # type: ignore[index]
[rank0]: File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/accelerate/utils/operations.py", line 820, in forward
[rank0]: return model_forward(*args, **kwargs)
[rank0]: File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/accelerate/utils/operations.py", line 808, in __call__
[rank0]: return convert_to_fp32(self.model_forward(*args, **kwargs))
[rank0]: File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast
[rank0]: return func(*args, **kwargs)
[rank0]: File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/torch/_compile.py", line 32, in inner
[rank0]: return disable_fn(*args, **kwargs)
[rank0]: File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 632, in _fn
[rank0]: return fn(*args, **kwargs)
[rank0]: File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/unsloth/models/llama.py", line 1044, in PeftModelForCausalLM_fast_forward
[rank0]: return self.base_model(
[rank0]: File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 188, in forward
[rank0]: return self.model.forward(*args, **kwargs)
[rank0]: File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/unsloth/models/llama.py", line 942, in _CausalLM_fast_forward
[rank0]: outputs = self.model(
[rank0]: File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/unsloth/models/llama.py", line 776, in LlamaModel_fast_forward
[rank0]: hidden_states = Unsloth_Offloaded_Gradient_Checkpointer.apply(
[rank0]: File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/torch/autograd/function.py", line 575, in apply
[rank0]: return super().apply(*args, **kwargs) # type: ignore[misc]
[rank0]: File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 465, in decorate_fwd
[rank0]: return fwd(*args, **kwargs)
[rank0]: File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/unsloth/models/_utils.py", line 807, in forward
[rank0]: output = forward_function(hidden_states, *args)
[rank0]: File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/unsloth/models/llama.py", line 491, in LlamaDecoderLayer_fast_forward
[rank0]: hidden_states, self_attn_weights, present_key_value = self.self_attn(
[rank0]: File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/unsloth/models/llama.py", line 436, in LlamaAttention_fast_forward
[rank0]: attn_output = self.apply_o(self, attn_output)
[rank0]: File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/unsloth/kernels/fast_lora.py", line 409, in apply_lora_o
[rank0]: OW, OW_quant, OA, OB, OS = get_lora_parameters(self.o_proj)
[rank0]: File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/unsloth/kernels/utils.py", line 78, in get_lora_parameters
[rank0]: W = base_layer.weight
[rank0]: File "/home/jovyan/.aip_conda/xxx/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1931, in __getattr__
[rank0]: raise AttributeError(
[rank0]: AttributeError: 'WQLinear_GEMM' object has no attribute 'weight'. Did you mean: 'qweight'?
I tried various combinations of transformers + torch + unsloth versions. So what's the problem here? Or is LoRA finetuning of qwen2_awq_int4 actually not supported? Thanks!