
Qwen2.5-0.5B-Instruct: severe accuracy drop with 8-bit quantization and dynamically loaded LoRA #3098

Open
jfduma opened this issue Nov 24, 2024 · 3 comments
Labels
question Further information is requested

Comments


jfduma commented Nov 24, 2024

In practice I use Qwen2.5-0.5B-Instruct as the base model and train several LoRA adapters for different tasks, with llamafactory-cli as the training tool.
After training, I quantize the base model to 8 bit and export it as an MNN model, and also export the LoRA adapters to MNN format.
Validating on device (CPU + fp16), accuracy drops by roughly 15%–40% compared with the unquantized model.

I also tried exporting the model with MNNConvert --fp16, but the apply_lora.py script then fails. The error is:
File "/byte_auto_model/jiangfeng/mnn/tools/script/apply_lora.py", line 72, in apply_lora
tag = names[1].split('.')[1] + names[3]
IndexError: list index out of range
(Pdb) names
['', 'q_proj', 'Add_output_0__matmul_converted']
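
For reference, a minimal sketch of why that line fails, using the value of names shown in the pdb output above. What layout apply_lora.py expects for the split tensor name is an assumption inferred from that single line:

# Value of `names` from the pdb session above (apply_lora.py has already
# split the ONNX tensor name before reaching line 72).
names = ['', 'q_proj', 'Add_output_0__matmul_converted']

# Line 72 assumes names[1] contains a dot (e.g. a "layers.<N>"-style segment)
# and that a fourth element names[3] exists. In the --fp16-converted model the
# MatMul outputs appear to be renamed ("...__matmul_converted"), so both
# assumptions break:
try:
    tag = names[1].split('.')[1] + names[3]
except IndexError as err:
    # 'q_proj'.split('.') has a single element, so index 1 already raises;
    # names[3] would be out of range as well.
    print('IndexError:', err)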

I also tried running LoRA SFT directly on the GPTQ 8-bit quantized model and then converting it to MNN format; accuracy did not improve.

I'd like to ask two questions here:

  1. How can I export the base model and the LoRA models in fp16 format?
  2. After converting Qwen2.5 + LoRA to an 8-bit quantized MNN model (without merging the base model and the LoRA, since multiple LoRAs need to be loaded dynamically), can the LoRA still be trained further? If so, how?

jxt1234 (Collaborator) commented Nov 27, 2024

Try it with CPU + fp32.

  1. You can export to ONNX first and then convert the ONNX model to MNN.
  2. Training the LoRA is not currently supported; you can retrain it yourself and then convert to MNN again.

jxt1234 added the question (Further information is requested) label on Nov 27, 2024

jxt1234 (Collaborator) commented Nov 27, 2024

For 8-bit quantization, I'd suggest running llmexport with --export onnx first and then converting the ONNX model to MNN. Exporting directly to MNN currently seems to have an issue with 8-bit quantization.
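
A rough sketch of that two-step flow, for reference. The script location, model paths, output file names, and the MNNConvert flags below are assumptions based on typical MNN usage rather than commands confirmed in this thread, so adjust them to your checkout and version:

# Step 1: export the base model to ONNX with llmexport (paths are placeholders)
python llmexport.py --path /path/to/Qwen2.5-0.5B-Instruct --export onnx --dst_path ./qwen-onnx

# Step 2: convert the exported ONNX model to MNN with 8-bit weight quantization
MNNConvert -f ONNX --modelFile ./qwen-onnx/llm.onnx --MNNModel ./qwen2.5-0.5b-int8.mnn --weightQuantBits 8

The --fp16 flag could be passed at the MNNConvert step instead, but as noted earlier in the thread the tensor names in the fp16-converted model may then break apply_lora.py.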


jfduma (Author) commented Nov 27, 2024

Thanks, I'll give it a try.
