qwen-14b model format conversion OOM #559
Comments
For now, yes, it is needed. The qwen-14b vocab is much larger than llama2-13b's.
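For a rough sense of scale, here is a back-of-envelope sketch. The vocab and hidden sizes below come from the models' published configs and are assumptions, not values taken from this thread:

```python
FP16_BYTES = 2  # bytes per element in half precision

# Assumed sizes from the public model configs (not stated in this thread):
models = {
    "qwen-14b":   {"vocab": 152064, "hidden": 5120},
    "llama2-13b": {"vocab": 32000,  "hidden": 5120},
}

for name, m in models.items():
    # The token embedding and lm_head are each a vocab x hidden matrix.
    gib = m["vocab"] * m["hidden"] * FP16_BYTES / 2**30
    print(f"{name}: ~{gib:.2f} GiB per vocab-sized matrix")

# qwen-14b:   ~1.45 GiB per vocab-sized matrix
# llama2-13b: ~0.31 GiB per vocab-sized matrix
```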
@lvhan028 Got it, thanks!
Where does tokenizer.model come from? The model downloaded from huggingface doesn't have this file.
Add --model-format qwen when converting.
@lvhan028 hi, after I set model_format to qwen, it still OOMs after layer 36.
In deploy_qwen(), try removing the .cuda() in the get_tensor function.
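For readers hitting the same problem, a minimal sketch of the kind of change being suggested; the actual get_tensor inside lmdeploy's deploy script may differ, and model_params below is a stand-in for however the script holds the loaded checkpoint:

```python
# Hypothetical illustration of the suggested edit inside deploy_qwen();
# model_params stands in for the state dict the deploy script loads.
def get_tensor(name):
    """Return the weight tensor with the given name."""
    # Before: returning model_params[name].cuda() keeps every fetched
    # tensor resident on the GPU, so memory grows layer by layer until
    # conversion OOMs on large-vocab models like qwen-14b.
    # After: leave the tensor on CPU; splitting and saving weights for
    # turbomind does not require them to be on the GPU.
    return model_params[name]
```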
@lvhan028 That works now, thanks!
Original issue description:
GPUs: four A10 cards, 24 GB memory each
Code: latest code on the main branch
Conversion command:
python3 -m lmdeploy.serve.turbomind.deploy qwen-14b \
    /home/xuyangyang/qwen-14b-chat qwen \
    --tokenizer_path /home/xuyangyang/qwen-14b-chat/tokenizer.model \
    --tp 4 \
    --dst_path /home/xuyangyang/qwen-14b-chat_transformed_tp4
Intuitively, converting a 14b model shouldn't need this much GPU memory, right? llama2-13b converted without any problems before.