
qwen-14b model format conversion OOM #559

Closed
frankxyy opened this issue Oct 16, 2023 · 7 comments

Comments

@frankxyy

frankxyy commented Oct 16, 2023

GPUs: four A10 cards, 24 GB of memory each

Code: latest code on the main branch

Conversion command:

python3 -m lmdeploy.serve.turbomind.deploy qwen-14b \
    /home/xuyangyang/qwen-14b-chat qwen \
    --tokenizer_path /home/xuyangyang/qwen-14b-chat/tokenizer.model \
    --tp 4 \
    --dst_path /home/xuyangyang/qwen-14b-chat_transformed_tp4

Intuitively, converting a 14B model shouldn't need that much GPU memory, should it? llama2-13b converted without any problem before.

@lvhan028
Collaborator

lvhan028 commented Oct 16, 2023

At the moment, yes. qwen-14b's vocab is much larger than llama2-13b's.
Once #296 is done, this problem will be solved.
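
(A back-of-the-envelope illustration of why the vocab size matters here; the vocab figures below are approximate assumptions, not values from this thread — check each model's config.json:)

```python
# Rough fp16 cost of the token embedding plus an untied lm_head.
# Hidden size 5120 applies to both 13B/14B models; vocab sizes are
# approximate assumptions.
hidden = 5120
bytes_fp16 = 2

for name, vocab in [("llama2-13b", 32_000), ("qwen-14b", 152_000)]:
    gib = 2 * vocab * hidden * bytes_fp16 / 1024**3  # embedding + lm_head
    print(f"{name}: ~{gib:.2f} GiB")

# llama2-13b: ~0.61 GiB
# qwen-14b:   ~2.90 GiB
```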

@frankxyy
Author

@lvhan028 Understood, thanks!

@leethu2012


Where does tokenizer.model come from? The model downloaded from huggingface doesn't have that file.

@lvhan028
Collaborator

Add --model-format qwen when converting.
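
(For illustration, the command from the original report with that flag added; paths are placeholders, and --tokenizer_path is omitted on the assumption that, as noted above, the huggingface Qwen download has no tokenizer.model:)

```
python3 -m lmdeploy.serve.turbomind.deploy qwen-14b \
    /path/to/qwen-14b-chat \
    --model-format qwen \
    --tp 4 \
    --dst_path /path/to/output
```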

@frankxyy
Author

@lvhan028 Hi, after setting model_format to qwen, it still OOMs, after layer 36.

@lvhan028
Collaborator

In deploy_qwen(), try removing the .cuda() in the get_tensor function.
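
(A sketch of the kind of change being suggested; the actual deploy_qwen() / get_tensor in lmdeploy may look different, and model_params here is a hypothetical name for the loaded checkpoint dict:)

```python
# Inside deploy_qwen(): get_tensor fetches a weight from the loaded
# checkpoint. model_params is a hypothetical name for that dict.

def get_tensor(name):
    """Return a checkpoint tensor by name."""
    # Before: .cuda() moves every weight onto the GPU, so the whole
    # model accumulates in the 24 GB of device memory and OOMs.
    # return model_params[name].cuda()

    # After: keep tensors on the CPU; splitting for --tp and saving
    # work on CPU tensors as well, just more slowly.
    return model_params[name]
```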

@frankxyy
Author

@lvhan028 That works now, thanks!
