Will there be DeepSpeed acceleration for training and inference? #46
Comments
Are you using an A100 or a T4? An A100 should be quite a bit faster. We'll look into model parallelism later on, thanks for the suggestion!
The main issue is that GLM hasn't been ported to the standard Hugging Face pipeline; if it were, you should be able to use accelerate directly to speed things up. I'd like to look later at whether other models already in the HF pipeline can do this. I think this discussion is quite valuable.
I'm using a V100 32G; loading the model takes about 14 GB of GPU memory. I tried ChatGLM's DeepSpeed support, but the underlying code doesn't yet seem to support running inference on Luotuo. When loading across multiple GPUs it errors out with something like "the same data cannot be loaded on two GPUs".
Could you suggest any other acceleration methods? For example, would quantizing the model speed up inference?
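On the quantization question above: a back-of-envelope sketch of why quantization helps with memory (and hence loading and offloading pressure), even though latency gains depend on the available low-bit kernels. The 7B parameter count below is an assumption for a ChatGLM-6B-class model; the fp16 figure roughly matches the ~14 GB reported in this thread.

```python
# Rough weight-memory estimate at different precisions (illustrative only;
# parameter count is assumed, activations and KV cache are not included).
def model_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Approximate memory for model weights, in GB (decimal)."""
    return n_params * bytes_per_param / 1e9

n_params = 7e9  # assumed parameter count
fp16 = model_memory_gb(n_params, 2)    # 16-bit weights
int8 = model_memory_gb(n_params, 1)    # 8-bit quantized weights
int4 = model_memory_gb(n_params, 0.5)  # 4-bit quantized weights
print(f"fp16: {fp16:.1f} GB, int8: {int8:.1f} GB, int4: {int4:.1f} GB")
# → fp16: 14.0 GB, int8: 7.0 GB, int4: 3.5 GB
```

So int8 roughly halves the footprint, which can make a single-GPU setup viable where multi-GPU loading currently fails; whether it makes each forward pass faster depends on the quantized matmul kernels in use.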
I came across an impressive piece of work today: https://zhuanlan.zhihu.com/p/622754642 ("High-throughput Generative Inference of Large Language Models with a Single GPU"), but integrating it looks like a substantial amount of code.
Right now, summarizing with tuoling takes about 15 s per dialogue to get a result. I hope multi-GPU accelerated inference can be released later.