[Performance Report] Dual-socket 9654 + dual 3090 performance on Windows Server #828
Comments
NPS needs to be set to 1; otherwise frequent cross-socket communication will severely limit speed. Not populating all the memory slots is another cause.
How is this even slower than my single-socket 7763?
Leaving memory slots unpopulated has quite a big impact on compute bandwidth... I'd recommend filling them all, haha.
We recommend that this kind of report be posted in the Discussions section. Next time, could you post it there?
Thank you for your kind suggestion. Sorry for any disturbance caused by my ignorance. Should I move this thread to the Discussions section now?
That's best, but you can leave the current discussion here. We will close it a few days later to remind others.
Moved, thank you for supporting. ;-)
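The comments above about unpopulated memory slots can be illustrated with a rough peak-bandwidth estimate. This is only a sketch: it assumes the "9654" in the title is an EPYC 9654 with 12 DDR5-4800 channels per socket, and the populated-channel counts below are hypothetical, since the post does not say how many DIMMs were installed. The function name is made up for this example.

```python
def peak_bandwidth_gbs(channels: int, mt_per_s: int = 4800, bus_bytes: int = 8) -> float:
    """Theoretical peak memory bandwidth in GB/s for one socket.

    channels  : number of populated memory channels (hypothetical values below)
    mt_per_s  : transfer rate in MT/s (DDR5-4800 assumed)
    bus_bytes : bytes moved per transfer per channel (64-bit data bus)
    """
    return channels * mt_per_s * bus_bytes / 1000


if __name__ == "__main__":
    # Compare a fully populated socket with partially populated ones.
    for populated in (12, 8, 6):
        print(f"{populated:2d} channels: {peak_bandwidth_gbs(populated):.1f} GB/s per socket")
```

Peak bandwidth scales linearly with the number of populated channels, so a half-populated socket has at most half the theoretical bandwidth available for CPU-side inference. Measured figures will be lower than these theoretical peaks.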
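First of all, thanks to the developers and contributors of the KT framework. Yesterday I tried compiling KT on Windows and loading the Q4_K_M-quantized DeepSeek-R1, so here is my report card.
This thread has been moved to Discussions 833.
Hardware environment:
Software environment:
Model: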
python -m ktransformers.local_chat --model_path E:/LLM-Models/DeepSeek-AI/DeepSeek-R1-671b --gguf_path E:/LLM-Models/DeepSeek-AI/DeepSeek-R1-671b/DeepSeek-R1-671b-Q4_K_M/ --optimize_config_path E:/ktransformers/ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-multi-gpu.yaml --max_new_tokens 8192 --cpu_infer 190
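The post varies --cpu_infer (190, 80, 180) and runs both with and without --optimize_config_path. As a minimal sketch, the helper below builds those command lines from the flags in the command above. The function name build_local_chat_cmd is hypothetical and not part of KT; only flags that appear in the post are used.

```python
from typing import List, Optional

MODEL_PATH = "E:/LLM-Models/DeepSeek-AI/DeepSeek-R1-671b"
GGUF_PATH = "E:/LLM-Models/DeepSeek-AI/DeepSeek-R1-671b/DeepSeek-R1-671b-Q4_K_M/"
MULTI_GPU_RULES = (
    "E:/ktransformers/ktransformers/optimize/optimize_rules/"
    "DeepSeek-V3-Chat-multi-gpu.yaml"
)


def build_local_chat_cmd(
    cpu_infer: int,
    optimize_config_path: Optional[str] = None,
    max_new_tokens: int = 8192,
) -> List[str]:
    """Build the argv list for one benchmark run (hypothetical helper)."""
    cmd = [
        "python", "-m", "ktransformers.local_chat",
        "--model_path", MODEL_PATH,
        "--gguf_path", GGUF_PATH,
    ]
    # Omitting the flag corresponds to the single-GPU runs in the post.
    if optimize_config_path is not None:
        cmd += ["--optimize_config_path", optimize_config_path]
    cmd += ["--max_new_tokens", str(max_new_tokens), "--cpu_infer", str(cpu_infer)]
    return cmd


if __name__ == "__main__":
    runs = [
        build_local_chat_cmd(190, MULTI_GPU_RULES),  # dual-GPU rules, as in the command above
        build_local_chat_cmd(80, MULTI_GPU_RULES),   # fewer CPU inference threads
        build_local_chat_cmd(180),                   # single GPU, default rules
    ]
    for argv in runs:
        print(" ".join(argv))
```

Each list can be passed to subprocess.run as is. Keeping the variants in one place makes it easier to confirm that only --cpu_infer and --optimize_config_path change between runs.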
AIDA64 V7.35.7000 memory and cache benchmark results:
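Ollama baseline performance (num_gpu set to 4, offloading 4 layers to the GPU):
KT performance:
With --cpu_infer changed to 80, same prompt:
Single-GPU run (--optimize_config_path not set at invocation), same prompt. During this run, system memory usage was 394 GB, GPU0 VRAM usage was 10.8 GB, CPU load was 73%, and GPU0 load was 100%:
Performance after switching to NPS1:
With --cpu_infer set to 190, single-GPU run using the default --optimize_config_path, same prompt:
With --cpu_infer set to 180, single-GPU run using the default --optimize_config_path, same prompt. NUMA node 0 was observed at about 50% usage and NUMA node 1 at about 10%:
Discussion: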