Running on multiple GPUs #36
Comments
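Hello,
Please check your PyTorch version first. For now we suggest not using versions above 1.10; the recommended version is 1.7.
… On November 16, 2021, at 2:18 PM, songfuture ***@***.***> wrote:
First of all, thank you for xmuspeech's subtools toolkit~
When I run the command subtools/runPytorchLauncher.sh run-resnet34-fbank-81-benchmark.py --gpu-id=0,1 --stage=3 --endstage=3, that is, python3 -m torch.distributed.launch --nproc_per_node=2 run-resnet34-fbank-81-benchmark.py --gpu-id=0,1 --port 2345 --stage=3 --endstage=3, the following warning and error appear. Could a problem with my environment, or something else, be causing multi-GPU initialization to fail?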
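The version advice above can be turned into a small pre-flight check before launching training. This is a minimal sketch in Python, assuming 1.7 is the known-good release (the only version this thread confirms as working). The helper names `parse_major_minor` and `is_known_good` are hypothetical and are not part of subtools.

```python
import re

# Assumption: only the 1.7.x series is treated as known-good, because it is
# the one release this thread reports as working with the launcher script.
KNOWN_GOOD = (1, 7)


def parse_major_minor(version):
    """Return (major, minor) from a version string such as '1.7.1+cu110'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_known_good(version, known_good=KNOWN_GOOD):
    """True when the major.minor part equals the known-good release."""
    return parse_major_minor(version) == known_good


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        status = "matches" if is_known_good(torch.__version__) else "differs from"
        print(f"torch {torch.__version__} {status} the 1.7 release recommended in this thread")
```

Running the file directly prints whether the installed PyTorch matches the recommended release; the two helper functions can also be imported into a launch wrapper.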
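Thanks for the reply. After downgrading PyTorch from 1.9 to 1.7 as you suggested, everything runs fine. One more question: for multi-node, multi-GPU training, should I run a command like the following on each machine: "python3 -m torch.distributed.launch --nnodes=2 --nproc_per_node=2 --master_addr ***** --node_rank=0 run-resnet152-fbank-81-attention.py --stage=3 --endstage=3"? Or does subtools have its own mechanism for switching from single-node multi-GPU to multi-node multi-GPU? subtools/pytorch/libs/support/utils.py says it can easily be extended to multiple nodes, but I could not find any parameter for switching to multi-node in subtools/runPytorchLauncher.sh, the script that runs single-node multi-GPU training. Is it configured in some other script?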
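For reference, the command in the question follows the general multi-node convention of torch.distributed.launch, where each machine is started with its own --node_rank and the launcher assigns every process a global rank. The sketch below illustrates only that convention; it does not show that the subtools training scripts work across nodes. The helper names are hypothetical, `10.0.0.1` is a placeholder for the elided master address, and 29500 is the launcher's default master port.

```python
def global_rank(node_rank, nproc_per_node, local_rank):
    """Global rank the launcher assigns to a process on a given node."""
    return node_rank * nproc_per_node + local_rank


def launch_command(script, script_args, nnodes, node_rank, nproc_per_node,
                   master_addr, master_port=29500):
    """Build the per-node command line as a list of arguments."""
    return [
        "python3", "-m", "torch.distributed.launch",
        f"--nnodes={nnodes}",
        f"--node_rank={node_rank}",          # differs on every machine
        f"--nproc_per_node={nproc_per_node}",
        f"--master_addr={master_addr}",      # same on every machine
        f"--master_port={master_port}",
        script,
        *script_args,
    ]


if __name__ == "__main__":
    nnodes, nproc_per_node = 2, 2
    for node_rank in range(nnodes):
        cmd = launch_command(
            "run-resnet152-fbank-81-attention.py",
            ["--stage=3", "--endstage=3"],
            nnodes, node_rank, nproc_per_node,
            master_addr="10.0.0.1",  # placeholder, not from the thread
        )
        ranks = [global_rank(node_rank, nproc_per_node, r) for r in range(nproc_per_node)]
        print(" ".join(cmd), "-> global ranks", ranks)
```

The point of the sketch is that the same command is not run unchanged on every machine: --node_rank must be 0 on the first node and 1 on the second, while the master address and port stay identical.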
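Hello, may I ask whether subtools/pytorch/libs/support/utils.py supports only single-node multi-GPU training? For multi-node multi-GPU, do I need to modify the initialization and so on myself?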
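Hello,
torch supports multi-node multi-GPU training, but this toolkit has not yet been developed for it. The multi-node case additionally requires distributing the data across nodes and modifying the communication group; you can find related examples on the official torch website.
Best wishes!
… On November 18, 2021, at 4:38 PM, songfuture ***@***.***> wrote:
Hello, may I ask whether subtools/pytorch/libs/support/utils.py supports only single-node multi-GPU training? For multi-node multi-GPU, do I need to modify the initialization and so on myself?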
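The reply names two extra pieces of work for multiple nodes: distributing the data and adjusting the communication group. As a rough illustration of the data side only, the sketch below splits sample indices across all processes using the RANK and WORLD_SIZE environment variables that torch.distributed.launch sets. It imitates the pad-then-stride scheme used by PyTorch's DistributedSampler without shuffling, and it assumes every node can read the whole dataset. The function names are hypothetical and nothing here is taken from subtools.

```python
import math
import os


def rank_from_env(env=None):
    """Read (rank, world_size) as set by the launcher; default to one process."""
    env = os.environ if env is None else env
    return int(env.get("RANK", "0")), int(env.get("WORLD_SIZE", "1"))


def shard_indices(num_samples, rank, world_size):
    """Return the sample indices that one process should read.

    The index list is padded by wrapping around so that every process
    receives the same number of samples, then split with a stride.
    """
    if not 0 <= rank < world_size:
        raise ValueError("rank must satisfy 0 <= rank < world_size")
    indices = list(range(num_samples))
    per_rank = math.ceil(num_samples / world_size)
    total = per_rank * world_size
    padding = total - len(indices)
    if padding > 0 and indices:
        repeats = math.ceil(padding / len(indices))
        indices += (indices * repeats)[:padding]
    return indices[rank:total:world_size]


if __name__ == "__main__":
    rank, world_size = rank_from_env()
    print(f"rank {rank}/{world_size} reads", shard_indices(10, rank, world_size))
```

With 10 samples and 4 processes, each process receives 3 indices and two samples are repeated to keep the shards equal in size. The communication-group side is not shown; in PyTorch it is normally set up with torch.distributed.init_process_group.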
Thanks for the answer~