Running on multiple GPUs #36

Open
songfuture opened this issue Nov 16, 2021 · 5 comments

songfuture commented Nov 16, 2021

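First of all, thanks to xmuspeech for the subtools toolkit~
I have a question. When I run the command subtools/runPytorchLauncher.sh run-resnet34-fbank-81-benchmark.py --gpu-id=0,1 --stage=3 --endstage=3, which is equivalent to python3 -m torch.distributed.launch --nproc_per_node=2 run-resnet34-fbank-81-benchmark.py --gpu-id=0,1 --port 2345 --stage=3 --endstage=3, I get the warning and error shown below. Is the multi-GPU initialization failing because of my environment, or because of something else?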
[two screenshots: the warning and error output]
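
The mapping between the two commands quoted in the question can be sketched as a small helper that expands a `--gpu-id` list into the `torch.distributed.launch` invocation. This only illustrates the equivalence stated above; it is not the actual content of `subtools/runPytorchLauncher.sh`, and the function name `build_launch_command` and its default port are assumptions taken from the quoted command.

```python
import shlex


def build_launch_command(script, gpu_id, port=2345, extra_args=()):
    """Return the torch.distributed.launch command for a comma-separated GPU list.

    Hypothetical helper: it mirrors the command quoted in the question,
    not the real logic of subtools/runPytorchLauncher.sh.
    """
    gpus = [g.strip() for g in gpu_id.split(",") if g.strip()]
    if not gpus:
        raise ValueError("gpu_id must name at least one GPU, e.g. '0,1'")
    cmd = [
        "python3", "-m", "torch.distributed.launch",
        f"--nproc_per_node={len(gpus)}",  # one worker process per listed GPU
        script,
        f"--gpu-id={','.join(gpus)}",
        "--port", str(port),
    ]
    cmd.extend(extra_args)
    return cmd


if __name__ == "__main__":
    cmd = build_launch_command(
        "run-resnet34-fbank-81-benchmark.py", "0,1",
        extra_args=["--stage=3", "--endstage=3"],
    )
    print(shlex.join(cmd))
```

The point of the sketch is that the number of worker processes follows directly from the number of GPU ids, so a mismatch between the two is one thing worth ruling out when initialization fails.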

Snowdar (Owner) commented Nov 16, 2021 via email

Hello, please check your PyTorch version first. For now I suggest not using 1.10 or later; the recommended version is 1.7.
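
A quick way to check the installed version against this recommendation is sketched below. The helper names `major_minor` and `is_recommended` are assumptions made for illustration, and the (1, 7) target comes only from the reply above.

```python
import re


def major_minor(version):
    """Extract (major, minor) from a version string such as '1.9.0+cu111'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_recommended(version, recommended=(1, 7)):
    """True when the version matches the recommended major.minor pair."""
    return major_minor(version) == recommended


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        status = "matches" if is_recommended(torch.__version__) else "differs from"
        print(f"PyTorch {torch.__version__} {status} the recommended 1.7")
```

Comparing integer tuples rather than strings matters here: as strings, "1.10" sorts before "1.7", whereas (1, 10) correctly compares greater than (1, 7).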

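songfuture (Author) commented:

Thanks for the reply. Following your suggestion, I downgraded PyTorch from 1.9 to 1.7 and it now runs without problems. One more question: for multi-node multi-GPU training, should I run a command like the following on each machine: "python3 -m torch.distributed.launch --nnodes=2 --nproc_per_node=2 --master_addr ***** --node_rank=0 run-resnet152-fbank-81-attention.py --stage=3 --endstage=3"? Or does subtools have its own mechanism for switching from single-node multi-GPU to multi-node multi-GPU? subtools/pytorch/libs/support/utils.py indicates that it can easily be extended to multiple nodes, but I could not find a parameter for switching to multi-node in subtools/runPytorchLauncher.sh, the script for single-node multi-GPU runs. Is it configured in some other script?
[screenshot]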
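
For reference, the per-machine commands implied by the question can be sketched as follows. With `torch.distributed.launch`, every node is started with the same `--nnodes`, `--master_addr` and `--master_port`, and only `--node_rank` differs. Whether the subtools training scripts run unmodified under such a launch is not confirmed anywhere in this thread. The helper name `build_multinode_commands` and the address `192.0.2.10` are placeholders, and 29500 is the launcher's default master port.

```python
def build_multinode_commands(script, master_addr, nnodes=2, nproc_per_node=2,
                             master_port=29500, script_args=()):
    """Return one torch.distributed.launch command (argument list) per node.

    Hypothetical helper for illustration; not part of subtools.
    """
    if nnodes < 1 or nproc_per_node < 1:
        raise ValueError("nnodes and nproc_per_node must be positive")
    commands = []
    for node_rank in range(nnodes):
        commands.append([
            "python3", "-m", "torch.distributed.launch",
            f"--nnodes={nnodes}",
            f"--node_rank={node_rank}",  # the only flag that differs per machine
            f"--nproc_per_node={nproc_per_node}",
            f"--master_addr={master_addr}",
            f"--master_port={master_port}",
            script, *script_args,
        ])
    return commands


if __name__ == "__main__":
    # 192.0.2.10 is a placeholder address, not a value from the thread.
    for cmd in build_multinode_commands(
        "run-resnet152-fbank-81-attention.py", "192.0.2.10",
        script_args=["--stage=3", "--endstage=3"],
    ):
        print(" ".join(cmd))
```

Each printed line would be run on a different machine; the node with `--node_rank=0` is normally the one whose address is given as `--master_addr`.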

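songfuture (Author) commented:

Hello, may I ask: does subtools/pytorch/libs/support/utils.py only support single-node multi-GPU training? For multi-node multi-GPU training, do I need to modify the initialization and related code myself?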
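
As general background to this question (the thread does not record what subtools itself requires): `torch.distributed.launch` gives each worker a global rank derived from its node rank and local rank, and a world size equal to the total number of workers. Initialization code that relies on those launcher-provided values, rather than assuming the rank equals the local GPU index, generally carries over from one machine to several. The arithmetic is sketched below with hypothetical function names.

```python
def global_rank(node_rank, local_rank, nproc_per_node):
    """Global rank of a worker, as torch.distributed.launch computes it."""
    if not 0 <= local_rank < nproc_per_node:
        raise ValueError("local_rank must be in [0, nproc_per_node)")
    return node_rank * nproc_per_node + local_rank


def world_size(nnodes, nproc_per_node):
    """Total number of worker processes across all nodes."""
    return nnodes * nproc_per_node


if __name__ == "__main__":
    nnodes, nproc_per_node = 2, 2
    print(f"world size: {world_size(nnodes, nproc_per_node)}")
    for node in range(nnodes):
        for local in range(nproc_per_node):
            rank = global_rank(node, local, nproc_per_node)
            # local rank selects the GPU on a machine; global rank identifies the worker
            print(f"node {node}, local rank {local} -> global rank {rank}")
```

On a single machine the node rank is 0, so global and local rank coincide; that coincidence is what single-node code can accidentally depend on.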

Snowdar (Owner) commented Nov 18, 2021 via email

songfuture (Author) commented:

Thanks for the answer~
