
Multi-GPU training #101

Open
sususama opened this issue Dec 1, 2023 · 2 comments

Comments


sususama commented Dec 1, 2023

How can I train on multiple GPUs? I see that distributed.py only configures a single card; when I set it to [0,1,2,3], it immediately fails with:

RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.
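This error means a torch.distributed collective was invoked before the default process group existed. A minimal sketch of initializing it, using the CPU "gloo" backend with a single process purely for illustration (the repo's actual launcher and backend may differ):

```python
import os
import torch.distributed as dist

# Initialize the default process group before any distributed call.
# MASTER_ADDR / MASTER_PORT tell the rendezvous where to meet; with a
# real multi-GPU launch these are usually set by torchrun.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend="gloo", rank=0, world_size=1)

initialized = dist.is_initialized()
print(initialized)

dist.destroy_process_group()
```

With a real multi-process setup, each process would be started with its own rank and a shared world_size, typically via torchrun.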


hfj-cc commented Dec 11, 2023

As I understand it, distributed.py only configures the number of worker threads; multi-GPU training is actually already enabled by dist_model = nn.DataParallel(model).cuda().
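To illustrate the point above: nn.DataParallel replicates a module on every visible GPU and splits each input batch across them along dim 0, with no init_process_group needed. A minimal sketch using a stand-in linear layer (not the repo's model), with a CPU fallback so it runs anywhere:

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 2)  # stand-in for the repo's model (assumption)

if torch.cuda.device_count() > 1:
    # Replicates the module on each visible GPU; each forward pass
    # scatters the batch across cards and gathers the outputs.
    dist_model = nn.DataParallel(model).cuda()
    inputs = torch.randn(4, 8).cuda()
else:
    dist_model = model  # CPU / single-GPU fallback for illustration
    inputs = torch.randn(4, 8)

out = dist_model(inputs)
print(out.shape)
```

Note that DataParallel is single-process (one Python process drives all GPUs), which is why the error from init_process_group never arises with it.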

@hfj-cc
Copy link

hfj-cc commented Dec 11, 2023

The code uses only a single process — does that affect training time? Also, what batch_size should be set when training on multiple cards? Training on 4× 1080 Ti with batch_size=96 takes over 50 hours.
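For what it's worth, under nn.DataParallel the batch_size is the global batch: it is split evenly across visible GPUs each step, so the per-card batch is just the quotient. A quick sketch of the arithmetic for the setup described above:

```python
# DataParallel splits each batch along dim 0 across the visible GPUs,
# so batch_size=96 on 4 cards means each GPU processes 96 // 4 samples
# per forward pass.
global_batch = 96
num_gpus = 4
per_gpu = global_batch // num_gpus
print(per_gpu)
```

By contrast, with DistributedDataParallel (one process per GPU), batch_size is per-process, so the effective global batch is batch_size × world_size.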
