Ask for Multi-GPU Training #2

Open
gyuwonchoi opened this issue Apr 24, 2023 · 1 comment

gyuwonchoi commented Apr 24, 2023

Hi,
Thank you for sharing the code of your work.

While reviewing the './tools/train.py' script, I noticed that multi-GPU mode is not supported.

I was wondering whether there is an alternative way for me to train the model using MMDistributedDataParallel.
My NVIDIA TITAN V (12GB) GPUs cannot fit the Transformer-based model on a single GPU.

    if args.gpus is not None:
        cfg.gpu_ids = range(4)
        warnings.warn('`--gpus` is deprecated because we only support '
                      'single GPU mode in non-distributed training. '
                      'Use `gpus=1` now.')
    if args.gpu_ids is not None:
        cfg.gpu_ids = args.gpu_ids[0:3]
        warnings.warn('`--gpu-ids` is deprecated, please use `--gpu-id`. '
                      'Because we only support single GPU mode in '
                      'non-distributed training. Use the first GPU '
                      'in `gpu_ids` now.')

Thank you in advance for your response.

KiwiXR (Collaborator) commented Apr 25, 2023

Hi gyuwonchoi,
Thanks for your interest in our work!

We haven't tried training on multiple GPUs, but I assume the short answer is yes.

We base our method on MMSegmentation, and its documentation explains how to train on multiple GPUs.

Since we use run_experiments.py in place of tools/train.py, the usage will differ somewhat; for example, the entry point in tools/dist_train.sh should be changed accordingly (see the sketch below).
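
For reference, MMSegmentation's own tools/train.py enables distributed mode through a --launcher flag and an init_dist call, and train_segmentor then wraps the model in MMDistributedDataParallel; whatever script tools/dist_train.sh ends up launching would need the same pieces. A minimal sketch of that pattern (my rough assumption, not this repo's actual code):

    # Sketch only: mirrors MMSegmentation's tools/train.py, not run_experiments.py.
    import argparse

    from mmcv.runner import init_dist


    def parse_args():
        parser = argparse.ArgumentParser(description='distributed training entry')
        parser.add_argument('--launcher',
                            choices=['none', 'pytorch', 'slurm', 'mpi'],
                            default='none', help='job launcher')
        # torch.distributed.launch passes --local_rank to every spawned process
        parser.add_argument('--local_rank', type=int, default=0)
        return parser.parse_args()


    def main():
        args = parse_args()
        distributed = args.launcher != 'none'
        if distributed:
            # Initialize the default process group from the environment variables
            # (RANK, WORLD_SIZE, MASTER_ADDR, ...) set by the launcher.
            init_dist(args.launcher, backend='nccl')
        # train_segmentor(model, datasets, cfg, distributed=distributed, ...)
        # wraps the model in MMDistributedDataParallel when distributed=True.


    if __name__ == '__main__':
        main()

In MMSegmentation, tools/dist_train.sh simply invokes tools/train.py through torch.distributed.launch with --launcher pytorch, so the adapted script would swap that target for the appropriate entry here.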

Also, samples_per_gpu might need to be modified (e.g., from 2 to 1 for two GPUs) to keep the overall training batch size (2 source + 2 target = 4 in total); a config sketch follows. An explanation can be found here.
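
For concreteness, a rough sketch of that config change, assuming the data settings follow MMSegmentation's usual data dict (the exact config files in this repo may be organized differently):

    # Sketch only: halve samples_per_gpu when doubling the GPU count so that
    # the effective batch stays at 2 source + 2 target = 4 images in total.
    data = dict(
        samples_per_gpu=1,   # was 2 for single-GPU training
        workers_per_gpu=1,
        # train=dict(...), val=dict(...), test=dict(...) stay unchanged
    )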

Apologies for not having time to delve into this right now. Any feedback is welcome if you are willing to try it out!

Best.
