Training on multi GPUs #7

Open
SnowNation101 opened this issue Feb 10, 2024 · 1 comment

Comments

@SnowNation101

Dear authors, Happy Chinese New Year!
Could you tell me whether the model supports multi-GPU training? When I train with the original code on a machine with two A100s (using python run.py --args ...), it only trains on one card by default instead of using both.
If I modify the code to simply wrap the model with torch.nn.DataParallel() for training (a minimal sketch of what I mean is below), will that cause any problems? Or is there a more appropriate way? I have also tried launching run.py with accelerate launch; it does use both cards for training automatically, but it hits multi-threading-related errors at the k-means step.
Hope you can answer my questions, thank you very much!
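A minimal sketch of the DataParallel wrapping in question, assuming a CUDA machine; the nn.Linear model and dummy batch are placeholders standing in for whatever run.py actually builds:

```python
import torch
import torch.nn as nn

# Placeholder model; in practice this would be the model constructed in run.py.
model = nn.Linear(768, 768)

if torch.cuda.device_count() > 1:
    # DataParallel replicates the module on every visible GPU and splits each
    # input batch along dim 0; gradients are gathered back on the default GPU.
    model = nn.DataParallel(model)
model = model.cuda()

inputs = torch.randn(8, 768).cuda()  # dummy batch, split across the GPUs
outputs = model(inputs)
```

Note that DataParallel is single-process and multi-threaded, so it is generally slower than a DistributedDataParallel / accelerate launch setup, but it needs no changes to how the script is launched.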

@sunnweiwei
Owner

Hi,
sorry for the late reply.

I used accelerate launch for multi-GPU training. Yes, the K-means portion does not support multi-GPU, so I switch to a single GPU for the K-means and testing steps.
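A rough illustration of that workflow (the --num_processes value and the run.py arguments are placeholders, not the repo's exact flags):

```bash
# Multi-GPU training with Hugging Face Accelerate; set --num_processes to the
# number of GPUs (or run "accelerate config" once to store defaults).
accelerate launch --num_processes 2 run.py --args ...

# K-means and testing do not support multi-GPU here, so pin those steps to a
# single device and run them without the accelerate launcher.
CUDA_VISIBLE_DEVICES=0 python run.py --args ...
```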
