Training on multi GPUs #7

Open
SnowNation101 opened this issue Feb 10, 2024 · 1 comment

Comments

@SnowNation101

Dear authors, Happy Chinese New Year!
Could you tell me whether the model supports multi-GPU training? When I train with the original code on a machine with two A100s (using python run.py --args ...), it only trains on one card by default instead of using both.
If I modify the code to simply wrap the model with torch.nn.DataParallel() for training (a minimal sketch of what I mean is below), will that cause any problems? Or is there a more appropriate way? I have also tried launching run.py with accelerate launch; it does use both cards for training automatically, but it hits multi-threading-related errors at the k-means step.
Hope you can answer my questions, thank you very much!
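A minimal sketch of the DataParallel wrapping in question, assuming a CUDA machine; the nn.Linear model and dummy batch are placeholders standing in for whatever run.py actually builds:

```python
import torch
import torch.nn as nn

# Placeholder model; in practice this would be the model constructed in run.py.
model = nn.Linear(768, 768)

if torch.cuda.device_count() > 1:
    # DataParallel replicates the module on every visible GPU and splits each
    # input batch along dim 0; gradients are gathered back on the default GPU.
    model = nn.DataParallel(model)
model = model.cuda()

inputs = torch.randn(8, 768).cuda()  # dummy batch, split across the GPUs
outputs = model(inputs)
```

Note that DataParallel is single-process and multi-threaded, so it is generally slower than a DistributedDataParallel / accelerate launch setup, but it needs no changes to how the script is launched.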

@sunnweiwei
Owner

Hi,
sorry for the late reply.

I used accelerate launch for multi-GPU training. Yes, the K-means portion does not support multi-GPU, so I switch to a single GPU for the K-means and testing steps.
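A rough illustration of that workflow (the --num_processes value and the run.py arguments are placeholders, not the repo's exact flags):

```bash
# Multi-GPU training with Hugging Face Accelerate; set --num_processes to the
# number of GPUs (or run "accelerate config" once to store defaults).
accelerate launch --num_processes 2 run.py --args ...

# K-means and testing do not support multi-GPU here, so pin those steps to a
# single device and run them without the accelerate launcher.
CUDA_VISIBLE_DEVICES=0 python run.py --args ...
```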
