Issues with accelerate and deepspeed training #331
Comments
Hi @swang99,
Hi @EikeKohl. Yes, here is my config.yaml
And the ds_config.json:
@swang99 was your issue resolved with the latest repo state? If not, could you please share your nvidia-smi output? You could also try ZeRO optimization stage 3 (https://www.deepspeed.ai/tutorials/zero/#zero-overview).
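As a minimal sketch of what a stage-3 ds_config.json could look like (the batch sizes and offload settings below are illustrative placeholders, not values taken from this repo's configs):

    {
      "train_batch_size": 8,
      "train_micro_batch_size_per_gpu": 1,
      "gradient_accumulation_steps": 4,
      "fp16": {
        "enabled": true
      },
      "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {
          "device": "cpu"
        },
        "offload_param": {
          "device": "cpu"
        },
        "overlap_comm": true,
        "contiguous_gradients": true
      }
    }

Note that train_batch_size must equal the micro batch size per GPU times the gradient accumulation steps times the number of GPUs, so it needs to be adjusted to the actual GPU count.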
I've been having issues trying to distribute training onto multiple GPUs. Even after following pull request #316, nvidia-smi still shows all the load on one GPU, whether I use deepspeed or accelerate.
Here are the commands I tried to train the actor models:
deepspeed artifacts/main.py artifacts/config/config.yaml --type RL
accelerate launch artifacts/main.py artifacts/config/config.yaml --type RL
I kept everything else at the default configuration.
Any ideas? Thank you.
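As a hedged sketch (the GPU count of 2 is a placeholder; the script path and arguments are copied from the commands above), multi-GPU runs are usually requested explicitly on the launcher command line:

    # DeepSpeed launcher: spawn one process per GPU
    deepspeed --num_gpus=2 artifacts/main.py artifacts/config/config.yaml --type RL

    # Accelerate launcher: either run `accelerate config` once to set up multi-GPU,
    # or pass the flags directly
    accelerate launch --multi_gpu --num_processes 2 artifacts/main.py artifacts/config/config.yaml --type RL

If only one GPU shows load in nvidia-smi even with these flags, the launcher may be spawning a single process; the process count printed at startup should confirm this.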