DeepSpeedEngineWrapper.backward() does a bit too much #2951
Comments
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hitting this snag right now actually, will see what we decide to do.
We're working with the DS team to try and remove the
Somehow ... does not give equivalent results to ... with deepspeed. What gives?
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
same question here.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
same question here. It's indeed very hard to understand why it was designed this way. I believe `accelerator.backward(loss)` should only perform the backward operation, and the other steps should be written outside this function in a more standard and understandable manner.
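For comparison, this is roughly the decoupled loop being described, as it already reads in Accelerate without DeepSpeed. It is only a minimal sketch: the toy model, optimizer, data, and the `max_norm` value are placeholder assumptions, not anything from this thread.

```python
import torch
from accelerate import Accelerator

# Placeholder model/optimizer/data, purely for illustration.
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
dataloader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(torch.randn(32, 10), torch.randn(32, 1)),
    batch_size=8,
)

accelerator = Accelerator()
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for inputs, targets in dataloader:
    loss = torch.nn.functional.mse_loss(model(inputs), targets)
    accelerator.backward(loss)  # backward only
    # Everything else stays explicit and visible in the user's loop.
    accelerator.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    optimizer.zero_grad()
```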
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Not stale.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
I hope it helps, and if it did, glad to be of any help to you. Happy coding!
So for a bit more context, we've been waiting on this PR to happen, so hopefully we can give more flexibility soon: deepspeedai/DeepSpeed#7018
The source code of DeepSpeedEngineWrapper:
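Roughly, it looks like this. This is a paraphrased sketch rather than a verbatim copy of accelerate's source (the wrapper lives in `src/accelerate/utils/deepspeed.py`); the inline comments summarize what DeepSpeed's `engine.step()` does.

```python
class DeepSpeedEngineWrapper:
    """Thin wrapper around a deepspeed.DeepSpeedEngine, used internally by Accelerate."""

    def __init__(self, engine):
        self.engine = engine

    def backward(self, loss, **kwargs):
        # Backpropagation goes through the DeepSpeed engine, which handles
        # loss scaling / mixed precision internally.
        self.engine.backward(loss, **kwargs)

        # engine.step() is then called immediately. Among other things it
        # performs gradient-accumulation bookkeeping, gradient clipping,
        # the optimizer step, zeroing of gradients, overflow checks, and
        # the lr_scheduler step.
        self.engine.step()
```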
My question is: why do we need to call `self.engine.step()` here immediately? This behavior zeros the gradients and updates the parameters without notifying the user, which can be unexpected. Since the backward pass is internally bound to zeroing the gradients and updating the parameters, users are blocked from inspecting the gradients or parameters manually before stepping. I know DeepSpeed-wrapped models can't be treated as normal models, but this behavior still eliminates a lot of flexibility.
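To make the lost flexibility concrete, here is a plain-PyTorch sketch (the tiny model, optimizer, and the grad-norm print are invented for illustration): between `loss.backward()` and `optimizer.step()` there is a window where gradients can be inspected or modified, whereas with the wrapper above that window is gone, because `backward()` has already stepped and zeroed the gradients by the time it returns.

```python
import torch

# Illustrative model/optimizer; under DeepSpeed these would be owned by the engine.
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

loss = model(torch.randn(4, 10)).mean()
loss.backward()

# The window the issue wants to keep: after backward, before step.
total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
print(f"grad norm before stepping: {total_norm.item():.4f}")

optimizer.step()
optimizer.zero_grad()
```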