add zero optimizer parallel #593
Conversation
print(f"rank_id: {rank_id}, group_size: {group_size}")
ms.reset_auto_parallel_context()
ms.set_auto_parallel_context(
    parallel_mode=ms.ParallelMode.DATA_PARALLEL,
How does this differ from ParallelMode.SEMI_AUTO_PARALLEL?
Implemented a new version of ZeRO optimizer parallelism in DATA_PARALLEL mode, following DeepSpeed's approach. It does not use MindSpore's automatic parallel process.
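For context, a minimal sketch of the ZeRO stage-1 idea under plain data parallelism (all names below are illustrative, not the PR's actual implementation): every rank keeps the full weights, but each rank owns only a 1/N shard of the optimizer state, updates its shard, and the updated parameter slices are then gathered back into the full parameter list.

```python
# Illustrative single-process simulation of ZeRO stage-1 optimizer-state
# partitioning (plain Python stand-in; the PR uses MindSpore communication ops).

def shard_bounds(n_params: int, world_size: int, rank: int):
    """Contiguous [start, stop) slice of the parameter list owned by `rank`."""
    per_rank = (n_params + world_size - 1) // world_size
    start = rank * per_rank
    return start, min(start + per_rank, n_params)

def zero1_step(params, grads, momentum, world_size, lr=0.1, beta=0.9):
    """Each rank updates only its own shard of momentum/params; the loop over
    ranks here emulates the all-gather that rebuilds the full parameter list."""
    new_params = list(params)
    for rank in range(world_size):
        start, stop = shard_bounds(len(params), world_size, rank)
        for i in range(start, stop):
            # only this rank stores momentum[i], so optimizer memory is 1/world_size
            momentum[i] = beta * momentum[i] + grads[i]
            new_params[i] = params[i] - lr * momentum[i]
    return new_params

params = [1.0, 2.0, 3.0, 4.0]
grads = [0.5, 0.5, 0.5, 0.5]
momentum = [0.0] * 4
params = zero1_step(params, grads, momentum, world_size=2)
```

The key memory saving is that `momentum` (and, for Adam, the variance) lives only on the owning rank; gradients are still reduced across all ranks as in ordinary data parallelism.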
Suggestion: add a script for merging the sharded checkpoints.
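A hedged sketch of what such a merging script could do: with ZeRO sharding, each rank saves only its slice of the sharded tensors, so a merge step has to load every per-rank checkpoint and stitch the shards back together in rank order. Everything below (the key layout, the flat-list representation) is hypothetical, not the PR's actual checkpoint format.

```python
# Hypothetical checkpoint-merge sketch: each rank's checkpoint is modeled as a
# dict {param_name: list_of_values} holding that rank's contiguous shard.

def merge_sharded_ckpts(shards):
    """shards: list indexed by rank. Returns one dict with the per-rank
    slices of each parameter concatenated in rank order."""
    merged = {}
    for rank_ckpt in shards:  # rank 0, 1, ... in order
        for name, values in rank_ckpt.items():
            merged.setdefault(name, []).extend(values)
    return merged

rank0 = {"fc.weight": [1, 2], "fc.bias": [5]}
rank1 = {"fc.weight": [3, 4], "fc.bias": [6]}
full = merge_sharded_ckpts([rank0, rank1])
```

In a real script the per-rank dicts would come from loading `ckpt_rank_*.ckpt` files and the concatenation would happen along the sharded axis of each tensor.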
mindone/trainers/train_step.py
@@ -104,6 +115,10 @@ def construct(self, *inputs):

    # 1. compute gradients (of the up-scaled loss w.r.t. the model weights)
    grads = self.grad(self.network, weights)(*inputs, scaling_sens_filled)

    # Gradient communication
    grads = self.zero_helper.cal_gradients(grads)
self.zero_helper can be None, so an if condition needs to be added here.
ok
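The agreed fix, sketched with stand-in classes (the real code lives in mindone/trainers/train_step.py; `ZeroHelper` here is a hypothetical stub): guard the gradient-communication call so the train step still works when no ZeRO helper was configured.

```python
class ZeroHelper:                       # stand-in for the PR's zero helper
    def cal_gradients(self, grads):
        # placeholder for the real reduce-scatter / gradient communication
        return [g * 2 for g in grads]

class TrainStep:
    def __init__(self, zero_helper=None):
        self.zero_helper = zero_helper  # may legitimately be None

    def construct(self, grads):
        # Gradient communication -- only when a zero helper is configured
        if self.zero_helper is not None:
            grads = self.zero_helper.cal_gradients(grads)
        return grads

no_zero = TrainStep().construct([1.0, 2.0])              # helper absent
with_zero = TrainStep(ZeroHelper()).construct([1.0, 2.0])
```

Without the guard, constructing the train step without optimizer parallelism would raise an AttributeError on the None helper.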
What does this PR do?
Referring to DeepSpeed's ZeRO, this PR implements the optimizer parallel algorithm under MindSpore's data parallel mode.
Fixes # (issue)
Adds # (feature)
Before submitting
What's New
Here are the documentation guidelines.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@xxx