[AutoParallel] Add auto parallel moe layer #9886
base: develop
Conversation
Thanks for your contribution!
Codecov Report

❌ Patch coverage is 14.18%, which is below the target coverage of 80.00%. You can increase the patch coverage or adjust the target coverage.

@@            Coverage Diff             @@
##           develop    #9886      +/-   ##
===========================================
- Coverage    51.18%   50.91%    -0.28%
===========================================
  Files          745      748        +3
  Lines      119016   119811      +795
===========================================
+ Hits        60924    60998       +74
- Misses      58092    58813      +721

☔ View full report in Codecov by Sentry.
me = paddle.stack(me_list).mean(0)
ce = paddle.stack(ce_list).mean(0)
aux_loss = paddle.sum(me * ce) * float(self.num_experts)
return aux_loss
Done
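For context, the stacked me/ce terms above implement a GShard-style load-balancing loss. A minimal per-layer sketch, assuming gates holds the softmax router probabilities and mask1 the one-hot top-1 assignment; the standalone cal_aux_loss function below is hypothetical, not the PR's exact code:

import paddle

def cal_aux_loss(gates, mask1, num_experts):
    # gates: [num_tokens, num_experts] softmax router probabilities.
    # mask1: [num_tokens, num_experts] one-hot top-1 expert assignment.
    me = gates.mean(axis=0)                    # mean router prob per expert
    ce = mask1.astype("float32").mean(axis=0)  # fraction of tokens per expert
    # The product is minimized when both distributions are uniform over experts.
    return paddle.sum(me * ce) * float(num_experts)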
# Make sure the capacity value does not exceed the number of tokens.
capacity = int(min(new_capacity, paddle.tensor(mask1.size(0))))

l_aux = self._cal_aux_loss(gates, mask1)
Done
Wasn't this modified here?
It was added in the file; it may just not be shown here.
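As an aside on the snippet under discussion: paddle.tensor is not a public Paddle API, and Tensor.size is a property in Paddle rather than a method. A hedged sketch of the same clamp using standard Paddle calls; the capped_capacity helper is illustrative only:

def capped_capacity(new_capacity, mask1):
    # mask1: [num_tokens, num_experts]; one row per token.
    num_tokens = mask1.shape[0]
    # Make sure the capacity value does not exceed the number of tokens.
    return int(min(int(new_capacity), num_tokens))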
Force-pushed from 8bfa877 to cbef5c5.
Force-pushed from cbef5c5 to 143c203.
return mesh


def einsum(rule, a, b):
Does Paddle not support these ops at the moment?
The einsum op has some issues when converted directly from dynamic to static graph mode.
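For illustration, a hand-rolled dispatcher of this shape avoids the einsum op entirely. The rules below are the ones top-1 MoE gating typically needs; the implementations are a sketch under that assumption, not the PR's exact code:

import paddle

def einsum(rule, a, b):
    if rule == "s,se->se":
        # Scale each token's expert row by a per-token scalar.
        return a.unsqueeze(1) * b
    elif rule == "se,sc->sec":
        # Per-token outer product over the expert and capacity dims.
        return a.unsqueeze(2) * b.unsqueeze(1)
    elif rule == "se,se->s":
        # Per-token dot product over experts.
        return paddle.sum(a * b, axis=1)
    elif rule == "sec,sm->ecm":
        # Contract over the token dim: dispatch tokens into expert buffers.
        s, e, c = a.shape
        m = b.shape[1]
        return paddle.matmul(a.reshape([s, e * c]).t(), b).reshape([e, c, m])
    else:
        raise NotImplementedError(rule)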
@@ -117,13 +117,13 @@ def scaled_dot_product_attention(
     )

     if isinstance(outputs, tuple):
-        outputs[0] = outputs[0].reshape([bsz, q_len, v_num_heads, head_dim])
+        outputs[0] = outputs[0].reshape([bsz, kv_seq_len, v_num_heads, head_dim])
This should be changed to q_len.
Done. The other inconsistent spots in this function were fixed as well.
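For reference, the attention output's sequence dimension follows the query length, not the key/value length, which is why q_len is the correct reshape target here. A minimal standalone check, with all shapes and names illustrative rather than taken from the PR:

import paddle
import paddle.nn.functional as F

bsz, q_len, kv_seq_len, num_heads, head_dim = 2, 5, 7, 4, 8
q = paddle.randn([bsz, num_heads, q_len, head_dim])
k = paddle.randn([bsz, num_heads, kv_seq_len, head_dim])
v = paddle.randn([bsz, num_heads, kv_seq_len, head_dim])

# weights: [bsz, num_heads, q_len, kv_seq_len]; the kv dim is summed out below.
weights = F.softmax(paddle.matmul(q, k, transpose_y=True) / head_dim**0.5, axis=-1)
out = paddle.matmul(weights, v)    # [bsz, num_heads, q_len, head_dim]
out = out.transpose([0, 2, 1, 3])  # [bsz, q_len, num_heads, head_dim]
assert out.shape == [bsz, q_len, num_heads, head_dim]  # q_len, not kv_seq_len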
PR types
New features
PR changes
Models
Description
Add an auto-parallel MoE layer.