MoE #244
DeepSpeed implements MoE using GShard-style sharding.
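For reference, a hedged sketch of wrapping a feed-forward block with DeepSpeed's MoE layer; the import path `deepspeed.moe.layer.MoE`, the argument names (`hidden_size`, `expert`, `num_experts`, `k`), and the tuple returned by the forward pass are assumptions based on my reading of the DeepSpeed API and may differ across versions:

```python
# Hypothetical sketch: replace a Transformer block's dense FFN with a DeepSpeed MoE layer.
# API details (argument names, return values) are assumed and may vary by DeepSpeed version.
import torch
import torch.nn as nn
from deepspeed.moe.layer import MoE

d_model = 1024

# A plain feed-forward block used as the expert module.
expert_ffn = nn.Sequential(
    nn.Linear(d_model, 4 * d_model),
    nn.GELU(),
    nn.Linear(4 * d_model, d_model),
)

# Each token is routed to k of the num_experts expert copies; the expert
# parameters are sharded across ranks in the GShard style.
moe_layer = MoE(
    hidden_size=d_model,
    expert=expert_ffn,
    num_experts=8,   # number of expert copies
    k=1,             # top-1 gating
)

x = torch.randn(2, 16, d_model)
# The layer is assumed to return the output plus an auxiliary load-balancing loss.
y, aux_loss, _ = moe_layer(x)
```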
MoE (Mixture-of-Experts) increases model capacity without increasing the amount of computation. The technique used is conditional computation: a trainable gating network is added that decides on a sparse combination of experts. Intuitively, a large model is split layer-wise into a collection of smaller models, and for each input the matching small models are selected dynamically to do the computation.
Experts are selected with a sparsely-gated mechanism: the MoE contains a gating network that decides which experts to activate, as shown in the sketch below.
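A minimal sketch of a sparsely-gated top-k MoE layer in PyTorch (the names `SparselyGatedMoE`, `d_model`, `num_experts`, and `k` are illustrative, not from the original). The trainable gate scores every expert per token, only the top-k experts actually run, and their outputs are combined with the normalized gate weights, so per-token compute stays roughly constant while capacity grows with the number of experts:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparselyGatedMoE(nn.Module):
    """Minimal top-k sparsely-gated MoE layer (loops over experts for clarity)."""

    def __init__(self, d_model: int, num_experts: int = 4, k: int = 2):
        super().__init__()
        self.k = k
        # Trainable gating network: one score per expert for each token.
        self.gate = nn.Linear(d_model, num_experts)
        # The "small models": independent feed-forward experts.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.ReLU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model)
        logits = self.gate(x)                               # (tokens, num_experts)
        topk_val, topk_idx = logits.topk(self.k, dim=-1)    # keep only k experts per token
        weights = F.softmax(topk_val, dim=-1)               # normalize over the selected experts

        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            # Tokens whose top-k choices include expert e.
            token_idx, slot = (topk_idx == e).nonzero(as_tuple=True)
            if token_idx.numel() == 0:
                continue
            out[token_idx] += weights[token_idx, slot].unsqueeze(-1) * expert(x[token_idx])
        return out

# Usage: only k experts run per token, so adding experts increases capacity
# without increasing per-token computation.
moe = SparselyGatedMoE(d_model=64, num_experts=4, k=2)
tokens = torch.randn(10, 64)
print(moe(tokens).shape)  # torch.Size([10, 64])
```

Production implementations (e.g. GShard-style systems) additionally add noise to the gate logits, enforce per-expert capacity limits, and use an auxiliary loss to balance the load across experts; those details are omitted here.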