tokens routing #8
Hey! Not a paper author here, but I'm currently working on reproducing the results of the OpenMoE paper, specifically on token routing. I would also be grateful for a review from paper author @XueFuzhao on whether what I'm doing makes sense.
Hello, I have received your email.
Thank you for your interest!!
My analysis code is a bit dirty, but in general the core code is in this file: https://github.com/XueFuzhao/OpenMoE/blob/main/analysis/colossalai_replace/layer.py I went through your code very quickly (sorry, I'm totally overwhelmed these days); my two concerns:
Thanks again for your interest! Looking forward to your results on other MoE models like Mistral and Deepseek-MoE. That would be very interesting.
Thanks for your code! I have encountered some tricky things recently, so I have spent less energy on advancing this research. I will study your code carefully, and thank you for your efforts! Thank you all! @Misterion777 @XueFuzhao
I changed the hook - now it takes expert capacity into consideration.
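For readers following along, here is a minimal sketch of what a capacity-aware logging hook could look like. It is not the actual hook from this thread; the `router` module name, the assumption that it returns logits of shape `(num_tokens, num_experts)`, and the top-1 routing are all simplifications to illustrate the idea of only counting tokens that fit within expert capacity (adapt to the real layer in `analysis/colossalai_replace/layer.py`):

```python
# Hypothetical sketch: a forward hook that records which expert each token is
# actually routed to, dropping tokens that exceed an expert's capacity.
from collections import defaultdict

import torch

# token_id -> list of expert ids the token was actually routed to
routing_log = defaultdict(list)


def make_capacity_hook(token_ids, capacity):
    """Build a hook that logs top-1 assignments for one batch of token ids."""

    def hook(module, inputs, output):
        logits = output  # assumed shape: (num_tokens, num_experts)
        num_experts = logits.shape[-1]
        top1 = logits.argmax(dim=-1)  # greedy top-1 expert per token

        # Enforce capacity: only the first `capacity` tokens assigned to an
        # expert (in sequence order) are kept; the rest count as dropped.
        load = torch.zeros(num_experts, dtype=torch.long)
        for pos, expert in enumerate(top1.tolist()):
            if load[expert] < capacity:
                load[expert] += 1
                routing_log[int(token_ids[pos])].append(expert)

    return hook


# Usage (hypothetical module path; register per batch, then remove):
# handle = model.moe_layer.router.register_forward_hook(
#     make_capacity_hook(batch_token_ids, capacity=64)
# )
# model(batch)
# handle.remove()
```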
Thanks for your work! It is very valuable! I would like to know how you reached your conclusion about token routing. Since the input is affected by attention and RoPE, it does not seem logical that there should be a fixed routing for each token. How should I reproduce your result for this part?
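One way to check this empirically (a rough sketch, not the paper's exact analysis) is to collect the experts each token id is routed to across many different contexts and measure how concentrated that distribution is: a value near 1.0 means the token goes to the same expert almost regardless of its attention/RoPE-modified hidden state. The `routing_log` structure here is the hypothetical one filled by the hook sketched above:

```python
# Hypothetical sketch: per-token routing consistency across contexts.
from collections import Counter


def routing_consistency(routing_log, min_occurrences=10):
    """Fraction of a token's routings that hit its most common expert."""
    consistency = {}
    for token_id, experts in routing_log.items():
        if len(experts) < min_occurrences:
            continue  # skip rare tokens; their statistics are too noisy
        most_common_count = Counter(experts).most_common(1)[0][1]
        consistency[token_id] = most_common_count / len(experts)
    return consistency


# consistency = routing_consistency(routing_log)
# print(sum(consistency.values()) / len(consistency))  # average over tokens
```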