tokens routing #8
Hey! Not a paper author here, but I'm currently working on reproducing the results of the OpenMoE paper, specifically on token routing. I would also be grateful for a review from paper author @XueFuzhao on whether what I'm doing makes sense.
Hello, I have received your email.
Thank you for your interest!!
My analysis code is a bit dirty, but in general the core code is in this file: https://github.com/XueFuzhao/OpenMoE/blob/main/analysis/colossalai_replace/layer.py I went through your code very quickly (sorry, I'm totally overwhelmed these days); my two concerns:
Thanks again for your interest! Looking forward to your results on other MoE models like Mistral and Deepseek-MoE. That would be very interesting.
Thanks for your code! I have encountered some tricky things recently, so I have spent less energy on advancing this research. I will study your code carefully, and thank you for your efforts! Thank you all! @Misterion777 @XueFuzhao
I changed the hook - now it takes expert capacity into consideration.
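For readers following along, here is a minimal sketch of what a capacity-aware logging hook could look like. It is not the actual hook from this thread; the `router` module name, the assumption that it returns logits of shape `(num_tokens, num_experts)`, and the top-1 routing are all simplifications to illustrate the idea of only counting tokens that fit within expert capacity (adapt to the real layer in `analysis/colossalai_replace/layer.py`):

```python
# Hypothetical sketch: a forward hook that records which expert each token is
# actually routed to, dropping tokens that exceed an expert's capacity.
from collections import defaultdict

import torch

# token_id -> list of expert ids the token was actually routed to
routing_log = defaultdict(list)


def make_capacity_hook(token_ids, capacity):
    """Build a hook that logs top-1 assignments for one batch of token ids."""

    def hook(module, inputs, output):
        logits = output  # assumed shape: (num_tokens, num_experts)
        num_experts = logits.shape[-1]
        top1 = logits.argmax(dim=-1)  # greedy top-1 expert per token

        # Enforce capacity: only the first `capacity` tokens assigned to an
        # expert (in sequence order) are kept; the rest count as dropped.
        load = torch.zeros(num_experts, dtype=torch.long)
        for pos, expert in enumerate(top1.tolist()):
            if load[expert] < capacity:
                load[expert] += 1
                routing_log[int(token_ids[pos])].append(expert)

    return hook


# Usage (hypothetical module path; register per batch, then remove):
# handle = model.moe_layer.router.register_forward_hook(
#     make_capacity_hook(batch_token_ids, capacity=64)
# )
# model(batch)
# handle.remove()
```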
Thanks for your work! It is very valuable! I would like to know how you reached your conclusion about token routing. Since the input is affected by attention and RoPE, it does not seem logical that there should be a fixed routing for each token. How should I reproduce your result for this part?
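One way to check this empirically (a rough sketch, not the paper's exact analysis) is to collect the experts each token id is routed to across many different contexts and measure how concentrated that distribution is: a value near 1.0 means the token goes to the same expert almost regardless of its attention/RoPE-modified hidden state. The `routing_log` structure here is the hypothetical one filled by the hook sketched above:

```python
# Hypothetical sketch: per-token routing consistency across contexts.
from collections import Counter


def routing_consistency(routing_log, min_occurrences=10):
    """Fraction of a token's routings that hit its most common expert."""
    consistency = {}
    for token_id, experts in routing_log.items():
        if len(experts) < min_occurrences:
            continue  # skip rare tokens; their statistics are too noisy
        most_common_count = Counter(experts).most_common(1)[0][1]
        consistency[token_id] = most_common_count / len(experts)
    return consistency


# consistency = routing_consistency(routing_log)
# print(sum(consistency.values()) / len(consistency))  # average over tokens
```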