[QUESTION] Encoder with more TP than the decoder #1200

Open
MlWoo opened this issue Oct 6, 2024 · 0 comments

Comments


MlWoo commented Oct 6, 2024

A new model with a heavy encoder and a light decoder can be viewed as a T5-style model, so the encoder's TP size is larger than the decoder's. That TP partition is currently rejected by the check at https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/core/parallel_state.py#L519. If that check is relaxed and the line at https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/core/parallel_state.py#L616 is changed to `for x, y in zip(e_ranks, cycle(d_ranks))`, is that sufficient for the model to work, and what else should I consider?
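For illustration, a minimal sketch of the pairing I have in mind, assuming `e_ranks` and `d_ranks` are the encoder and decoder tensor-parallel rank lists for one pipeline group (the names follow the linked code; `pair_encoder_decoder_ranks` is just a hypothetical helper, not the actual Megatron-LM implementation):

```python
from itertools import cycle

def pair_encoder_decoder_ranks(e_ranks, d_ranks):
    """Pair each encoder rank with a decoder rank.

    The current code assumes len(e_ranks) <= len(d_ranks) and iterates
    zip(d_ranks, cycle(e_ranks)); when the encoder has the larger TP size,
    the longer list has to drive the zip, so the cycle is swapped as
    proposed above.
    """
    if len(e_ranks) >= len(d_ranks):
        return list(zip(e_ranks, cycle(d_ranks)))
    return list(zip(d_ranks, cycle(e_ranks)))

# Example: encoder TP = 4, decoder TP = 2
print(pair_encoder_decoder_ranks([0, 1, 2, 3], [4, 5]))
# -> [(0, 4), (1, 5), (2, 4), (3, 5)]
```

With encoder TP = 4 and decoder TP = 2 this yields [(0, 4), (1, 5), (2, 4), (3, 5)], i.e. each decoder rank would be shared by two encoder ranks.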
