[QUESTION] Encoder with more TP than the decoder #1200

Open
MlWoo opened this issue Oct 6, 2024 · 0 comments

Comments


MlWoo commented Oct 6, 2024

A new model with a heavy encoder and a light decoder can be viewed as a T5-style model, so the encoder's TP size is larger than the decoder's. That TP partition is currently rejected by the check at https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/core/parallel_state.py#L519. If that check is relaxed and the line at https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/core/parallel_state.py#L616 is changed to `for x, y in zip(e_ranks, cycle(d_ranks))`, is that sufficient for the model to work, and what else should I consider?
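For illustration, a minimal sketch of the pairing I have in mind, assuming `e_ranks` and `d_ranks` are the encoder and decoder tensor-parallel rank lists for one pipeline group (the names follow the linked code; `pair_encoder_decoder_ranks` is just a hypothetical helper, not the actual Megatron-LM implementation):

```python
from itertools import cycle

def pair_encoder_decoder_ranks(e_ranks, d_ranks):
    """Pair each encoder rank with a decoder rank.

    The current code assumes len(e_ranks) <= len(d_ranks) and iterates
    zip(d_ranks, cycle(e_ranks)); when the encoder has the larger TP size,
    the longer list has to drive the zip, so the cycle is swapped as
    proposed above.
    """
    if len(e_ranks) >= len(d_ranks):
        return list(zip(e_ranks, cycle(d_ranks)))
    return list(zip(d_ranks, cycle(e_ranks)))

# Example: encoder TP = 4, decoder TP = 2
print(pair_encoder_decoder_ranks([0, 1, 2, 3], [4, 5]))
# -> [(0, 4), (1, 5), (2, 4), (3, 5)]
```

With encoder TP = 4 and decoder TP = 2 this yields [(0, 4), (1, 5), (2, 4), (3, 5)], i.e. each decoder rank would be shared by two encoder ranks.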
