I understand that DeepSpeed supports TP (tensor parallelism) for inference, but it does not support TP for training. Is that correct?
Answered by tjruwase, Oct 10, 2023
You are correct, DeepSpeed relies on client-side tensor parallelism, such as Megatron, for training. The following doc provides some details on combining DeepSpeed and Megatron for this: https://huggingface.co/blog/bloom-megatron-deepspeed.
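For context, here is a minimal sketch of how client-side tensor parallelism is typically handed to DeepSpeed for training, assuming a Megatron-style `mpu` module. The Megatron import path, the `initialize_model_parallel` signature, and the `build_megatron_model()` helper are placeholders that vary by Megatron version; `deepspeed.initialize()` does accept an `mpu` argument for this purpose.

```python
# Hedged sketch (not an official DeepSpeed example): training with client-side
# tensor parallelism. DeepSpeed consumes an "mpu" object exposing the process
# groups; the TP layers themselves come from the client framework (Megatron).
import deepspeed
from megatron import mpu  # placeholder import; path/signature varies by Megatron version

deepspeed.init_distributed()

# Create tensor-parallel / data-parallel process groups on the client side.
mpu.initialize_model_parallel(2)  # placeholder call: 2-way tensor parallelism

model = build_megatron_model()  # hypothetical helper returning a TP-sharded model

# Passing mpu tells DeepSpeed to run its data parallelism (gradient allreduce,
# ZeRO partitioning) only across the data-parallel group, not across TP ranks.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    mpu=mpu,
    config="ds_config.json",  # assumed DeepSpeed config file
)
```

The key point is that DeepSpeed only needs the `mpu` object's group accessors (e.g. `get_model_parallel_group()`, `get_data_parallel_group()`); the tensor-parallel sharding itself is implemented and owned by the client framework, as described in the linked Megatron-DeepSpeed blog post.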
Answer selected by yiliu30