Replies: 2 comments
-
DeepSpeed-Inference includes two features: model parallelism and kernel fusion. Kernel fusion poses little problem for deployment because it is a simple module replacement. However, since model parallelism uses multiple processes, it must be deployed differently. Regarding model-parallel deployment, there are two implementation methods.
If this function is not implemented in DeepSpeed, you can implement it yourself by referring to the implementations above.
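The core difficulty described above is that a web server runs in a single process, while a model-parallel engine needs every rank to execute the forward pass together. A common workaround is to have rank 0 own the HTTP endpoint and forward each incoming request to the other ranks before they all compute. Below is a minimal, hypothetical sketch of that broadcast/gather pattern using plain `multiprocessing` pipes as a stand-in for `torch.distributed` or DeepSpeed's own communication layer; the function and variable names are illustrative, not DeepSpeed APIs.

```python
# Hypothetical sketch of the "rank 0 serves, all ranks compute" pattern.
# Plain multiprocessing stands in for torch.distributed / DeepSpeed here.
import multiprocessing as mp

def worker(rank, conn):
    """One model-parallel rank: waits for inputs, computes its shard."""
    while True:
        prompt = conn.recv()          # blocks until rank 0 broadcasts
        if prompt is None:            # shutdown signal
            break
        # stand-in for this rank's slice of the forward pass
        conn.send(f"rank{rank}:{prompt.upper()}")

def serve(prompt, conns):
    """Called from the web handler on rank 0: broadcast, then gather."""
    for conn in conns:                # broadcast the request to all ranks
        conn.send(prompt)
    return [conn.recv() for conn in conns]  # gather shard results

if __name__ == "__main__":
    world_size = 2
    conns, procs = [], []
    for rank in range(world_size):
        parent, child = mp.Pipe()
        p = mp.Process(target=worker, args=(rank, child))
        p.start()
        conns.append(parent)
        procs.append(p)

    # This call is what a Flask/gunicorn view on rank 0 would make.
    print(serve("hello", conns))

    for conn in conns:
        conn.send(None)               # shut the workers down
    for p in procs:
        p.join()
```

In a real deployment, `serve` would sit inside the web framework's request handler, and the broadcast would typically use `torch.distributed.broadcast` so that non-zero ranks can sit in a receive loop waiting for work.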
-
Hi @vdantu, @hyunwoongko, we are developing a new feature to access the DeepSpeed Inference engine via RESTful APIs and just submitted a PR. Any feedback is welcome.
-
Is there a recommended way to integrate deepspeed-inference with gunicorn/flask based framework?