Is your feature request related to a problem? If so, please describe.
Context:
I am deploying multiple Triton Inference Servers on Kubernetes. Each one serves an API (for example, Document OCR or Document Quality Check) and contains multiple models.
Problem: each pod on Kubernetes must be allocated at least one GPU, which in my case is an NVIDIA A30 MIG slice with 6 GB of VRAM. However, a single API / Triton Inference Server may only use 1–3 GB of VRAM, so a large share of each slice sits idle.
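To make the waste concrete, here is a minimal sketch of the GPU request each Triton pod carries today. The resource name assumes the NVIDIA device plugin exposes A30 MIG slices under its mixed strategy as nvidia.com/mig-1g.6gb; adjust it to your cluster's configuration:

```yaml
# Sketch only: the MIG resource name is an assumption about the
# cluster's device-plugin setup. The pod claims a whole 6 GB slice
# even when its models need only 1-3 GB of it.
resources:
  limits:
    nvidia.com/mig-1g.6gb: 1
```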
Therefore, I am considering KServe ModelMesh to solve this. My expectation was that I could map each of my Triton Inference Servers to one InferenceService, in KServe terms. That is, an InferenceService would contain multiple models, and ModelMesh's responsibility would be to schedule each API onto an available ServingRuntime.
Problem
As far as I know, the Triton ServingRuntime, which consists of NVIDIA's Triton server plus an adapter, expects each InferenceService to hold exactly one model.
That prevents some of my Triton-side logic, such as ensemble models and BLS (Business Logic Scripting), from running, because those features require several tightly coupled models to live in the same model repository.
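For reference, this is roughly what a ModelMesh-mode InferenceService looks like today; the name, storage path, and model format below are illustrative. The point is that spec.predictor.model is singular, so each InferenceService can reference only one stored model:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: doc-ocr-detector          # illustrative name
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
spec:
  predictor:
    model:                        # a single model per InferenceService
      modelFormat:
        name: onnx                # routed to the Triton ServingRuntime
      storageUri: s3://models/doc-ocr/detector   # illustrative path
```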
Describe your proposed solution
First of all, excuse me if this issue is filed against the wrong project; I think it may belong in the adapter project, but I would also like to know whether there is an alternative to ModelMesh that solves my problem. I am new to KServe.
My proposed solution is to make the InferenceService accept multiple models (a sketch of what such a spec might look like follows the list below). The benefits of this approach are:
- Easy migration from Triton to KServe.
- These models are usually tightly coupled, so scheduling them on the same server should reduce overhead. In addition, implementing the logic should be much simpler.
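As a purely hypothetical illustration of the request, the spec could look something like this; the plural models list does not exist in the current KServe API, and all names and paths are made up:

```yaml
# Hypothetical spec, not valid against today's KServe CRD:
# one InferenceService carries a whole Triton model repository,
# so ensemble/BLS pipelines stay co-located on one runtime.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: doc-ocr                   # one InferenceService per API
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
spec:
  predictor:
    models:                       # hypothetical field (plural)
      - name: detector
        storageUri: s3://models/doc-ocr/detector
      - name: recognizer
        storageUri: s3://models/doc-ocr/recognizer
      - name: ensemble            # Triton ensemble composing the two above
        storageUri: s3://models/doc-ocr/ensemble
```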