Is your feature request related to a problem? If so, please describe.
Context:
I am deploying multiple Triton Inference Servers on Kubernetes. Each one serves an API (for example, Document OCR or Document Quality Check) and contains multiple models.
Problem: each pod on Kubernetes must be allocated at least one GPU, which in my case is an NVIDIA A30 MIG slice with 6 GB of VRAM. However, a single API / Triton Inference Server may only use 1–3 GB of VRAM, so a large share of each slice sits idle.
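To make the waste concrete, here is a minimal sketch of the GPU request each Triton pod carries today. The resource name assumes the NVIDIA device plugin exposes A30 MIG slices under its mixed strategy as nvidia.com/mig-1g.6gb; adjust it to your cluster's configuration:

```yaml
# Sketch only: the MIG resource name is an assumption about the
# cluster's device-plugin setup. The pod claims a whole 6 GB slice
# even when its models need only 1-3 GB of it.
resources:
  limits:
    nvidia.com/mig-1g.6gb: 1
```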
Therefore, I am considering KServe ModelMesh to solve this. My expectation was that I could map each of my Triton Inference Servers to one InferenceService, in KServe terms. That is, an InferenceService would contain multiple models, and ModelMesh's responsibility would be to schedule each API onto an available ServingRuntime.
Problem
As far as I know, the Triton ServingRuntime, which consists of NVIDIA's Triton server plus an adapter, expects each InferenceService to hold exactly one model.
That prevents some of my Triton-side logic, such as ensemble models and BLS (Business Logic Scripting), from running, because those features require several tightly coupled models to live in the same model repository.
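For reference, this is roughly what a ModelMesh-mode InferenceService looks like today; the name, storage path, and model format below are illustrative. The point is that spec.predictor.model is singular, so each InferenceService can reference only one stored model:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: doc-ocr-detector          # illustrative name
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
spec:
  predictor:
    model:                        # a single model per InferenceService
      modelFormat:
        name: onnx                # routed to the Triton ServingRuntime
      storageUri: s3://models/doc-ocr/detector   # illustrative path
```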
Describe your proposed solution
First of all, excuse me if this issue is filed against the wrong project; I think it may belong in the adapter project, but I would also like to know whether there is an alternative to ModelMesh that solves my problem. I am new to KServe.
My proposed solution is to make the InferenceService accept multiple models (a sketch of what such a spec might look like follows the list below). The benefits of this approach are:
- Easy migration from Triton to KServe.
- These models are usually tightly coupled, so scheduling them on the same server should reduce overhead. In addition, implementing the logic should be much simpler.
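As a purely hypothetical illustration of the request, the spec could look something like this; the plural models list does not exist in the current KServe API, and all names and paths are made up:

```yaml
# Hypothetical spec, not valid against today's KServe CRD:
# one InferenceService carries a whole Triton model repository,
# so ensemble/BLS pipelines stay co-located on one runtime.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: doc-ocr                   # one InferenceService per API
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
spec:
  predictor:
    models:                       # hypothetical field (plural)
      - name: detector
        storageUri: s3://models/doc-ocr/detector
      - name: recognizer
        storageUri: s3://models/doc-ocr/recognizer
      - name: ensemble            # Triton ensemble composing the two above
        storageUri: s3://models/doc-ocr/ensemble
```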