Does Triton Inference Server do model loading optimization? #5984
Unanswered
sfc-gh-zhwang
asked this question in Q&A
Replies: 0
When loading an ONNX/PyTorch/FasterTransformer model, does Triton first read the model from disk into CPU memory and then copy it to the GPU, or does it load the model directly into GPU memory?
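
One way to investigate this empirically is to watch GPU memory while Triton loads a model on demand. Below is a minimal sketch, assuming a Triton server running locally with `--model-control-mode=explicit` (a real tritonserver flag that lets clients trigger loads via the API) and a model named `my_onnx_model` in its repository; the model name is hypothetical, and the measurement only shows the end state of GPU memory, not the intermediate disk-to-CPU staging path.

```python
# Sketch: measure GPU memory before/after an explicit model load to see how much
# of the model ends up resident on the GPU. Assumes tritonserver was started with
# --model-control-mode=explicit and serves HTTP on localhost:8000.
import time

import pynvml                           # pip install nvidia-ml-py
import tritonclient.http as httpclient  # pip install tritonclient[http]

MODEL_NAME = "my_onnx_model"  # hypothetical model name; replace with yours


def gpu_mem_used_mib(gpu_index: int = 0) -> float:
    """Return currently used memory on one GPU, in MiB, via NVML."""
    handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_index)
    return pynvml.nvmlDeviceGetMemoryInfo(handle).used / (1024 ** 2)


def main() -> None:
    pynvml.nvmlInit()
    client = httpclient.InferenceServerClient(url="localhost:8000")

    before = gpu_mem_used_mib()
    client.load_model(MODEL_NAME)  # ask Triton to load the model now

    # Poll until Triton reports the model as ready to serve.
    while not client.is_model_ready(MODEL_NAME):
        time.sleep(0.5)

    after = gpu_mem_used_mib()
    print(f"GPU memory used: {before:.0f} MiB -> {after:.0f} MiB "
          f"(delta {after - before:.0f} MiB)")

    pynvml.nvmlShutdown()


if __name__ == "__main__":
    main()
```

To see whether the model is staged through host memory first, one could additionally sample the tritonserver process's resident set size (e.g., with `psutil`) during the load and look for a transient spike in CPU memory before the GPU delta appears.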