Can I serve my model with Triton Inference Server? #761

Closed
leo-XUKANG opened this issue Dec 3, 2021 · 6 comments

Labels: No Activity, question (Further information is requested)

Comments

@leo-XUKANG

❓ Question

What you have already tried

Environment

Build information about Torch-TensorRT can be found by turning on debug messages

  • PyTorch Version (e.g., 1.0):
  • CPU Architecture:
  • OS (e.g., Linux):
  • How you installed PyTorch (conda, pip, libtorch, source):
  • Build command you used (if compiling from source):
  • Are you using local sources or building from archives:
  • Python version:
  • CUDA version:
  • GPU models and configuration:
  • Any other relevant information:

Additional context

leo-XUKANG added the question (Further information is requested) label on Dec 3, 2021
@leo-XUKANG
Author

After compiling with Torch-TensorRT.

@narendasan
Collaborator

We are currently working on integration with Triton's LibTorch backend so that this workflow is supported out of the box.

cc: @borisfom
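
In the meantime, a minimal sketch of the LibTorch-backend route: compile with Torch-TensorRT, keep the result as TorchScript, and place it in a Triton model repository as model.pt. The module, input shape, and model name below are illustrative placeholders, and this assumes the Triton container's PyTorch backend can load Torch-TensorRT-compiled modules (i.e. the Torch-TensorRT runtime is present).

import torch
import torch_tensorrt

# Placeholder model; substitute your own module.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, 3, padding=1),
    torch.nn.ReLU(),
).eval().cuda()

# Compilation returns a TorchScript module, so Triton's PyTorch (LibTorch)
# backend can load it the same way as any other torch.jit model.
trt_model = torch_tensorrt.compile(
    model,
    inputs=[torch_tensorrt.Input((1, 3, 224, 224), dtype=torch.float32)],
    enabled_precisions={torch.float32},
)

# Triton expects <model_repository>/<model_name>/1/model.pt
torch.jit.save(trt_model, "model_repository/my_model/1/model.pt")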

@ShivamShrirao commented Dec 13, 2021

Can we serve it with the TensorRT backend in Triton? I tried converting to a TRT engine using the following code.

import torch_tensorrt

# traced_script_module, inputs, and OUT_PATH are defined earlier in my script.
trt_ts_engine = torch_tensorrt.ts.convert_method_to_trt_engine(traced_script_module,
                                                               method_name='forward',
                                                               inputs=inputs,
                                                               truncate_long_and_double=True)
# Write the serialized engine so Triton's tensorrt backend can load it as model.plan.
with open(f"{OUT_PATH}/model.plan", 'wb') as f:
    f.write(trt_ts_engine)

But I get this "Version tag does not match" error in Triton.

E1213 11:30:16.510379 1 logging.cc:43] 1: [stdArchiveReader.cpp::StdArchiveReader::34] Error Code 1: Serialization (Serialization assertion safeVersionRead == safeSerializationVersion failed.Version tag does not match. Note: Current Version: 43, Serialized Engine Version: 0)
E1213 11:30:16.510406 1 logging.cc:43] 4: [runtime.cpp::deserializeCudaEngine::75] Error Code 4: Internal Error (Engine deserialization failed.)
E1213 11:30:16.646657 1 model_repository_manager.cc:1186] failed to load 'rbg_720_outer' version 1: Internal: unable to create TensorRT engine

Using:
TensorRT version: 8.0.3.4
Triton Inference Server container image, release 21.11
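
One way to narrow down the "Version tag does not match" error is to check whether the engine deserializes with the TensorRT runtime on the serving side, and compare versions on both sides. A minimal sketch, assuming the TensorRT Python bindings are installed and the plan was written to model.plan:

import tensorrt as trt

# Compare this against the TensorRT version shipped in the Triton container
# (listed in the release notes for the 21.11 image).
print("Local TensorRT version:", trt.__version__)

# Try to deserialize the serialized engine with the local runtime.
logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)
with open("model.plan", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

# None here (or a similar serialization error) usually means the plan was
# built with a different TensorRT version than the one deserializing it.
print("Deserialized OK:", engine is not None)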

@Metareflektor commented Dec 17, 2021

Replying to @ShivamShrirao above:

It's working for me for fp32/fp16 with nvcr.io/nvidia/pytorch:21.11-py3 and nvcr.io/nvidia/tritonserver:21.11-py3 images.
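
For completeness, once the plan loads in a matching Triton container, a request can be sent with the Triton Python client. A minimal sketch; the model name, tensor names, shape, and dtype below are hypothetical and must match your config.pbtxt:

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Hypothetical input tensor: adjust name, shape, and dtype to your model config.
data = np.random.rand(1, 3, 224, 224).astype(np.float32)
inp = httpclient.InferInput("input__0", list(data.shape), "FP32")
inp.set_data_from_numpy(data)

result = client.infer(model_name="my_model", inputs=[inp])
print(result.as_numpy("output__0").shape)  # hypothetical output tensor name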

@github-actions

This issue has not seen activity for 90 days. Remove the stale label or comment, or this will be closed in 10 days.

@lminer commented Sep 12, 2024

Any updates? Is this working yet on Triton server?
