
Support for TensorRT-LLM #632

Open

SupreethRao99 opened this issue Feb 11, 2024 · 7 comments

Comments

@SupreethRao99

Outlines currently supports the vLLM inference engine; it would be great if it could also support the TensorRT-LLM inference engine.

@teis-e

teis-e commented Feb 13, 2024

Yes, waiting for this as well!

@lapp0
Contributor

lapp0 commented Feb 13, 2024

TensorRT-LLM supports logits processors, so it should be possible to integrate.

How are you hoping to use it? Are you seeking outlines.models.tensorrt, a serve endpoint, or both?
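For illustration, a minimal sketch of the kind of logits processor this would involve, assuming a HuggingFace-style `(input_ids, scores)` calling convention; `allowed_token_ids` is a hypothetical stand-in for whatever set of next tokens Outlines' FSM reports as valid at the current state:

```python
import torch

class MaskingLogitsProcessor:
    """Hypothetical sketch: mask logits so only FSM-allowed tokens survive.

    The (input_ids, scores) signature is an assumption borrowed from the
    HuggingFace convention, not a confirmed TensorRT-LLM interface.
    """

    def __init__(self, allowed_token_ids):
        # Token ids the FSM permits next (stand-in for Outlines' FSM output).
        self.allowed_token_ids = allowed_token_ids

    def __call__(self, input_ids: torch.Tensor, scores: torch.Tensor) -> torch.Tensor:
        # Start from -inf everywhere, then zero out the allowed positions,
        # so adding the mask leaves allowed logits unchanged and bans the rest.
        mask = torch.full_like(scores, float("-inf"))
        mask[..., self.allowed_token_ids] = 0.0
        return scores + mask
```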

@SupreethRao99
Author

I'm looking at something similar to outlines.models.tensorrt right now, as my use case is mostly offline batched inference. Could you give me a starting point for how I can build this out? I'm eager to contribute and add such a feature.

@lapp0
Contributor

lapp0 commented Feb 14, 2024

@SupreethRao99 glad to hear you're interested in contributing.

I think a good starting point is looking into how TensorRT-LLM performs generation and handles LogitsProcessors:

https://github.com/NVIDIA/TensorRT-LLM/blob/0ab9d17a59c284d2de36889832fe9fc7c8697604/tensorrt_llm/runtime/generation.py#L368-L386

https://github.com/NVIDIA/TensorRT-LLM/blob/0ab9d17a59c284d2de36889832fe9fc7c8697604/tensorrt_llm/runtime/model_runner_cpp.py#L266-L267

Then I'd review how llamacpp is being implemented; it shares similarities with how TensorRT-LLM would work (#556). Specifically, llamacpp.py: https://github.com/dtiarks/outlines/blob/726ec242fb1695c5a67d489689be13ac84ef472c/outlines/models/llamacpp.py
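As a rough shape for the adapter, modeled on the llamacpp.py structure, something like the sketch below. This is not a confirmed API: the `TensorRTLLM` class name, the tokenizer handling, and the exact `ModelRunner.generate` kwargs (in particular whether a logits processor can be passed through directly) are all assumptions that depend on the TensorRT-LLM version:

```python
# Hypothetical outlines/models/tensorrt.py sketch, modeled on llamacpp.py.
from typing import List, Optional

import torch
from tensorrt_llm.runtime import ModelRunner


class TensorRTLLM:
    def __init__(self, engine_dir: str, tokenizer):
        # Load a pre-built TensorRT-LLM engine from disk.
        self.runner = ModelRunner.from_dir(engine_dir)
        self.tokenizer = tokenizer

    def generate(
        self,
        prompts: List[str],
        max_new_tokens: int = 128,
        logits_processor: Optional[object] = None,
    ) -> List[str]:
        batch_input_ids = [
            torch.tensor(self.tokenizer.encode(p), dtype=torch.int32)
            for p in prompts
        ]
        # Whether generate() accepts a logits processor directly is
        # version-dependent; see the generation.py lines linked above.
        output_ids = self.runner.generate(
            batch_input_ids,
            max_new_tokens=max_new_tokens,
            logits_processor=logits_processor,
        )
        # output_ids is (batch, beams, seq); decode the first beam of each.
        return [self.tokenizer.decode(ids[0].tolist()) for ids in output_ids]
```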

Please let me know if you have any questions!

@SupreethRao99
Author

Thank you for the resources; I'll definitely get back to you with questions after going through these links.

Thanks!

@rlouf
Member

rlouf commented Feb 14, 2024

Related to #655

@user-0a

user-0a commented Sep 16, 2024

This can likely be implemented with the Executor API:
https://github.com/NVIDIA/TensorRT-LLM/blob/31ac30e928a2db795799fdcab6be446bfa3a3998/examples/cpp/executor/README.md#L4
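A hedged sketch of driving the Executor Python bindings is below; the constructor arguments and kwarg names (e.g. `max_new_tokens`, default-constructed `ExecutorConfig`) vary across TensorRT-LLM versions, so treat them as assumptions rather than a verified example:

```python
import tensorrt_llm.bindings.executor as trtllm

# Build an executor around a pre-built decoder-only engine.
# ExecutorConfig options (beam width, logits post-processors, etc.)
# are version-dependent; defaults are assumed here.
executor = trtllm.Executor(
    "path/to/engine_dir",
    trtllm.ModelType.DECODER_ONLY,
    trtllm.ExecutorConfig(),
)

# Enqueue a single generation request (kwarg names are assumptions).
request = trtllm.Request(input_token_ids=[1, 2, 3], max_new_tokens=32)
request_id = executor.enqueue_request(request)

# Block until responses for this request arrive and print the tokens.
for response in executor.await_responses(request_id):
    print(response.result.output_token_ids)
```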
