
Support for TensorRT-LLM #632

Open

SupreethRao99 opened this issue Feb 11, 2024 · 7 comments

Comments

@SupreethRao99

Outlines currently supports the vLLM inference engine; it would be great if it could also support the TensorRT-LLM inference engine.

@teis-e

teis-e commented Feb 13, 2024

Yes, waiting for this as well!

@lapp0
Contributor

lapp0 commented Feb 13, 2024

TensorRT-LLM supports logits processors, so it should be possible to integrate.

How are you hoping to use it? Are you seeking outlines.models.tensorrt, a serve endpoint, or both?
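For illustration, a minimal sketch of the kind of logits processor this would involve, assuming a HuggingFace-style `(input_ids, scores)` calling convention; `allowed_token_ids` is a hypothetical stand-in for whatever set of next tokens Outlines' FSM reports as valid at the current state:

```python
import torch

class MaskingLogitsProcessor:
    """Hypothetical sketch: mask logits so only FSM-allowed tokens survive.

    The (input_ids, scores) signature is an assumption borrowed from the
    HuggingFace convention, not a confirmed TensorRT-LLM interface.
    """

    def __init__(self, allowed_token_ids):
        # Token ids the FSM permits next (stand-in for Outlines' FSM output).
        self.allowed_token_ids = allowed_token_ids

    def __call__(self, input_ids: torch.Tensor, scores: torch.Tensor) -> torch.Tensor:
        # Start from -inf everywhere, then zero out the allowed positions,
        # so adding the mask leaves allowed logits unchanged and bans the rest.
        mask = torch.full_like(scores, float("-inf"))
        mask[..., self.allowed_token_ids] = 0.0
        return scores + mask
```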

@SupreethRao99
Author

I'm looking at something similar to outlines.models.tensorrt right now, as my use case is mostly offline batched inference. Could you give me a starting point for how I can build this out? I'm eager to contribute and add such a feature.

@lapp0
Contributor

lapp0 commented Feb 14, 2024

@SupreethRao99 glad to hear you're interested in contributing.

I think a good starting point is looking into how TensorRT-LLM performs generation and handles LogitsProcessors:

https://github.com/NVIDIA/TensorRT-LLM/blob/0ab9d17a59c284d2de36889832fe9fc7c8697604/tensorrt_llm/runtime/generation.py#L368-L386

https://github.com/NVIDIA/TensorRT-LLM/blob/0ab9d17a59c284d2de36889832fe9fc7c8697604/tensorrt_llm/runtime/model_runner_cpp.py#L266-L267

Then I'd review how llamacpp is being implemented; it shares similarities with how TensorRT-LLM would work (#556). Specifically, llamacpp.py: https://github.com/dtiarks/outlines/blob/726ec242fb1695c5a67d489689be13ac84ef472c/outlines/models/llamacpp.py
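As a rough shape for the adapter, modeled on the llamacpp.py structure, something like the sketch below. This is not a confirmed API: the `TensorRTLLM` class name, the tokenizer handling, and the exact `ModelRunner.generate` kwargs (in particular whether a logits processor can be passed through directly) are all assumptions that depend on the TensorRT-LLM version:

```python
# Hypothetical outlines/models/tensorrt.py sketch, modeled on llamacpp.py.
from typing import List, Optional

import torch
from tensorrt_llm.runtime import ModelRunner


class TensorRTLLM:
    def __init__(self, engine_dir: str, tokenizer):
        # Load a pre-built TensorRT-LLM engine from disk.
        self.runner = ModelRunner.from_dir(engine_dir)
        self.tokenizer = tokenizer

    def generate(
        self,
        prompts: List[str],
        max_new_tokens: int = 128,
        logits_processor: Optional[object] = None,
    ) -> List[str]:
        batch_input_ids = [
            torch.tensor(self.tokenizer.encode(p), dtype=torch.int32)
            for p in prompts
        ]
        # Whether generate() accepts a logits processor directly is
        # version-dependent; see the generation.py lines linked above.
        output_ids = self.runner.generate(
            batch_input_ids,
            max_new_tokens=max_new_tokens,
            logits_processor=logits_processor,
        )
        # output_ids is (batch, beams, seq); decode the first beam of each.
        return [self.tokenizer.decode(ids[0].tolist()) for ids in output_ids]
```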

Please let me know if you have any questions!

@SupreethRao99
Author

Thank you for the resources; I'll definitely get back to you with questions after going through these links.

Thanks!

@rlouf
Member

rlouf commented Feb 14, 2024

Related to #655

@user-0a

user-0a commented Sep 16, 2024

This can likely be implemented with the Executor API:
https://github.com/NVIDIA/TensorRT-LLM/blob/31ac30e928a2db795799fdcab6be446bfa3a3998/examples/cpp/executor/README.md#L4
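A hedged sketch of driving the Executor Python bindings is below; the constructor arguments and kwarg names (e.g. `max_new_tokens`, default-constructed `ExecutorConfig`) vary across TensorRT-LLM versions, so treat them as assumptions rather than a verified example:

```python
import tensorrt_llm.bindings.executor as trtllm

# Build an executor around a pre-built decoder-only engine.
# ExecutorConfig options (beam width, logits post-processors, etc.)
# are version-dependent; defaults are assumed here.
executor = trtllm.Executor(
    "path/to/engine_dir",
    trtllm.ModelType.DECODER_ONLY,
    trtllm.ExecutorConfig(),
)

# Enqueue a single generation request (kwarg names are assumptions).
request = trtllm.Request(input_token_ids=[1, 2, 3], max_new_tokens=32)
request_id = executor.enqueue_request(request)

# Block until responses for this request arrive and print the tokens.
for response in executor.await_responses(request_id):
    print(response.result.output_token_ids)
```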
