Support for TensorRT-LLM #632
Outlines currently supports the vLLM inference engine; it would be great if it could also support the TensorRT-LLM inference engine.
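For reference, this is roughly how the existing vLLM backend is used for structured generation today (API as in the Outlines 0.x docs; the model name is only an example). A TensorRT-LLM backend would need to offer the same kind of entry point:

```python
import outlines

# Load a model through the existing vLLM backend (model name is just an example).
model = outlines.models.vllm("mistralai/Mistral-7B-Instruct-v0.2")

# Constrain generation to a regex; attaching a guided logits processor like this
# is the behaviour a TensorRT-LLM backend would need to replicate.
generator = outlines.generate.regex(model, r"\d{3}-\d{4}")
print(generator("Give me a phone number: "))
```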
Yes, waiting for this as well!
TensorRT-LLM supports logits processors, so it should be possible to integrate. How are you hoping to use it? Are you seeking offline batched inference, or something else?
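For anyone unfamiliar, a guided-decoding logits processor simply masks out tokens the current FSM state disallows before sampling. A minimal sketch, assuming a hypothetical `fsm.allowed_token_ids(state)` interface (not TensorRT-LLM's or Outlines' actual API):

```python
import torch

def constrained_logits_processor(fsm, fsm_state: int, logits: torch.Tensor) -> torch.Tensor:
    """Mask the logits so only tokens the FSM allows in its current state
    can be sampled. `fsm.allowed_token_ids` is a hypothetical interface
    standing in for whatever guide object Outlines tracks per request."""
    mask = torch.full_like(logits, float("-inf"))
    mask[fsm.allowed_token_ids(fsm_state)] = 0.0  # keep only the allowed tokens
    return logits + mask
```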
I'm looking at something similar to outlines.models.tensorrt right now, as my use case is mostly offline batched inference. Could you give me a starting point for how I can build this out? I'm eager to contribute and add such a feature.
@SupreethRao99 glad to hear you're interested in contributing. I think a good starting point is looking into how TensorRT-LLM performs generation and handles logits processors. Then I'd review how Outlines integrates with vLLM. Please let me know if you have any questions!
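To make the target a bit more concrete, here is a rough sketch of the shape an `outlines.models.tensorrt` wrapper might take, mirroring how the vLLM wrapper exposes a model plus tokenizer to Outlines' generators. Every TensorRT-LLM call below is a placeholder; the real runner/executor API would need to be checked against the TensorRT-LLM docs.

```python
class TensorRTLLM:
    """Hypothetical wrapper pairing a TensorRT-LLM engine with its tokenizer,
    so Outlines generators can attach guided-decoding logits processors."""

    def __init__(self, engine, tokenizer):
        self.engine = engine        # placeholder for a TRT-LLM runner/executor object
        self.tokenizer = tokenizer  # tokenizer used to build token-level masks

    def generate(self, prompts, logits_processor=None, **sampling_params):
        # A real integration would thread `logits_processor` into TensorRT-LLM's
        # generation call; the keyword name here is an assumption.
        return self.engine.generate(
            prompts,
            logits_processor=logits_processor,
            **sampling_params,
        )


def tensorrt(engine, tokenizer):
    """Factory mirroring outlines.models.vllm(...); purely illustrative."""
    return TensorRTLLM(engine, tokenizer)
```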
Thank you for the resources, I'll definitely get back to you with questions after going through these links. Thanks!
Related to #655
This can likely be implemented with the Executor API.
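A rough sketch of how an Executor-based integration might register a guided-decoding hook. The module path, config field, and callback signature below are assumptions based on the Executor API's logits post-processor feature and should be verified against the TensorRT-LLM documentation:

```python
import tensorrt_llm.bindings.executor as trtllm  # assumed module path


def make_guided_postprocessor(guide):
    """Build a logits post-processor that masks tokens the guide disallows.
    The (req_id, logits, token_ids, stream_ptr, client_id) signature and the
    `guide.allowed_token_ids(...)` call are assumptions, not verified API."""
    def postprocess(req_id, logits, token_ids, stream_ptr, client_id):
        allowed = guide.allowed_token_ids(token_ids)
        logits[:] = float("-inf")   # forbid everything by default
        logits[allowed] = 0.0       # re-enable tokens the guide allows
    return postprocess


def build_executor(engine_dir, guide):
    # Register the post-processor under a name requests can refer to;
    # the exact ExecutorConfig field is an assumption.
    config = trtllm.ExecutorConfig(
        logits_post_processor_map={"outlines_guide": make_guided_postprocessor(guide)},
    )
    return trtllm.Executor(engine_dir, trtllm.ModelType.DECODER_ONLY, config)
```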