VLLM batch inference example is broken #536
Comments
We reverted a recent change that modified the vLLM integration. Could you retry this on the current version?
Hey @rlouf - this still appears to be broken; I'm getting the same error. Have you merged the above change into the most recent release?
I got this to work by explicitly calling the […]. I also needed to include the […].
@sethkimmel3 could you please create an issue and provide a reproduction script? I think the docstring below is inaccurate; it actually seems to take an […].
Describe the issue as clearly as possible:
The batch inference vLLM examples don't run; they raise an AttributeError.
In vllm 0.2.7 (and also 0.2.6) there seem to be two interfaces: one is vllm.LLM and the other is AsyncLLMEngine. Only AsyncLLMEngine has a tokenizer accessor, which RegexLogitsProcessor and CFGLogitsProcessor assume. To get batch inference to work I modify both of the above processors with something like this:
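(The original snippet was not preserved in this report. A minimal sketch of the kind of change described above, assuming the processors build their FSM from a Hugging Face tokenizer, might look like the following; the helper name and the llm.llm_engine.tokenizer lookup are assumptions, not the reporter's exact code.)

```python
# Sketch of a modification to the RegexLogitsProcessor / CFGLogitsProcessor
# so they work with both vLLM interfaces. The helper name and the
# `llm.llm_engine.tokenizer` lookup are assumptions, not the reporter's code.
def _get_tokenizer(llm):
    """Return the tokenizer from either an AsyncLLMEngine or a vllm.LLM."""
    if hasattr(llm, "tokenizer"):
        # AsyncLLMEngine exposes the tokenizer directly.
        return llm.tokenizer
    if hasattr(llm, "llm_engine"):
        # vllm.LLM wraps an LLMEngine; the tokenizer lives on the inner engine.
        return llm.llm_engine.tokenizer
    raise AttributeError("No tokenizer found on the provided vLLM object")


# Inside each processor's __init__, the direct `llm.tokenizer` access would
# then be replaced with the helper above, e.g.:
#     tokenizer = self.adapt_tokenizer(_get_tokenizer(llm))
```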
Steps/code to reproduce the bug:
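(The reproduction script was not included in the extracted report. A minimal sketch of the kind of call that triggers the error, based on the description above, might be the following; the import path, model name, and regex are assumptions for illustration.)

```python
# Reconstructed reproduction sketch, not the reporter's original script.
import vllm
from outlines.serve.vllm import RegexLogitsProcessor

llm = vllm.LLM(model="mistralai/Mistral-7B-v0.1")

# Constructing the processor against vllm.LLM fails, because the processor
# reads `llm.tokenizer`, an accessor that only AsyncLLMEngine provides.
logits_processor = RegexLogitsProcessor(r"[0-9]+", llm)  # AttributeError
```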
Expected result:
Normal execution, rather than an AttributeError because the accessor doesn't exist.
Error message:
Outlines/Python version information:
Context for the issue:
No response