Feat/add transformers integration #728
Conversation
Could you rebase your branch on
Sure thing. Am I understanding correctly that this concerns
Sure thing, I'll do that when the llamacpp integration is set up.
@rlouf The LlamaCpp integration has been moved now, and I have also adapted the imports in the
Will take a look early next week!
I just fixed merge conflicts due to a big refactor of the
Can this be used with multimodal models too?
It should!
Thanks!! Can't wait to try it out!!
I was able to run the example locally, and tried the vLLM integration as well; llama.cpp is covered by the integration tests. Great job! @Kamakshi8104 would you like to add an example using a multimodal model to the docs?
Yes I can add an example🙂. I am in the middle of exams so will get started on it this coming week👍 |
Any update on this? Very interested in seeing a multimodal example, thanks!
This adds integration with the `transformers` package by supplying a `prefix_allowed_tokens_fn` that can be passed both to `transformers` pipelines and directly to the `generate` methods of generative models from the `transformers` package.

The `transformers` integration has been put in an `integrations` top-level directory, and the existing vLLM integration has also been moved into that directory. We keep a reference in the previous `serve/vllm.py` module to preserve backwards compatibility.

One notable difference between vLLM and transformers is that transformers includes the input in the generated sequences. Since this can mess up the FSMs, we keep track of the input tokens and use them both to identify when we change to new samples (resetting the FSM) and to know which token ID prefix to remove before getting the next state from the FSM; this bookkeeping is sketched below.
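An illustrative sketch of that bookkeeping, assuming a hypothetical FSM interface with `next_state` and `allowed_token_ids` methods and an initial state of `0` (these names are placeholders for the example, not the API added in this PR):

```python
from typing import List

def make_prefix_allowed_tokens_fn(fsm, prompt_ids: List[int]):
    """Illustrative only: wrap a hypothetical FSM so that the prompt tokens
    transformers prepends to every generated sequence are ignored."""
    fsm_states = {}  # per-sample FSM state, keyed by batch_id

    def prefix_allowed_tokens_fn(batch_id: int, input_ids) -> List[int]:
        ids = input_ids.tolist()
        if ids == prompt_ids:
            # The sequence is exactly the prompt again, so a new sample
            # has started: reset the FSM to its initial state.
            fsm_states[batch_id] = 0
        else:
            # Everything after the prompt prefix was generated; the last
            # token is the one that advances the FSM.
            fsm_states[batch_id] = fsm.next_state(fsm_states[batch_id], ids[-1])
        return fsm.allowed_token_ids(fsm_states[batch_id])

    return prefix_allowed_tokens_fn
```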
Also added an example with the transformers integration.
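For context, this is roughly how a `prefix_allowed_tokens_fn` is handed to `transformers`' `generate` method (a built-in hook of the library); the constraint function below is a stand-in that allows every token, where the real integration would return only the FSM-permitted token IDs:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def allow_everything(batch_id, input_ids):
    # Stand-in constraint: permit the full vocabulary. The integration in
    # this PR would instead return the token IDs valid in the current
    # FSM state.
    return list(range(len(tokenizer)))

inputs = tokenizer("Hello", return_tensors="pt")
output_ids = model.generate(
    **inputs,
    max_new_tokens=20,
    prefix_allowed_tokens_fn=allow_everything,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```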
Closes #713