Add integration with Hugging Face transformers
#713
We have a very good reason to not use the […]
Oh, I see. How come? My use case is part of a larger evaluation framework, where one of the tasks requires structured generation. I use vLLM but fall back to `transformers` if the model architecture is not supported by vLLM. It would thus be quite convenient to plug support into the existing code, rather than work with a new model abstraction. The lmfe structured generation package also features a convenience function that allows integration with `transformers`, so I thought you wouldn't mind integrating with that as well?
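(For reference, the lmfe convenience function mentioned here is used roughly as follows. This is a sketch from memory of the `lmformatenforcer` package's `transformers` integration; import paths and signatures may differ between versions.)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from lmformatenforcer import JsonSchemaParser
from lmformatenforcer.integrations.transformers import (
    build_transformers_prefix_allowed_tokens_fn,
)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Toy schema: constrain the output to a JSON object with a "name" string.
schema = {"type": "object", "properties": {"name": {"type": "string"}}}
prefix_fn = build_transformers_prefix_allowed_tokens_fn(
    tokenizer, JsonSchemaParser(schema)
)

inputs = tokenizer("Describe a character as JSON: ", return_tensors="pt")
output = model.generate(
    **inputs, max_new_tokens=50, prefix_allowed_tokens_fn=prefix_fn
)
print(tokenizer.decode(output[0]))
```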
Among other things, we need to implement more sampling algorithms than `transformers` supports.
How do you currently use vLLM in your code?
I see. For my case, I would either need to use […].
I wrap vLLM models in a […]. I've now added my own wrapper for the […]. Does that make sense? If this seems too out of scope for `outlines`, […]
It should, via […].
Would a […]?
@rlouf Just looked a bit more in the code base. If the generation speed is at least as fast as with […]. But it sounds like there's no interest from your side to integrate into `transformers`?
As far as I know, yes.
I'm not sure what you mean here.
I just mean to use the API from Hugging Face `transformers`, but plug in your structured generation functionality, without using your Transformer abstraction or anything like that. Literally do everything exactly as you normally would with `transformers`, but plug in support for structured generation from outlines.

That's what I love about your vLLM integration, as that's precisely what you do there: you allow me to continue working with the vLLM API, and all I have to change in my code is to import your `JSONLogitsProcessor` and plug it into my vLLM config. Two lines of code changed in the code base. But doing that for Hugging Face `transformers` is not as easy, since your suggestion was essentially to go from using their API to your custom outlines API. If you had a simple `JSONPrefixAllowedFn` class (or something like that), then I could do the same as I do with vLLM: simply import it and plug it into my `generate` call. Two lines of code changed, and it just works.

Using your `outlines.generate` could also work, but in that case I'd love to just put in my (Hugging Face) model and pass the exact same arguments that I would pass to a `generate` call. That would require a lot more maintenance on your side though, since Hugging Face change their API all the time. That's why viewing outlines as a "plug-in", as described above, seems a lot more robust in the long run.
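(To make the "two lines" concrete, the vLLM usage described here looks roughly like this. A minimal sketch: the import path follows the `serve` directory mentioned later in this thread, and the `JSONLogitsProcessor(schema, llm)` signature is an assumption; check the actual API before use.)

```python
from vllm import LLM, SamplingParams
from outlines.serve.vllm import JSONLogitsProcessor  # changed line 1

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")

schema = {"type": "object", "properties": {"name": {"type": "string"}}}
params = SamplingParams(
    max_tokens=100,
    # changed line 2: plug structured generation into the existing vLLM config
    logits_processors=[JSONLogitsProcessor(schema, llm)],
)
outputs = llm.generate(["Describe a character as JSON: "], params)
print(outputs[0].outputs[0].text)
```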
I would agree if we were not working on features that are not available in `transformers`, such as #667, #657 and #673. We can still integrate the way you suggest with `transformers`, via logits processors. A middle ground solution would be to add […]
Yeah, I completely get that. Down the line this might very well mean that […].
Yeah, I think this is exactly what I would like! The analogue of logits processors in `transformers` is these prefix-allowed-tokens functions, and they work very similarly. I could add a PR that adds this. Whereabouts in the code base would it fit?
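(For context, the contract `transformers` expects for `prefix_allowed_tokens_fn` is a callable from the batch index and the tokens generated so far to the token ids allowed at the next step. A skeleton, with a placeholder body where a structured-generation implementation would consult its FSM:)

```python
from typing import List
import torch

VOCAB_SIZE = 32_000  # placeholder: use the model's actual vocabulary size

def prefix_allowed_tokens_fn(batch_id: int, input_ids: torch.Tensor) -> List[int]:
    # Called by `generate` at every decoding step. A structured-generation
    # implementation would map the FSM state reached by `input_ids` to the
    # set of token ids that keep the output valid; here we allow everything.
    return list(range(VOCAB_SIZE))

# Usage:
# model.generate(**inputs, prefix_allowed_tokens_fn=prefix_allowed_tokens_fn)
```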
Yes. I'm not sure where it would make the most sense; maybe in a high-level module […]
Sounds good to me. Should I move the vLLM processors as well in that case? I could keep a reference to them in […]
Yes, that would be great, thank you!
PR open now: #728
Presentation of the new feature
It should be possible to use the `transformers` package for inference of generative models, and simply add structured generation from `outlines` as a "plugin", rather than needing to wrap all models in `outlines`-specific classes, as seems to be the current approach. Instead, `transformers` supports a `prefix_allowed_tokens_fn` argument in the `generate` method, which is a function that returns the allowed tokens to be generated at a given step. The `outlines` package could thus have a simple function/class which can be given as this argument, analogous to the current vLLM integration with `logits_processors`.
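A hypothetical sketch of the proposed plug-in from the user's side, borrowing the `JSONPrefixAllowedFn` name floated in the discussion above; nothing here is an existing `outlines` API:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
# Proposed import; this class does not exist yet:
# from outlines import JSONPrefixAllowedFn

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

schema = {"type": "object", "properties": {"title": {"type": "string"}}}
prefix_fn = JSONPrefixAllowedFn(schema, tokenizer)  # hypothetical class

inputs = tokenizer("Suggest a book as JSON: ", return_tensors="pt")
output = model.generate(**inputs, prefix_allowed_tokens_fn=prefix_fn)
```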
Where does it fit in Outlines?
It allows easier integration into the inference frameworks that people are mostly using, making `outlines` more useful to many people.
Are you willing to open a PR?
I would be willing to implement this in a PR, yes. The implementation would be very similar to the current vLLM integration. If I am going to do this, I would need some guidance on the preferred directory structure, however. The vLLM integration lives inside the `serve` directory; should vLLM and this `transformers` integration be moved into a separate `integrations` directory, perhaps?