Vision #249

Merged
merged 16 commits into from
Nov 22, 2024


bdashore3
Member

Bypass template:

Adds support for vision models using Exl2

Contributors: @bdashore3 @DocShotgun

bdashore3 and others added 15 commits November 11, 2024 12:10
HuggingFace separated the chat template in the newest transformers
versions.

Signed-off-by: kingbri <[email protected]>
Adds the ability to load vision parts of text + image models. Requires
an explicit flag in config because there isn't a way to automatically
determine whether the vision tower should be used.

Signed-off-by: kingbri <[email protected]>
* Support image_url inputs containing URLs or base64 strings following OAI vision spec
* Use async lru cache for image embeddings
* Add generic wrapper class for multimodal embeddings
* Add more robust checks for OAI chat completion message lists on the /v1/encode endpoint
* Add a TODO to support other aspects of chat completions
* Fix an oversight where `embeddings` was not defined in advance on the /v1/chat/completions endpoint
* When vision is not enabled and message.content is a list, keep only the first text block
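
As a rough illustration of two of the points above (accepting `image_url` values that are either URLs or base64 strings, and the text-only fallback when vision is disabled), here is a minimal sketch. The function names `classify_image_url` and `text_only_content` are hypothetical and are not taken from this PR. The sketch assumes base64 images arrive as `data:` URLs, as in the OAI vision spec, and it does not download anything.

```python
import base64
import binascii
from typing import List, Tuple, Union
from urllib.parse import urlparse


def classify_image_url(url: str) -> Tuple[str, Union[bytes, str]]:
    """Hypothetical helper: split an image_url value into its two cases.

    Returns ("bytes", decoded_bytes) for a base64 data URL, or
    ("remote", url) for an http(s) URL that the caller must fetch.
    """
    if url.startswith("data:"):
        # Expected shape: data:image/png;base64,<payload>
        header, _, payload = url.partition(",")
        if not header.endswith(";base64") or not payload:
            raise ValueError("Only base64-encoded data URLs are handled here")
        try:
            return "bytes", base64.b64decode(payload, validate=True)
        except binascii.Error as exc:
            raise ValueError("Invalid base64 image payload") from exc

    if urlparse(url).scheme in ("http", "https"):
        return "remote", url

    raise ValueError("Unsupported image_url value")


def text_only_content(content: Union[str, List[dict]]) -> str:
    """Hypothetical helper: keep only the first text block of a message.

    Mirrors the described behavior for models loaded without vision.
    """
    if isinstance(content, str):
        return content
    for part in content:
        if part.get("type") == "text":
            return part.get("text", "")
    return ""
```

Separating classification from fetching keeps the decision testable without network access. The embedding cache mentioned above would sit behind the fetch step and is not shown.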
Previously, the messages were a list of dicts. These are untyped
and don't provide strict hinting. Add types for chat completion
messages and reformat existing code.

Signed-off-by: kingbri <[email protected]>
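
To show what moving from untyped dicts to typed messages can look like, here is a small sketch using standard-library dataclasses. The class and function names (`ImageURL`, `ContentPart`, `ChatMessage`, `parse_message`) are assumptions made for illustration. The PR's actual types may differ, for example by using a validation library instead of dataclasses.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class ImageURL:
    url: str


@dataclass
class ContentPart:
    # "text" or "image_url", following the OAI chat completion format
    type: str
    text: Optional[str] = None
    image_url: Optional[ImageURL] = None


@dataclass
class ChatMessage:
    role: str
    # Either a plain string or a list of typed content parts
    content: Union[str, List[ContentPart]]


def parse_message(raw: dict) -> ChatMessage:
    """Hypothetical helper: convert one untyped message dict into a ChatMessage."""
    content = raw.get("content", "")
    if isinstance(content, list):
        parts = []
        for part in content:
            image = part.get("image_url")
            parts.append(
                ContentPart(
                    type=part["type"],
                    text=part.get("text"),
                    image_url=ImageURL(url=image["url"]) if image else None,
                )
            )
        content = parts
    return ChatMessage(role=raw["role"], content=content)
```

With typed parts, downstream code can check `part.type` and access `part.image_url.url` with editor hinting rather than indexing into nested dicts.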
Migrate the add method into the class itself. Also, a BaseModel isn't
needed here since this isn't a serialized class.

Signed-off-by: kingbri <[email protected]>
Previously, the flow for parsing chat completion messages and rendering
from the prompt template was disconnected between endpoints. Now, create
a common function to render and handle everything appropriately afterwards.

Signed-off-by: kingbri <[email protected]>
Fix a mistake in unwrapping. Vision should default to false so that
models load normally when the flag isn't provided.

Signed-off-by: kingbri <[email protected]>
The internal model_type reference was changed to an enum for
a more extensible loading process. Return the current model type
when loading a new model.

Signed-off-by: kingbri <[email protected]>
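
A minimal sketch of the enum idea follows. The member names and the `load_stages` helper are hypothetical and are not confirmed by this PR. The stage order is also an assumption, chosen only to show how each loading step can report which part of the model it belongs to.

```python
from enum import Enum
from typing import List


class ModelType(Enum):
    # Hypothetical members; the real enum may define different ones
    MODEL = "model"
    DRAFT = "draft"
    VISION = "vision"


def load_stages(use_vision: bool, use_draft: bool) -> List[ModelType]:
    """Hypothetical helper: list the parts to load, in an assumed order."""
    stages = []
    if use_vision:
        stages.append(ModelType.VISION)
    if use_draft:
        stages.append(ModelType.DRAFT)
    stages.append(ModelType.MODEL)
    return stages


# Each stage can be reported to the client by its serialized value
for stage in load_stages(use_vision=True, use_draft=False):
    print(f"loading: {stage.value}")
```

Compared with a free-form string, an enum makes adding a new component a one-line change and lets a typo fail loudly at attribute lookup.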
If vision is enabled and the model doesn't support it, send an
error asking the user to reload. Also, add a method to unload the
vision tower.

Signed-off-by: kingbri <[email protected]>
bdashore3 requested a review from DocShotgun November 22, 2024 19:27
The strings weren't being concatenated properly. Only add the combined
text if the chat completion type is a List.

Signed-off-by: kingbri <[email protected]>
bdashore3 merged commit 9c8186c into main Nov 22, 2024
1 check passed

2 participants