
Any plans to bring llava support? #340

Open · vshapenko opened this issue Dec 1, 2023 · 22 comments

Comments

@vshapenko

Hi! Are there any plans to support llava as well? As I see, it was merged into llama.cpp about a month ago and makes it possible to work with image recognition as well.

@SignalRT
Collaborator

SignalRT commented Dec 1, 2023

I have a prototype working. I will see if I can clean and finish the project this weekend.

@vshapenko
Author

> I have a prototype working. I will see if I can clean and finish the project this weekend.

Hello! Is there any news?

@philippjbauer
Contributor

Looking forward to the change! We just discussed how this will significantly help solve a use case in our internal application. Do you have it available in your fork already? I'd like to pull it into mine and play around with it.

@vshapenko
Author

vshapenko commented Dec 7, 2023 via email

@SignalRT
Collaborator

SignalRT commented Dec 7, 2023

I will try to finish the library a little more this weekend before making this public. Until now, it has only been tested on osx-arm64.

@philippjbauer
Contributor

Cool, thank you! I'm on osx-arm64 and can do some testing (and perhaps have my colleague do it too) after implementing it in our app.

@AshD

AshD commented Jan 30, 2024

Is there any update on this? One use case is OCR.

Thanks,
Ash

@SignalRT
Collaborator

I have a branch in my fork with part of the changes. Building the binaries, including the runtime, etc. are things I hadn't done before; I initially did the development with manually built binaries.

With the first versions from January, it worked.

With the prompt "What is unusual about this image?" and this picture:

[image: input picture]

This is the output:

[image: model's answer]

But since PR #445 it crashes in llama.cpp. I'm trying to identify the root cause and get this working again.

@AshD

AshD commented Jan 30, 2024

Thanks for the update @SignalRT

Happy to test this when you have it working.

@dcostea

dcostea commented Feb 6, 2024

@SignalRT
If there is any progress on this, I would be very glad to evaluate it.

Btw, where is the code repo with the working branch you mentioned above?

@IntptrMax

IntptrMax commented Feb 23, 2024

I have also tried to do it and have a very simple demo. I'm trying to rewrite my code using LLamaSharp.

The demo is like this:
Prompt: describe the image in detail.
Input image:
[image: test input]

Output:
[image: generated description]

@SignalRT
Collaborator

I will work this weekend to try to publish my work. Until the PR, it will be in my branch: https://github.com/SignalRT/LLamaSharp/tree/MultiModal

@dcostea

dcostea commented Feb 23, 2024

@SignalRT For now I switched to plan B, OllamaSharp, but I'm happy to hear that I will be able to switch back soon.
Good luck!

@zsogitbe
Contributor

@SignalRT please put the code you have (LLava) into your branch so that we can help you finalize it (better to call it LLava instead of LLavaSharp).

@zsogitbe
Contributor

zsogitbe commented Feb 27, 2024

@IntptrMax, I have looked at your example. It is a very good attempt! I have noticed a few bugs with marshaling the C++ output. For example,

    public extern static llava_image_embed llava_image_embed_make_with_filename(IntPtr clip_ctx, int n_threads, string image);

should be changed to

    public extern static IntPtr llava_image_embed_make_with_filename(IntPtr clip_ctx, int n_threads, string image);

because the C++ side returns a pointer to the structure; if you do not do this, you will get seemingly random problems...
You can marshal the IntPtr like this:

    llava_image_embed image_embed = (llava_image_embed)Marshal.PtrToStructure(intptr_to_image_embed, typeof(llava_image_embed));

where intptr_to_image_embed is the output of llava_image_embed_make_with_filename.
Maybe look at all of your functions and make sure that the marshaling is OK everywhere (I did not check them all).
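
For reference, here is a minimal sketch of what the corrected interop could look like. It assumes a native library named "llava_shared" that exposes llama.cpp's llava.h API; the struct layout mirrors llava.h (a float* buffer plus a token-position count):

    using System;
    using System.Runtime.InteropServices;

    // Managed mirror of llava.h's llava_image_embed.
    [StructLayout(LayoutKind.Sequential)]
    public struct llava_image_embed
    {
        public IntPtr embed;     // float* holding the image embedding
        public int n_image_pos;  // number of image token positions
    }

    public static class NativeLlava
    {
        // The native side returns a POINTER to a llava_image_embed, so the
        // return type must be IntPtr rather than the struct itself.
        [DllImport("llava_shared", CallingConvention = CallingConvention.Cdecl)]
        public extern static IntPtr llava_image_embed_make_with_filename(
            IntPtr clip_ctx, int n_threads, string image);
    }

    // Usage:
    // IntPtr ptr = NativeLlava.llava_image_embed_make_with_filename(clipCtx, 4, "test.jpg");
    // var embed = Marshal.PtrToStructure<llava_image_embed>(ptr);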

Also, there is a problem with the context size. If the number of tokens in the image embedding is higher than the context size (n_ctx), the program will crash in the function llava_eval_image_embed. For example, the default 2048 will not work with my model and your example image, because the embedding has 2880 tokens! We need to adjust the context size based on the model.
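
A hypothetical guard along these lines would fail fast in managed code instead of crashing natively; it reuses the llava_image_embed struct from the sketch above, and embedPtr, batchSize, and contextSize are illustrative names, not LLamaSharp API:

    // Check the embedding fits before calling llava_eval_image_embed.
    var embed = Marshal.PtrToStructure<llava_image_embed>(embedPtr);
    if (embed.n_image_pos + batchSize > contextSize)
        throw new InvalidOperationException(
            $"Image embedding needs {embed.n_image_pos} tokens plus a " +
            $"{batchSize}-token batch, but n_ctx is only {contextSize}.");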

@zsogitbe
Contributor

@IntptrMax, I have quickly corrected your example, and it seems that if we use llava_image_embed_make_with_filename with the above correction and a higher context size (4096), then it works:

[image: screenshot of the working output]

@IntptrMax

@zsogitbe Thanks a lot! I ran into the same problem when evaluating several images; that's a good idea for solving it.

@zsogitbe
Contributor

zsogitbe commented Feb 27, 2024

I need a minimum context size of: image embedding size (2880 tokens) + batch size (512). In your example, 2880 + 512 = 3392! The image embedding size depends on the model!
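
As a sketch of applying this rule when creating the context, assuming LLamaSharp's ModelParams (the model path is a placeholder, and the 2880-token embedding size is specific to the model discussed here):

    using LLama.Common;

    int embedTokens = 2880;   // from llava_image_embed.n_image_pos
    int batchSize = 512;      // evaluation batch size
    var parameters = new ModelParams("path/to/model.gguf")
    {
        ContextSize = (uint)(embedTokens + batchSize)   // 3392
    };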

@martindevans
Member

martindevans commented Feb 28, 2024

PR with first draft: #555

@SignalRT
Collaborator

SignalRT commented Mar 1, 2024

The first PR to build llava binaries #556

@IntptrMax

I have tried to add llava to LLamaSharp and it can work, but it still needs improvement. My demo is at https://github.com/IntptrMax/LLamaSharp/tree/add_llava

@zsogitbe
Contributor

zsogitbe commented Mar 5, 2024

It works, @IntptrMax, I have tested it, but there is a memory leak. In my trial, 1.8 GB of GPU memory is not freed. Try to find out how to free the GPU memory, because 1.8 GB is too much.
Try to add some checks at the end to verify that the GPU memory is properly freed.
One of the extra releasing options is llama_grammar_free(ctx_sampling.grammar).
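
For reference, a minimal sketch of the native free calls that should release those allocations, assuming the library also exports llama.cpp's llava_image_embed_free and clip_free (the library name "llava_shared" is an assumption, as above):

    using System;
    using System.Runtime.InteropServices;

    public static class NativeLlavaCleanup
    {
        // Frees the natively allocated embedding struct and its float buffer.
        [DllImport("llava_shared", CallingConvention = CallingConvention.Cdecl)]
        public extern static void llava_image_embed_free(IntPtr embed);

        // Frees the CLIP (vision) context, releasing its GPU memory.
        [DllImport("llava_shared", CallingConvention = CallingConvention.Cdecl)]
        public extern static void clip_free(IntPtr clip_ctx);
    }

    // After generation:
    // NativeLlavaCleanup.llava_image_embed_free(embedPtr);
    // NativeLlavaCleanup.clip_free(clipCtx);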
