
Make mlx-vlm examples in swift #132

Closed
davidkoski opened this issue Sep 27, 2024 · 21 comments
@davidkoski
Collaborator

Consider porting some models from https://github.com/Blaizzy/mlx-vlm to Swift.

@davidkoski
Collaborator Author

e.g.

  • LLaVA: llava-hf/LLaVA-NeXT-Video-7B-hf
  • Qwen2 VL: Qwen/Qwen2-VL-2B-Instruct
  • Llama 3.2 Vision: meta-llama/Llama-3.2-11B-Vision-Instruct
  • Phi-3 Vision: microsoft/Phi-3-vision-128k-instruct
  • PaliGemma: google/paligemma-3b-mix-224

@mzbac
Contributor

mzbac commented Sep 30, 2024

Currently, I am working on porting the Llama 3.2 VLM to Swift. It would be great if we could make the VLM support a separate package so that people can easily pull it down as a dependency and integrate it into their applications, for example to add VLM support to ChatMLX.
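As a sketch of what consuming such a package via Swift Package Manager might look like (the repository URL is the real mlx-swift-examples repo, but the "MLXVLM" product name and the app target are hypothetical assumptions for illustration, not a published API):

```swift
// swift-tools-version:5.9
// Hypothetical consumer manifest: the "MLXVLM" product name and
// "MyVLMApp" target are illustrative assumptions.
import PackageDescription

let package = Package(
    name: "MyVLMApp",
    platforms: [.macOS(.v14), .iOS(.v16)],
    dependencies: [
        // Pull the examples repo in as a dependency.
        .package(url: "https://github.com/ml-explore/mlx-swift-examples", branch: "main")
    ],
    targets: [
        .executableTarget(
            name: "MyVLMApp",
            dependencies: [.product(name: "MLXVLM", package: "mlx-swift-examples")]
        )
    ]
)
```

With a layout like this, an app such as ChatMLX would only need to declare the one product dependency rather than vendor the model code.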

@DePasqualeOrg
Contributor

If someone can put together the basic pipeline for one vision model, I can probably port the others to Swift fairly quickly.

@davidkoski
Collaborator Author

I am working on it right now and have paligemma done (well, not debugged but callable). I am working on how to structure the code with regard to the LLM library -- they should share code where possible.

I will try to put up the branch with what I have today. Next week will be busy, so it might be two weeks from now before it is really ready.

@DePasqualeOrg
Contributor

Fantastic, thank you! Once that's in place, I'll start working on some of the other models (and will post here first to avoid duplication of work).

@davidkoski
Collaborator Author

davidkoski commented Nov 1, 2024

OK, you can see what I have -- more work to be done but the eval loop is worked out.

#151

@davidkoski
Collaborator Author

This continues -- I have most of the refactoring done, and llm-tool has a hard-coded call to paligemma. I need to implement a second VLM (qwen2_vl) so I can make sure I have the right shape for the APIs.

As mentioned before, this will be a breaking change in the API (so I will do a major version bump), but it should be pretty easy to adopt -- hopefully a new import and renaming a couple of things. I will produce a guide when it is ready.

@DePasqualeOrg
Contributor

Thanks @davidkoski, your work is much appreciated! Once the API is stable, I'll try to port some of the other VLMs.

davidkoski added a commit that referenced this issue Nov 18, 2024
davidkoski added a commit that referenced this issue Nov 19, 2024
@anishjain123

@davidkoski @DePasqualeOrg did either of you get Qwen2 VL working in Swift?

@davidkoski
Collaborator Author

It is implemented in the branch right now but still lacks the image processor -- that is what I am starting on next.

@anishjain123

You are doing god's work, @davidkoski! If you need help, let me know! Also, do you know what would be necessary to go from image processing to video processing?

@anishjain123

@davidkoski Blaizzy/mlx-vlm#97 here is a PR from mlx-vlm that might help!

@davidkoski
Collaborator Author

Yes, this first version won't have it, but it should be straightforward to add. Qwen2VL treats an array of images and a video roughly the same but handles them slightly differently in the processor. The video ends up with a different t value (temporal? time?) when it constructs the thw array.
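For illustration, here is a simplified Python sketch (not mlx-vlm's actual processor code) of how a Qwen2-VL-style processor might derive that thw grid; the patch_size=14, temporal_patch_size=2, and merge_size=2 values are assumptions based on the published Qwen2-VL configuration:

```python
# Simplified sketch of a Qwen2-VL-style (t, h, w) grid computation.
# The parameter defaults are assumptions, not mlx-vlm's actual code.

def grid_thw(frames, height, width, patch_size=14, temporal_patch_size=2):
    """Return the (t, h, w) patch grid for a stack of frames.

    A single image is treated as a one-frame stack; frames are padded
    up to a multiple of temporal_patch_size before the temporal split.
    """
    padded = -(-frames // temporal_patch_size) * temporal_patch_size  # ceil to multiple
    return padded // temporal_patch_size, height // patch_size, width // patch_size

def visual_tokens(t, h, w, merge_size=2):
    # Spatial patches are merged merge_size x merge_size before the LLM.
    return (t * h * w) // (merge_size ** 2)

# A single 448x448 image keeps t == 1 ...
image = grid_thw(frames=1, height=448, width=448)   # (1, 32, 32)
# ... while a 16-frame clip at the same resolution gets t == 8.
video = grid_thw(frames=16, height=448, width=448)  # (8, 32, 32)
```

Under these assumptions, each image in an array contributes a grid with t == 1, while a video folds its frames into the temporal dimension, so the visual token count (and memory use) grows with clip length.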

@anishjain123

Yes, you're right about the array-of-images handling! I tried out a rough version of Qwen2VL, and the memory usage on any reasonably sized video is absurd!

Seems like this might not be the architecture to support practical on-device video processing...

By the way, @davidkoski, is there a way to set up an LLM API on MLX as is done with llama.cpp or tools like LM Studio? I have done this with llama.cpp but want to have the performance boost of MLX to see what's possible :)

Thanks again for all your great work -- I know you have been really involved with MLX from the start!

@davidkoski
Collaborator Author

> By the way, @davidkoski, is there a way to set up an LLM API on MLX as is done with llama.cpp or tools like LM Studio? I have done this with llama.cpp but want to have the performance boost of MLX to see what's possible :)

I am not sure what kind of API you mean -- certainly there is an API for preparing a prompt and generating tokens, but I think you mean something different.

Probably the answer is yes, but it might be something you would have to build, e.g. if you wanted a web service.

davidkoski added a commit that referenced this issue Dec 2, 2024
@kunal732

kunal732 commented Dec 5, 2024

Thanks for your work on the vlm branch, @davidkoski. Using llm-tool I can get paligemma to work with the following flag: --model mlx-community/paligemma-3b-mix-448-8bit

but I can't get qwen2_vl to work using --model mlx-community/Qwen2-VL-2B-Instruct-4bit

Any assistance?

@davidkoski
Collaborator Author

> Thanks for your work on the vlm branch, @davidkoski. Using llm-tool I can get paligemma to work with the following flag: --model mlx-community/paligemma-3b-mix-448-8bit
>
> but I can't get qwen2_vl to work using --model mlx-community/Qwen2-VL-2B-Instruct-4bit
>
> Any assistance?

That is the version of the model I was using. What error do you see, or what output? I was using the prompt "describe the image in English" (because it often output Chinese text and this seemed pretty reliable in getting it to output English).

@kunal732

kunal732 commented Dec 5, 2024

Here are the flags I'm using:
vlm --model mlx-community/Qwen2-VL-2B-Instruct-4bit --prompt "describe image in english" --image /Users/pathtoimage/image.png

I get this output when trying to use that model:

MLXNN/Module.swift:515: Fatal error: 'try!' expression unexpectedly raised an error: MLXNN.UpdateError.unableToCollectModulesFromContainer(base: "PatchMerger", key: "mlp")

@davidkoski

@davidkoski
Collaborator Author

> Here are the flags I'm using: vlm --model mlx-community/Qwen2-VL-2B-Instruct-4bit --prompt "describe image in english" --image /Users/pathtoimage/image.png
>
> I get this output when trying to use that model:
>
> MLXNN/Module.swift:515: Fatal error: 'try!' expression unexpectedly raised an error: MLXNN.UpdateError.unableToCollectModulesFromContainer(base: "PatchMerger", key: "mlp")
>
> @davidkoski

Ah, that looks like: ml-explore/mlx-swift#164

Make sure that your mlx-swift is using the 0.21.0 (or higher) tag. I wonder if you still have 0.18?

@kunal732

kunal732 commented Dec 5, 2024

That was the issue! Thank you - it's working great!!

@davidkoski
Collaborator Author

Closing this -- we have two models (qwen2-vl and paligemma). More can be added over time.
