mtmd: add mtmd_get_vision_image_size() and mtmd_get_vision_patch_size() functions #16705

deadprogram · 2025-10-21T16:38:59Z

This PR adds mtmd_get_vision_image_size() and mtmd_get_vision_patch_size() functions as mentioned in #16703


ngxson

In most cases, I don't think exposing this data is a good idea.

The API for getting the audio bit rate is required because there is no resampling logic inside mtmd. Providing input with the wrong bit rate will make the model malfunction.

On the other hand, image size and patch size are not part of the public API, as mtmd can work with any input image size. The image will be resized internally.

Unless you can show the code where you use this, I don't think we should add these APIs. Maybe what you're doing is already possible via mtmd_image_tokens_get_n_tokens.

ngxson · 2025-10-22T09:38:49Z

tools/mtmd/mtmd.h

+// get vision image size in pixels, for example 1024
+// return -1 if vision is not supported
+MTMD_API int mtmd_get_vision_image_size(mtmd_context * ctx);


Not a good idea to add this, as many models now support dynamic resolution and this won't give the correct size

Indeed, some models are fixed res, some are naflex.

The presumption is that if you want to do your own pre-processing, you will need to be familiar with what the model you are using expects.
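To illustrate the point about fixed versus dynamic resolution, here is a minimal sketch of how the patch count could be derived in each case. Both helpers are hypothetical and are not part of mtmd or clip; the rounding rule in the dynamic case is an assumption, and real models may also merge patches or cap the total.

```c
#include <assert.h>

// Hypothetical: patch count for a fixed-resolution encoder that resizes
// every input to image_size x image_size before splitting it into patches.
int n_patches_fixed(int image_size, int patch_size) {
    assert(image_size > 0 && patch_size > 0);
    int per_side = image_size / patch_size;
    return per_side * per_side;
}

// Hypothetical: patch count for a dynamic-resolution encoder that keeps the
// input size and rounds each side up to a whole number of patches.
int n_patches_dynamic(int width, int height, int patch_size) {
    assert(width > 0 && height > 0 && patch_size > 0);
    int nx = (width  + patch_size - 1) / patch_size;  // ceil(width / patch_size)
    int ny = (height + patch_size - 1) / patch_size;  // ceil(height / patch_size)
    return nx * ny;
}
```

In the fixed case a single image size determines the result; in the dynamic case the result depends on each input, which is why one global image size cannot describe such a model.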

deadprogram · 2025-10-22T09:51:42Z

> In most cases, I don't think exposing this data is a good idea.
>
> The API for getting the audio bit rate is required because there is no resampling logic inside mtmd. Providing input with the wrong bit rate will make the model malfunction.
>
> On the other hand, image size and patch size are not part of the public API, as mtmd can work with any input image size. The image will be resized internally.
>
> Unless you can show the code where you use this, I don't think we should add these APIs. Maybe what you're doing is already possible via mtmd_image_tokens_get_n_tokens.

Actually, my intention is for this to be the first step towards being able to bypass the clip_image_preprocess function via a configuration parameter. If you already have an image processing pipeline, it would be great to avoid processing the same image data twice.

Skipping this step would allow things like hardware-accelerated image pre-processing without having to implement any of it inside mtmd itself.
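For context, the kind of caller-side step being described might look like the sketch below: a nearest-neighbour resize combined with per-channel normalization. This is an illustrative assumption, not what clip_image_preprocess does; the function name, the interleaved RGB layout, and the mean/stddev parameters are all hypothetical, and the correct values depend on the model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical caller-side preprocessing: resize an interleaved RGB8 image
// to dw x dh with nearest-neighbour sampling and write normalized floats,
// computed per channel as (value / 255 - mean) / stddev.
void resize_normalize_rgb(const uint8_t *src, int sw, int sh,
                          float *dst, int dw, int dh,
                          const float mean[3], const float stddev[3]) {
    assert(src && dst && sw > 0 && sh > 0 && dw > 0 && dh > 0);
    for (int y = 0; y < dh; y++) {
        int sy = (int)((int64_t)y * sh / dh);  // nearest source row
        for (int x = 0; x < dw; x++) {
            int sx = (int)((int64_t)x * sw / dw);  // nearest source column
            const uint8_t *p = src + ((size_t)sy * sw + sx) * 3;
            float *q = dst + ((size_t)y * dw + x) * 3;
            for (int c = 0; c < 3; c++) {
                q[c] = (p[c] / 255.0f - mean[c]) / stddev[c];
            }
        }
    }
}
```

A hardware-accelerated pipeline would replace the loops, but the caller would still have to know the target size and normalization constants for the specific model.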

ngxson · 2025-10-22T10:24:26Z

I don't want to support such a use case, as it would make the mtmd API as bloated as the clip API. The preprocessing can vary widely among models (e.g. some need to align to specific ratios, some need to pad with a specific color). If the user has to make a presumption before using an API, then what's the point of it being a public API?

Overall, I won't merge this change as it's specific to your use case. You can instead maintain your own fork, which gives you more control over the pipeline.
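As an example of the model-specific variation mentioned above, padding to a square canvas with a chosen fill color could be sketched as follows. The helper is hypothetical and not taken from mtmd or clip; the centring rule and the fill color are assumptions that differ from model to model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical: copy an interleaved RGB8 image into the centre of a
// side x side canvas and fill the border with the given color.
// The caller picks side, typically max(w, h).
void pad_to_square_rgb(const uint8_t *src, int w, int h,
                       uint8_t *dst, int side, const uint8_t fill[3]) {
    assert(src && dst && w > 0 && h > 0 && side >= w && side >= h);
    // Paint the whole canvas with the fill color first.
    for (size_t i = 0; i < (size_t)side * side; i++) {
        memcpy(dst + i * 3, fill, 3);
    }
    // Then copy the source rows at a centred offset.
    int ox = (side - w) / 2;
    int oy = (side - h) / 2;
    for (int y = 0; y < h; y++) {
        memcpy(dst + ((size_t)(y + oy) * side + ox) * 3,
               src + (size_t)y * w * 3,
               (size_t)w * 3);
    }
}
```

Another model might pad only on the right and bottom, or tile the image instead, which is the kind of divergence a generic public API would have to account for.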

deadprogram · 2025-10-22T10:51:49Z

> If the user has to make a presumption before using an API, then what's the point of it being a public API?

Sorry if I was unclear here. What I meant to say was that a user who chooses to bypass the current clip_image_preprocess function would have to understand how to process the image themselves. This would not be for all users.

deadprogram · 2025-10-23T12:24:43Z

In any case, since this change is not acceptable to the maintainers, I will rethink my approach. Thanks for the feedback. Now closing.

mtmd: add mtmd_get_vision_image_size() and mtmd_get_vision_patch_size…

4ec6eb4

…() functions Signed-off-by: deadprogram <[email protected]>

deadprogram requested a review from ngxson as a code owner October 21, 2025 16:38

github-actions bot added the examples label Oct 21, 2025

ngxson requested changes Oct 22, 2025


deadprogram closed this Oct 23, 2025
