-
Notifications
You must be signed in to change notification settings - Fork 13.4k
mtmd: add mtmd_get_vision_image_size() and mtmd_get_vision_patch_size() functions #16705
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…() functions Signed-off-by: deadprogram <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In most cases, I don't think exposing these data is a good idea.
The API for getting audio bit rate is required because there is no resampling logic inside mtmd. Providing input with wrong bit rate will make the model to malfunction.
On the other hand, image size and patch size are not public API as mtmd can work with any input image sizes. The image will be resized internally.
Unless you can show the code where you use this, I don't think we should add these API. Maybe what you're doing is already possible via mtmd_image_tokens_get_n_tokens
| // get vision image size in pixels, for example 1024 | ||
| // return -1 if vision is not supported | ||
| MTMD_API int mtmd_get_vision_image_size(mtmd_context * ctx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a good idea to add this, as many models now support dynamic resolution and this won't give the correct size
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, some models are fixed res, some are naflex.
The presumption is that if you want to do your own pre-processing, you will need to be familiar with what the model you are using expects.
Actually my intention is this would be the first step towards being able to bypass the Allowing for skipping this step allows for things like hardware acceleration of image pre-processing without having to actually do any such implementation inside of |
|
I don't want to support such use case as it will make mtmd API as bloated as clip API. The preprocessing can vary vastly among model (i.e. some needs to align to specific ratios, some needs to pad with specific color). If the user have to make a presumption before using an API, then what's the point of being a public API? Overall, I won't merge this change as it's only specific to your use case. You can instead maintain your own fork, which gives you more control over the pipeline. |
Sorry if I was unclear here. What I meant to say was that if the user would choose to bypass the current |
|
In any case, since this change is not acceptable to the maintainers, I will rethink my approach. Thanks for the feedback. Now closing. |
This PR adds
mtmd_get_vision_image_size()andmtmd_get_vision_patch_size()functions as mentioned in #16703