Support for external taggers #1062

LameMonster82 · 2023-12-27T01:07:18Z

Describe the feature you'd like to request

Given how limiting the built-in AI models can be, it would be nice to be able to add external taggers that could be used instead of the built-in ones.

Describe the solution you'd like

A possible solution is to use web requests to communicate with external web servers: the image is sent over, and the server responds with tags that match the image. This would allow the use of different models that aren't implemented in the recognize app itself, as well as stronger models that may not work in Docker setups or on the host hardware.

For example, the https://github.com/AUTOMATIC1111/stable-diffusion-webui project hosts a web server that can generate images locally. Using the extension https://github.com/picobyte/stable-diffusion-webui-wd14-tagger and the --api option, a user can send a POST request with any image encoded in base64 and get a JSON response listing all tags found in it, including weights for how well the image matches each tag.

Once the external setup is complete, the process is simple:

1. Convert the image to base64 data.
2. Wrap it in a JSON object (e.g. {"image": "data:image/<type>;base64,<image data>", "model": "<model name>", "threshold": 0.35}).
3. Send a POST request to the target URL (e.g. "http://127.0.0.1:7860/tagger/v1/interrogate").
4. Get a response like:

{
    "caption": {
        "tag": {
            "tag1": 0.6852931380271912,
            "tag2": 0.6729011535644531,
            "tag3": 0.46537110209465027,
            "tag4": 0.5044106245040894
        }
    }
}
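
To make the steps above concrete, here is a minimal Python sketch of what a client could look like. It only assumes the request and response shapes shown in this issue; the function names (build_payload, interrogate, extract_tags) and the min_weight parameter are made up for illustration, and I have not checked them against the tagger extension's actual behaviour beyond what is described here.

```python
import base64
import json
import urllib.request


def build_payload(image_bytes: bytes, mime_type: str, model: str,
                  threshold: float = 0.35) -> dict:
    """Steps 1 and 2: base64-encode the image and wrap it in the JSON body."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "image": f"data:{mime_type};base64,{encoded}",
        "model": model,
        "threshold": threshold,
    }


def interrogate(url: str, payload: dict, timeout: float = 60.0) -> dict:
    """Step 3: POST the payload as JSON and decode the JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.load(reply)


def extract_tags(response: dict, min_weight: float = 0.0) -> list[str]:
    """Step 4: read the tag weights and return tag names, best match first.

    Assumes the {"caption": {"tag": {...}}} layout shown in this issue.
    """
    weights = response["caption"]["tag"]
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return [tag for tag, weight in ranked if weight >= min_weight]
```

With a tagger server running locally, usage would be roughly extract_tags(interrogate("http://127.0.0.1:7860/tagger/v1/interrogate", build_payload(data, "image/jpeg", "<model name>"))), where data holds the raw image bytes. Keeping the HTTP call separate from the payload and parsing helpers means the latter can be tested without a server.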

I don't know of any other image tagging software that can be interacted with using web requests, but I think it's a nice start, even if it's only for object tagging.
I wish I could provide a pull request for this option, but unfortunately I'm not skilled enough to write clean PHP code against the Nextcloud API.

Describe alternatives you've considered

Implementing new models in the app itself is time-consuming, and making them work with the limited dependencies of the Nextcloud environment and Docker instances is even harder.

github-actions · 2023-12-27T01:07:38Z

Hello 👋

Thank you for taking the time to open this issue with recognize. I know it's frustrating when software
causes problems. You have made the right choice to come here and open an issue to make sure your problem gets looked at
and if possible solved.
I try to answer all issues and if possible fix all bugs here, but it sometimes takes a while until I get to it.
Until then, please be patient.
Note also that GitHub is a place where people meet to make software better together. Nobody here is under any obligation
to help you, solve your problems or deliver on any expectations or demands you may have, but if enough people come together we can
collaborate to make this software better. For everyone.
Thus, if you can, you could also look at other issues to see whether you can help other people with your knowledge
and experience. If you have coding experience it would also be awesome if you could step up to dive into the code and
try to fix the odd bug yourself. Everyone will be thankful for extra helping hands!
One last word: If you feel, at any point, like you need to vent, this is not the place for it; you can go to the forum,
to twitter or somewhere else. But this is a technical issue tracker, so please make sure to
focus on the tech and keep your opinions to yourself. (Also see our Code of Conduct. Really.)

I look forward to working with you on this issue
Cheers 💙

PhilLab · 2024-01-20T17:06:23Z

For me, it would also be helpful to have an "incoming" API, that is, a way to set the app's face and cluster information via a REST API.
I would use that to run my own models on my local desktop machine, on the files that the Nextcloud sync tool has downloaded, and upload only the face recognition results to Nextcloud.

This way, I can use my GPU to speed up the process, I can run whichever recognition stack I need, and I can work with a Nextcloud instance that is hosted by a third-party provider (who blocked the face recognition calls on their system so as not to impact the other tenants).

WDYT, @marcelklehr?

LameMonster82 added the enhancement New feature or request label Dec 27, 2023
