[Feature Request]: Adding " Clip Interrogator " image to prompts in fooocus #3012

Open
1 task done
badraymen opened this issue May 26, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

@badraymen

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits

What would your feature do?

After using the Fooocus "Describe" tool, I found that the prompts it creates are missing sentences and words, are too short, and are not really targeted. I rely heavily on the image-to-prompt option to create backgrounds for my photos, so this was a real problem for me, and I searched the internet for websites that generate prompts from an image.

I found "CLIP Interrogator", an extension intended for SDXL, and tested it on Colab. It gave magnificent results: the words are well targeted and include photographers' names and/or style names. It also recognizes brands, and sometimes even manages to write the brand names correctly on the products I work with. I found it really practical. I used its prompts in Fooocus and the results were truly incredible: the generated image closely resembles the original image.

So it would be very kind of you to add this functionality to Fooocus as a tab that creates prompts and sends them directly to the prompt text field for generation.
Link:

https://github.com/pharmapsychotic/clip-interrogator

Best regards

Proposed workflow

  1. Go to "Input Image".
  2. Go to "Describe".
  3. Choose a model, e.g. ViT-L/OpenAI.
  4. Choose "fast" or "best" mode.
  5. Upload the image to describe.
  6. Press "Generate prompt".
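
For readers unfamiliar with the linked project, the workflow above maps roughly onto the clip-interrogator Python package. The following is only a minimal sketch, not Fooocus code: `describe_image` and its `load` parameter are hypothetical names introduced here, and `ViT-L-14/openai` is assumed to be the model meant by "ViT-L/OpenAI". The `Config`, `Interrogator`, `interrogate` and `interrogate_fast` names come from the package itself.

```python
from typing import Any, Callable


def _load_clip_interrogator(model_name: str) -> Any:
    """Build a real Interrogator. Imported lazily because the package
    pulls in large model weights (pip install clip-interrogator)."""
    from clip_interrogator import Config, Interrogator
    return Interrogator(Config(clip_model_name=model_name))


def describe_image(
    image: Any,
    model_name: str = "ViT-L-14/openai",  # assumed equivalent of "ViT-L/OpenAI"
    mode: str = "fast",
    load: Callable[[str], Any] = _load_clip_interrogator,  # hypothetical hook, eases testing
) -> str:
    """Steps 3 to 6 of the proposed workflow: pick a model and a mode,
    pass in the image, and get a prompt string back."""
    if mode not in ("fast", "best"):
        raise ValueError(f"mode must be 'fast' or 'best', got {mode!r}")
    interrogator = load(model_name)
    if mode == "fast":
        return interrogator.interrogate_fast(image)
    return interrogator.interrogate(image)


# Example call (hypothetical file name; needs Pillow and a model download):
#   from PIL import Image
#   prompt = describe_image(Image.open("photo.jpg").convert("RGB"), mode="best")
```

The returned string is what would be placed into the prompt text field. In this sketch the model is loaded on every call; a real integration would presumably cache it.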

Additional information

No response

@badraymen badraymen added enhancement New feature or request triage This needs an (initial) review labels May 26, 2024
@mashb1t
Collaborator

mashb1t commented May 31, 2024

@badraymen FYI, I'm on it and am currently testing various image captioning models in a separate project: https://github.com/mashb1t/describeiments

The intermediate result is that BLIP (1) (+ BERT) integrates best into Fooocus and has the lowest resource usage, so I'm not sure whether switching is worth the effort.

A modified version of https://huggingface.co/spaces/pharmapsychotic/CLIP-Interrogator/blob/main/app.py can be found in interrogator.py.txt

image
One can also really overshoot in terms of VRAM with the combination of ViT-H and BLIP 2.
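
For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of a timing and peak-VRAM harness. It is not the code from the describeiments project: `benchmark_captioner` and `compare_captioners` are hypothetical helpers, and each captioner is assumed to be a plain callable that takes an image and returns a string. Peak VRAM is only recorded when PyTorch with CUDA is available.

```python
import time
from typing import Callable, Dict, Optional

try:  # torch is optional; without it only wall time is recorded
    import torch
except ImportError:
    torch = None


def _cuda_available() -> bool:
    return torch is not None and torch.cuda.is_available()


def benchmark_captioner(caption: Callable[[object], str], image: object) -> Dict[str, object]:
    """Run one captioner once and record its caption, wall time and peak VRAM."""
    if _cuda_available():
        # Restart peak tracking; the peak still includes memory already
        # resident on the GPU (e.g. a model loaded before this call).
        torch.cuda.reset_peak_memory_stats()
    start = time.perf_counter()
    text = caption(image)
    seconds = time.perf_counter() - start
    peak_mib: Optional[float] = None
    if _cuda_available():
        peak_mib = torch.cuda.max_memory_allocated() / (1024 ** 2)
    return {"caption": text, "seconds": seconds, "peak_vram_mib": peak_mib}


def compare_captioners(
    captioners: Dict[str, Callable[[object], str]], image: object
) -> Dict[str, Dict[str, object]]:
    """Benchmark several captioners on the same image, keyed by name."""
    return {name: benchmark_captioner(fn, image) for name, fn in captioners.items()}
```

A single run per model is a rough measure only; first calls often include model loading and warm-up, so repeated runs would give a fairer picture.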

@mashb1t mashb1t removed the triage This needs an (initial) review label May 31, 2024
@badraymen
Author

Thank you so much @mashb1t for your interest and your involvement. Could you explain this in terms that a person who knows nothing about coding or Python can understand? A simplified explanation would be very kind of you. So are you going to integrate CLIP Interrogator, or are you going to develop a new function in Fooocus for the next version?

@mashb1t
Collaborator

mashb1t commented May 31, 2024

@badraymen sure: no clip-interrogator until it has been fully evaluated and benchmarked.
(also it's based on transformers, which Fooocus doesn't use)

@badraymen
Author

You are really kind, dear sir. Thank you again for this clarification, and good luck with what you do.

2 participants