
About Visual Prompt Encoder with negative visual prompts #91

Open
feivellau opened this issue Dec 6, 2024 · 6 comments

Comments

@feivellau

Hi, thanks for the great work! I also have a question related to this paper.
The paper mentions that negative example prompts can be utilized. Could you please explain how negative example prompts are implemented in the training strategy?

@Mountchicken
Collaborator

In training, negative prompts are used to contrast categories and help the model learn better separations. For example, in a mini-batch with a batch size of 2, you may have categories A and B in image 1, and C and D in image 2. The embeddings for all of these categories (A, B, C, and D) are gathered during training. When computing the classification loss, the prompt from the same category as a target is its positive, while prompts from the other categories act as negatives. For example, B, C, and D are negative prompts for category A; C, D, and A are negative prompts for category B.
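
As a rough illustration of this, here is a minimal PyTorch-style sketch of a classification loss with in-batch negatives. The shapes, names, and temperature value are my own assumptions for clarity, not the repository's actual implementation:

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes, for illustration only:
#   query_embeds:  (num_queries, dim) - decoder outputs matched to ground-truth objects
#   prompt_embeds: (num_cats, dim)    - visual prompt embeddings gathered from the whole
#                                       mini-batch, e.g. categories A, B, C, D
#   target_labels: (num_queries,)     - index into prompt_embeds of each query's category
def contrastive_cls_loss(query_embeds, prompt_embeds, target_labels, temperature=0.07):
    q = F.normalize(query_embeds, dim=-1)
    p = F.normalize(prompt_embeds, dim=-1)
    logits = q @ p.t() / temperature          # (num_queries, num_cats) similarity logits

    # Cross-entropy: each query's own category is the positive, and every other
    # category gathered from the batch (the "negative prompts") pushes its logit down.
    return F.cross_entropy(logits, target_labels)

# Toy usage: 4 categories gathered from a batch of 2 images, 3 matched queries.
loss = contrastive_cls_loss(torch.randn(3, 256), torch.randn(4, 256),
                            torch.tensor([0, 2, 3]))
```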

@feivellau
Author

feivellau commented Dec 8, 2024

So my understanding is that during training, all categories other than the current one are treated as negative examples?
If so, I also have a question about inference: how are targets filtered using negative prompts?

@Mountchicken
Collaborator

In inference, using negative prompts essentially turns detection into a multi-class classification problem. For example, if you're detecting red apples but the image also contains green apples, providing only visual prompts for red apples might cause the model to detect the green apples as well. In this case, you can provide visual prompts for both red and green apples, transforming the problem into a binary classification task. At the end, you keep only the bounding boxes that are classified as red apples.
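
To make the filtering step concrete, here is a small sketch under the same red/green apple scenario. The function and tensor names are hypothetical and only meant to illustrate the idea of scoring each box against both prompts and keeping the boxes whose best match is the red-apple prompt:

```python
import torch
import torch.nn.functional as F

# Hypothetical inputs, for illustration only:
#   box_embeds:    (num_boxes, dim) - classification embeddings of the predicted boxes
#   prompt_embeds: (2, dim)         - row 0 = red-apple prompt, row 1 = green-apple prompt
#   boxes:         (num_boxes, 4)   - predicted box coordinates
def keep_red_apples(box_embeds, prompt_embeds, boxes, red_idx=0):
    sims = F.normalize(box_embeds, dim=-1) @ F.normalize(prompt_embeds, dim=-1).t()
    best = sims.argmax(dim=-1)               # which prompt each box matches best
    keep = best == red_idx                   # drop boxes that match the green-apple prompt
    return boxes[keep], sims[keep, red_idx]  # surviving boxes and their red-apple scores

boxes = torch.rand(5, 4)  # toy (x1, y1, x2, y2) boxes
kept_boxes, kept_scores = keep_red_apples(torch.randn(5, 256), torch.randn(2, 256), boxes)
```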

@feivellau
Author

I see. I initially thought the implementation was similar to SAM, which involves multiple rounds of positive and negative prompts.
Can it be understood this way: by computing the similarity between the model's classification embeddings and the prompt embeddings, we can determine which prompt each detected instance belongs to, thereby achieving multi-class detection or filtering?

@Mountchicken
Collaborator

Yes, this is the way.

@feivellau
Author

I have another question. When making predictions, if the target object is not present in the image, are detections filtered out based on a similarity threshold?
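
The kind of threshold-based filtering described here might look like the sketch below; the threshold value and names are purely illustrative assumptions, not taken from the repository:

```python
import torch

# Hypothetical: sims holds each box's similarity score to the single target prompt.
def filter_by_threshold(boxes, sims, score_thr=0.3):
    keep = sims > score_thr          # if the object is absent, scores should stay low
    return boxes[keep], sims[keep]
```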
