The visual prompt embedding is indeed used for both final classification and decoder query selection. However, the input to the decoder is not the visual prompt embedding itself. In query selection, we compute the similarity between the visual prompt and each pixel of the encoder's output, then select the top N (N=900) pixels. At the positions of these selected pixels, predefined anchors of fixed sizes are placed. These anchors initialize the decoder's position embeddings, while the decoder's content embeddings are N learnable vectors.
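A minimal sketch of the selection step described above, with illustrative names and shapes (the actual model's tensors and similarity function may differ):

```python
import numpy as np

def select_anchors(prompt, encoder_pixels, anchors, n_queries=900):
    """Pick decoder anchor positions by prompt-pixel similarity.

    prompt:         (d,)  visual prompt embedding
    encoder_pixels: (P, d) per-pixel encoder output features
    anchors:        (P, 4) predefined fixed-size anchor boxes, one per pixel
    Returns the top-N anchor boxes, which initialize the decoder's
    position embeddings. All names here are hypothetical.
    """
    sims = encoder_pixels @ prompt        # (P,) similarity of prompt to each pixel
    top = np.argsort(-sims)[:n_queries]   # indices of the N most similar pixels
    return anchors[top]                   # (N, 4) selected anchor boxes

# Toy example: the content queries are separate learnable vectors,
# independent of the prompt; only the positions come from selection.
d, P, N = 8, 100, 10
rng = np.random.default_rng(0)
prompt = rng.normal(size=d)
pixels = rng.normal(size=(P, d))
anchors = rng.uniform(size=(P, 4))
content_queries = rng.normal(size=(N, d))  # learnable parameters in practice
pos_anchors = select_anchors(prompt, pixels, anchors, n_queries=N)
print(pos_anchors.shape)  # (10, 4)
```

The key point the sketch encodes: the prompt only ranks pixels; the decoder then refines boxes starting from the selected anchors, with content carried by the learnable queries.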
Did I understand correctly that the same prompt vector is used both for classification and for selecting the corresponding top-N pixel embeddings that seed the decoder's box queries?
If so, does it follow that the decoder queries Qdec do not change much, but are simply refined?
(I am familiar with how DETR works, but this point is still not completely clear to me.)
Thanks!