About prompts vectors usage #82

Open
VilisovEvgeny opened this issue Aug 2, 2024 · 1 comment

Comments

@VilisovEvgeny

Do I understand correctly that the same prompt vector is used both for classification and for selecting the corresponding Q input box embeddings?

If so, does that mean Qdec in the decoder does not change much and is simply refined?

(I am familiar with how DETR works, but this point is still not completely clear to me.)

Thanks!

@Mountchicken
Collaborator

Hi @VilisovEvgeny

The visual prompt embedding is indeed used for both final classification and decoder query selection. However, the visual prompt embedding is not itself the input to the decoder. In query selection, we compute the similarity between the visual prompt and each pixel of the encoder's output and select the top N (N=900) pixels. Predefined anchors of fixed sizes are placed at the positions of the selected pixels. These anchors initialize the decoder's position embedding, while the decoder's content embedding consists of N learnable vectors.
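For readers following along, here is a rough sketch of the query selection step described above, in plain Python. It is not the repository's code: the function name `select_queries`, the use of a dot product as the similarity, the anchor size, and the toy data are all assumptions made for illustration. In the actual model N=900 and the content queries are learned parameters rather than random values.

```python
import random


def select_queries(encoder_features, prompt_embedding, positions,
                   num_queries, anchor_size):
    """Pick the encoder pixels most similar to the visual prompt.

    encoder_features: one feature vector per encoder output pixel
    prompt_embedding: a single visual prompt vector
    positions: one (cx, cy) center per encoder output pixel
    Returns the selected pixel indices and one anchor box per index.
    """
    # Similarity between the prompt and every encoder pixel.
    # Assumption: a dot product; the reply only says "similarity".
    scores = [
        sum(f * p for f, p in zip(feature, prompt_embedding))
        for feature in encoder_features
    ]
    # Indices of the top-N scoring pixels, highest score first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    selected = order[:num_queries]
    # Place a predefined, fixed-size anchor at each selected pixel.
    # These anchors would initialize the decoder's position embedding.
    width, height = anchor_size
    anchors = [(positions[i][0], positions[i][1], width, height)
               for i in selected]
    return selected, anchors


# Toy data: 4 encoder pixels with 2-dimensional features.
encoder_features = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
positions = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
prompt_embedding = [1.0, 0.0]
num_queries = 2  # stands in for N=900

selected, anchors = select_queries(
    encoder_features, prompt_embedding, positions,
    num_queries, anchor_size=(0.1, 0.1),
)

# The content embedding does not come from the prompt or the selected
# pixels: it is N separate vectors. Random values stand in here for
# parameters that would be learned during training.
rng = random.Random(0)
content_queries = [[rng.gauss(0.0, 1.0) for _ in range(2)]
                   for _ in range(num_queries)]

print(selected)
print(anchors)
```

Note that the prompt only affects which positions are selected; `content_queries` is built without looking at the prompt or the selected features, which is the distinction the reply draws.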
