Question about object queries #178
Hi @Ww-Lee, for your questions:
Best of luck
Thank you. Could you please explain how the encoder memory responds to the object queries in the encoder-decoder attention mechanism, so that the trained model can predict objects in a given area of the image? I really can't picture the process the way I can picture semantic similarity in NLP.
Hi @alcinos, according to the video you shared, are the object queries specialized by spatial location rather than by class? Have you ever analyzed which class information is learned by each object query? Thank you in advance.
Hi @jd730
The object queries seem to specialize spatially, not per class.
We did some analysis, but we didn't see any clear class specialization with respect to the object queries.
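One way to probe this kind of spatial specialization yourself is to collect the model's predicted boxes over a validation set and look at the per-slot spread of box centers: if a query slot specializes spatially, its predicted centers cluster in one region of the image. A minimal sketch, using random data as a stand-in for real DETR outputs (the `pred_boxes` array and its `(cx, cy, w, h)` layout are assumptions here, chosen to match DETR's normalized box format):

```python
import numpy as np

# Stand-in for DETR predictions gathered over a validation set:
# shape (num_images, num_queries, 4) in normalized (cx, cy, w, h) format.
# Random data here for illustration only.
rng = np.random.default_rng(0)
num_images, num_queries = 500, 100
pred_boxes = rng.uniform(0.0, 1.0, size=(num_images, num_queries, 4))

# Mean and spread of predicted box centers per query slot. A spatially
# specialized slot has a small per-slot std compared to the image-wide
# spread of centers.
centers = pred_boxes[..., :2]        # (num_images, 100, 2)
slot_mean = centers.mean(axis=0)     # (100, 2): average center per slot
slot_std = centers.std(axis=0)       # (100, 2): spread per slot

print(slot_mean.shape, slot_std.shape)
```

With real model outputs, scattering `slot_mean` (one point per query slot, as in the paper's Figure 7) makes the spatial specialization visible directly.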
Hi @fmassa,
I have not read DETR's code yet, but the conference paper gives me the impression that the 'object queries' are a fixed number of uniformly sampled locations on the spatial scale of the feature map before the decoder. Am I right?
Hi @HawkRong, that's not correct. I would suggest you have a look at Ross Girshick's CVPR tutorial on "Object Detection as a Machine Learning Problem", where he contrasts object queries with traditional approaches in detection. The link with the exact timestamp is here (but I would recommend watching the whole video): https://youtu.be/her4_rzx09o?t=1351. Additionally, @alcinos also posted an answer to a similar question in #178 (comment), with another video that could help you understand it. Let us know if you still have questions after checking those references.
@fmassa Thank you for sharing; I'm very interested. Could you provide the video files, since youtube.com is not accessible from my country?
https://share.weiyun.com/EgCBlIpD |
Hello, I have some questions about the decoder layers.
1. Can I think of an object query as an adaptive anchor? In the paper it is essentially a positional embedding, but I don't understand how it actually works, especially in the first decoder layer. Leaving self-attention aside, only the object queries do cross-attention with the image features, right? So in the encoder-decoder attention, what exactly does the positional embedding do?
2. In the first decoder layer, what does self-attention do?
3. How can the 100 object queries learn to specialize on certain areas and box sizes, rather than on certain classes? (I think this will be clear once I understand your answers to the questions above.)
Can you help me out? Thanks very much.
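For the cross-attention part of the question, a minimal single-head sketch may help. It follows the pattern used in DETR's decoder, where the learned query embedding is added to the (initially zero) decoder target to form Q, and the spatial positional encoding is added to the encoder memory to form K, while V is the raw memory. All tensor sizes below are toy values, not the real model's, and the single-head form is a simplification of the multi-head attention DETR actually uses:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d_model, num_queries, hw = 256, 100, 49   # toy sizes; hw = flattened H*W

memory = rng.normal(size=(hw, d_model))       # encoder output (the "memory")
pos_embed = rng.normal(size=(hw, d_model))    # spatial positional encoding
query_embed = rng.normal(size=(num_queries, d_model))  # learned object queries
tgt = np.zeros((num_queries, d_model))        # decoder input starts at zero

# Encoder-decoder (cross) attention, single head, projections omitted:
# object queries act purely as learned positional embeddings on the Q side,
# spatial positions act on the K side, and content flows through V.
q = tgt + query_embed
k = memory + pos_embed
attn = softmax(q @ k.T / np.sqrt(d_model))    # (100, 49) attention weights
out = attn @ memory                            # each slot pools image features

print(out.shape)   # 100 slots, each now carrying pooled image content
```

So even in the first layer, each query produces its own attention map over the image, and training shapes those maps so different slots look at different regions; self-attention between the slots then lets them coordinate (e.g., avoid duplicate detections).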