
Embed box before multihead attention #2

Open
luo3300612 opened this issue Mar 2, 2020 · 1 comment

Comments

@luo3300612

luo3300612 commented Mar 2, 2020

Thank you for your idea and repo. Since the box embedding and w_g stay the same across the stacked multi-head attention layers and do not depend on k, q, v, is it proper to move the box embedding to the beginning of multi-head attention, to avoid embedding the boxes in every EncoderLayer again and again? I have tried this and found it reduces XE training time from 22h to 18h (on a GTX 1080 Ti) without obvious performance degradation (CIDEr 1.1495 vs. 1.1485).
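
For concreteness, here is a minimal sketch of the change I mean (module and variable names are hypothetical, not taken from this repo): the box embedding is computed once in the encoder's forward pass and then passed down to every layer.

import torch
import torch.nn as nn

class GeometryEmbedding(nn.Module):
    """Illustrative stand-in for Emb(lambda): it depends only on the boxes."""
    def __init__(self, geo_dim=64):
        super().__init__()
        self.proj = nn.Linear(4, geo_dim)

    def forward(self, boxes):                               # boxes: (B, N, 4)
        pairwise = boxes.unsqueeze(2) - boxes.unsqueeze(1)  # (B, N, N, 4) displacements
        return torch.relu(self.proj(pairwise))              # (B, N, N, geo_dim)

class EncoderLayer(nn.Module):
    """Only the wiring is shown; a real layer would use geo_emb in its attention."""
    def __init__(self, d_model):
        super().__init__()
        self.ff = nn.Linear(d_model, d_model)

    def forward(self, x, geo_emb):
        return torch.relu(self.ff(x))        # geo_emb would feed the attention here

class Encoder(nn.Module):
    def __init__(self, n_layers=6, d_model=512, geo_dim=64):
        super().__init__()
        self.embed_boxes = GeometryEmbedding(geo_dim)
        self.layers = nn.ModuleList(EncoderLayer(d_model) for _ in range(n_layers))

    def forward(self, x, boxes):
        geo_emb = self.embed_boxes(boxes)    # computed once, not once per layer
        for layer in self.layers:
            x = layer(x, geo_emb)
        return x

x, boxes = torch.randn(2, 10, 512), torch.rand(2, 10, 4)
print(Encoder()(x, boxes).shape)             # torch.Size([2, 10, 512])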

@simaoh
Collaborator

simaoh commented Jan 7, 2021

@luo3300612 Thanks for your observations.

Equations (6) and (7) in the paper show that the box embedding Emb(\lambda) is indeed just a function of the bounding-box displacements, and is therefore constant across all the self-attention layers of the transformer encoder.
So, as you say, the computation of Emb(\lambda) can be moved out of the self-attention layer.

However, as you can see in equation (7), the geometric weights w_g are a function of a learnable weight matrix W_G.
These learnable matrices are allowed to differ between self-attention layers.
Therefore, the computation of w_g cannot be moved out of the self-attention layer.

Here is the computation of w_g in our code (note the linear layer l()):

relative_geometry_weights_per_head = [l(flatten_relative_geometry_embeddings).view(box_size_per_head) for l in self.WGs]
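
For reference, here is a self-contained sketch of that pattern (module names and shapes are illustrative, not the exact ones in this repo): each encoder layer owns its own set of W_G linear layers, one per head, and projects the shared Emb(\lambda) inside its own forward pass, which is why this step stays in the layer.

import torch
import torch.nn as nn

class GeometricWeights(nn.Module):
    """One learnable W_G per head; each encoder layer holds its own instance."""
    def __init__(self, geo_dim=64, n_heads=8):
        super().__init__()
        self.WGs = nn.ModuleList(nn.Linear(geo_dim, 1) for _ in range(n_heads))

    def forward(self, geo_emb):                        # geo_emb: (B, N, N, geo_dim), shared Emb(lambda)
        B, N, _, geo_dim = geo_emb.shape
        flat = geo_emb.view(-1, geo_dim)               # flatten to (B*N*N, geo_dim)
        per_head = [wg(flat).view(B, 1, N, N) for wg in self.WGs]
        return torch.relu(torch.cat(per_head, dim=1))  # w_g: (B, n_heads, N, N)

geo_emb = torch.randn(2, 10, 10, 64)                   # Emb(lambda), computed once upstream
print(GeometricWeights()(geo_emb).shape)               # torch.Size([2, 8, 10, 10])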

Does this answer your question?
