
Linformer Addition #80

Open
AlexAndrei98 opened this issue Jun 12, 2020 · 8 comments

Comments

@AlexAndrei98

Linformer addition thoughts 🤷🏻‍♂️

Hey, I just noticed that a linear transformer (Linformer) was just released! I think it would be interesting to add it to DETR: since we are doing object detection on images, our sequences will naturally be longer! If anyone would be interested in adding it to their architecture I would be grateful, since I am not very experienced 😅 in PyTorch!

https://github.com/tatp22/linformer-pytorch/blob/master/README.md

@kuixu

kuixu commented Jun 22, 2020

@AlexAndrei98
Here is a practical implementation of adding Linformer to DETR by replacing nn.MultiheadAttention with Linformer's LinearMultiheadAttention:

https://github.com/kuixu/Linear-Multihead-Attention
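For reference, the swap lands inside DETR's TransformerEncoderLayer (and similarly in the decoder layers). A minimal sketch, assuming the LinearMultiheadAttention constructor used in the linked repo; the import path and the 20x24 feature-map size are illustrative, not prescribed by either repo:

```python
from torch import nn
# Import path is an assumption; point it at wherever the repo's module lives in your checkout.
from linear_multihead_attention import LinearMultiheadAttention

class TransformerEncoderLayer(nn.Module):
    def __init__(self, d_model=256, nhead=8, dropout=0.1, seq_len=20 * 24, proj_k=128):
        super().__init__()
        # Original DETR self-attention:
        # self.self_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout)
        # Linformer-style replacement; seq_len must cover the flattened H*W of the backbone features.
        self.self_attn = LinearMultiheadAttention(
            d_model, nhead, dropout=dropout,
            seq_len=seq_len, proj_k=proj_k, param_sharing='layerwise')
```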

@EmmanouelP

I've tried integrating the linear attention method on a custom dataset, but the results were quite bad. Has anyone else tested this and can report comparable results?

@alcinos
Copy link
Contributor

alcinos commented Oct 16, 2020

On COCO, in a preliminary experiment, I managed to get results within 1-2 mAP of the baseline model. One of the keys is to use the same projection matrix across all attention layers.
Best of luck
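To make the shared-projection idea concrete, here is a minimal PyTorch sketch of the concept (not kuixu's API; SharedProjAttention and the dimensions are illustrative): build one seq_len x k projection and hand the very same module to every attention layer, so all layers train a single E.

```python
import torch
from torch import nn

class SharedProjAttention(nn.Module):
    """Linformer-style attention whose key/value projection is owned outside the layer."""
    def __init__(self, d_model, nhead, proj, dropout=0.0):
        super().__init__()
        self.proj = proj  # the *same* nn.Linear(seq_len, k) object in every layer
        self.attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout)

    def forward(self, query, key, value):
        # key/value: (seq_len, batch, d_model) -> compress the sequence axis down to k
        k = self.proj(key.permute(1, 2, 0)).permute(2, 0, 1)
        v = self.proj(value.permute(1, 2, 0)).permute(2, 0, 1)
        return self.attn(query, k, v)

seq_len, k, d_model, nhead = 20 * 24, 128, 256, 8
shared_proj = nn.Linear(seq_len, k, bias=False)  # one projection matrix for the whole model
layers = nn.ModuleList(SharedProjAttention(d_model, nhead, shared_proj) for _ in range(6))

x = torch.randn(seq_len, 2, d_model)
out, _ = layers[0](x, x, x)  # out: (480, 2, 256)
```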

@EmmanouelP

@alcinos Thank you very much for your thoughtful input. Do you know if this implementation works when I start training from one of the pre-trained models that the authors provide? I have tried that, and it seems like the model is training from scratch. I believe my dataset is too small (~8000 images for 3 classes) to achieve comparable results if I were to train from scratch.

@alcinos
Contributor

alcinos commented Oct 16, 2020

I don't think you can fine-tune a baseline model using linear attention. I'd suggest training a Linformer model from scratch on COCO and then fine-tuning on your dataset.

@EmmanouelP

@alcinos I've tried training from scratch on COCO, but so far I haven't noticed any decrease in the loss or improvement in the class_error. In my understanding, the linear transformer implementation expects every image to have the same shape, so I discarded the original DETR transforms and reshape all my images to 629x751 before passing them to the backbone; the backbone outputs then have a final shape of 20x24.

Then I modify the transformer.py file to incorporate the LinearMultiheadAttention module as follows (as instructed):

```python
# In TransformerEncoderLayer.__init__:
# self.self_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout)
self.self_attn = LinearMultiheadAttention(d_model, nhead, dropout=dropout, seq_len=20*24, proj_k=128, param_sharing='layerwise')  # where h, w are from `bs, c, h, w = src.shape`

# In TransformerDecoderLayer.__init__:
# self.self_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout)
# self.multihead_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout)
self.self_attn = LinearMultiheadAttention(d_model, nhead, dropout=dropout, seq_len=100, proj_k=128, param_sharing='layerwise')  # where seq_len = args.num_queries
self.multihead_attn = LinearMultiheadAttention(d_model, nhead, dropout=dropout, seq_len=20*24, proj_k=128, param_sharing='layerwise')  # where h, w are from `bs, c, h, w = src.shape`
```

Could you please provide any specific insight on the above? Thanks in advance.

@alcinos
Contributor

alcinos commented Oct 29, 2020

I have not experimented very much, so it's hard to give exact feedback, but I think it's better to use only one projection matrix shared across all layers.
As for padding, you could in theory use the same transforms as we have currently. You just need to define your projection with its dimension being the length of the largest sequence you may encounter, and if you get a smaller sequence you simply narrow the matrix. I honestly don't know what the best solution is. I would expect DETR to be more robust if you train it with varying sizes, but I don't have hard data to back this up.
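A hedged sketch of the "narrow the matrix" idea under those same assumptions: allocate the projection for the longest flattened feature map you expect and slice it to the actual sequence length at forward time (max_seq_len, proj_k, and the project_kv helper are illustrative names, not part of DETR or the linked repo):

```python
import torch
from torch import nn

max_seq_len, proj_k, d_model = 40 * 40, 128, 256   # longest flattened H*W you expect to see
E_full = nn.Parameter(torch.empty(proj_k, max_seq_len))
nn.init.xavier_uniform_(E_full)

def project_kv(x):
    """x: (seq_len, batch, d_model), with seq_len <= max_seq_len."""
    seq_len = x.shape[0]
    E = E_full[:, :seq_len]                    # narrow the projection to the current length
    return torch.einsum('ks,sbd->kbd', E, x)   # -> (proj_k, batch, d_model)

x = torch.randn(20 * 24, 2, d_model)           # e.g. a 20x24 feature map, batch of 2
print(project_kv(x).shape)                     # torch.Size([128, 2, 256])
```

With batched, padded inputs you would also have to decide how padded positions interact with the projection; the sketch ignores masks for simplicity.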

@nicolasugrinovic

@EmmanouelP were you able to make Linformer or another linear transformer work with DETR?
