Linformer Addition #80
Linformer addition thoughts 🤷🏻‍♂️

Hey, I just noticed that a linear transformer was released! I think it would be interesting to add it to DETR: since we are doing object detection on image features, our sequences will naturally be long. If anyone would be interested in adding it to the architecture I would be grateful, since I am not very experienced 😅 in PyTorch!

https://github.com/tatp22/linformer-pytorch/blob/master/README.md
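To make the idea concrete, here is a minimal Linformer-style attention in plain PyTorch. This is an illustrative sketch, not the linformer-pytorch API: keys and values get compressed from the full sequence length n down to a fixed proj_k by a learned projection, so the attention map costs O(n·proj_k) instead of O(n²).

```python
import torch
import torch.nn as nn

class LinearMultiheadAttention(nn.Module):
    """Linformer-style self-attention (sketch). Inputs follow
    nn.MultiheadAttention's (seq_len, batch, d_model) convention;
    masks are omitted for brevity."""

    def __init__(self, d_model, nhead, seq_len, proj_k=64, dropout=0.1):
        super().__init__()
        assert d_model % nhead == 0
        self.nhead, self.head_dim = nhead, d_model // nhead
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)
        # the Linformer trick: a learned projection of the sequence
        # dimension, seq_len -> proj_k, applied to keys and values
        self.E = nn.Linear(seq_len, proj_k, bias=False)
        self.dropout = nn.Dropout(dropout)

    def forward(self, query, key, value):
        n, b, d = query.shape
        # (n, b, d) -> (b * nhead, n, head_dim)
        q = self.q_proj(query).view(n, b * self.nhead, self.head_dim).transpose(0, 1)
        k = self.k_proj(key).view(n, b * self.nhead, self.head_dim).transpose(0, 1)
        v = self.v_proj(value).view(n, b * self.nhead, self.head_dim).transpose(0, 1)
        # compress the sequence axis of K and V: (b*h, n, hd) -> (b*h, proj_k, hd)
        k = self.E(k.transpose(1, 2)).transpose(1, 2)
        v = self.E(v.transpose(1, 2)).transpose(1, 2)
        # attention is now (n x proj_k) instead of (n x n)
        attn = torch.softmax(q @ k.transpose(1, 2) / self.head_dim ** 0.5, dim=-1)
        out = (self.dropout(attn) @ v).transpose(0, 1).reshape(n, b, d)
        return self.out_proj(out), None  # None stands in for attention weights
```

The catch is that the learned projection fixes the sequence length, which is why fixed input resolutions come up later in this thread.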
Comments

@AlexAndrei98 I've tried integrating the linear method on a custom dataset, but the results were quite bad. Has anyone else tested this and can report comparable results?
On COCO, in a preliminary experiment, I managed to get results within 1-2 mAP of the baseline model. One of the keys is to use the same projection matrix across all attention layers.
@alcinos Thank you very much for your thoughtful input. Do you know whether this implementation works when I start training from one of the pre-trained models the authors provide? I have tried that, and the model seems to train from scratch. I believe my dataset is too small (~8000 images across 3 classes) to achieve comparable results if I were to train from scratch.
I don't think you can fine-tune a baseline model using linear attention. I'd suggest training a Linformer model from scratch on COCO, then fine-tuning on your dataset.
@alcinos I've tried training from scratch on COCO, but so far I haven't noticed any decrease in the loss or any improvement in the class_error. In my understanding, the linear transformer implementation expects every image to have the same shape, so I discarded the original DETR transforms and reshape all my images to 629x751 before passing them to the backbone; the backbone outputs then have a final shape of 20x24. I then modified the transformer.py file to incorporate the LinearMultiheadAttention module (as instructed), along the lines of the sketch below:
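A sketch of that substitution, reusing the LinearMultiheadAttention class sketched earlier in the thread (the proj_k value and the exact call site are illustrative):

```python
# models/transformer.py (DETR), inside TransformerEncoderLayer.__init__
# original line:
#   self.self_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout)
# replaced with the linear variant; a 629x751 input gives a 20x24 feature
# map, i.e. 20 * 24 = 480 flattened tokens, so seq_len must match that
self.self_attn = LinearMultiheadAttention(
    d_model, nhead, seq_len=20 * 24, proj_k=64, dropout=dropout
)
```

Note that DETR also passes attn_mask and key_padding_mask into self_attn and unpacks an (output, weights) tuple, so a real drop-in has to at least accept those arguments.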
Could you please provide any specific insight on the above? Thanks in advance.
I have not experimented very much, so it's hard to give exact feedback, but I think it's better to use a single projection matrix shared across all layers.
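For what that sharing could look like with the sketch class above (the Linformer paper calls this layerwise parameter sharing; the wiring below is illustrative, not an existing API):

```python
import torch.nn as nn

d_model, nhead, seq_len, proj_k, num_layers = 256, 8, 20 * 24, 64, 6

# a single sequence-projection shared by every attention layer
shared_E = nn.Linear(seq_len, proj_k, bias=False)

attn_layers = nn.ModuleList(
    LinearMultiheadAttention(d_model, nhead, seq_len=seq_len, proj_k=proj_k)
    for _ in range(num_layers)
)
for attn in attn_layers:
    attn.E = shared_E  # all layers now reference the same parameters
```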
@EmmanouelP were you able to make the Linformer or another linear transformer work with DETR?