This repository has been archived by the owner on Mar 12, 2024. It is now read-only.

Network does not converge from scratch #169

Open
ririya opened this issue Jul 30, 2020 · 11 comments

Comments

@ririya

ririya commented Jul 30, 2020

I've been trying to train the network from scratch using a custom dataset of around 100K images and 8 classes. Usually the network trains until it reaches a training loss of ~23 and then gets stuck there no matter how many epochs I run.

The only thing that works is transfer learning from the trained coco models provided and replacing the transformer with a new number of classes and queries (I've been using 20 queries).

I actually got decent results doing the above scheme, but the model is still outperformed by other models such as EfficientDet. So my next step is trying to replace the backbone with an EfficientNet architecture.

The problem is not EfficientNet itself, since I am having convergence issues training from scratch even with the original backbones. But I do believe some backbones make it harder to converge.

Here are a couple things I tried:

  • Importing the transformer part from the pretrained COCO models and replacing the backbone, either keeping the query and class layers at 100 and 91, or replacing only those layers with 20-query / 8-class versions.
  • Changing the optimizer (Adam, AdamW, RmsProp)
  • Changing the learning rate from 10e-3 to 10e-6
  • Changing batch size (This one worked for smaller backbones such as Resnet50)
  • Using normal batch norm layers instead of frozen batch norm
  • Changing the image size. The originals are 1280 x 720; I tried half- and quarter-size images. I noticed that larger images also make it harder to converge.
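For readers following along, the "import the pretrained transformer and replace only the class and query layers" step can be sketched as a filter over the checkpoint's state dict. This is a minimal sketch, not the code used in this thread: the key prefixes `class_embed.` and `query_embed.` are assumptions about how the checkpoint names those layers, and the helper `drop_head_weights` is hypothetical. Plain strings stand in for tensors so the snippet runs without PyTorch.

```python
from typing import Dict, Iterable, Mapping, Tuple, TypeVar

T = TypeVar("T")

# Assumed key prefixes for the layers whose shape depends on the number of
# classes and queries; check them against the keys of your own checkpoint.
HEAD_PREFIXES: Tuple[str, ...] = ("class_embed.", "query_embed.")


def drop_head_weights(state_dict: Mapping[str, T],
                      prefixes: Iterable[str] = HEAD_PREFIXES) -> Dict[str, T]:
    """Return a copy of state_dict without entries whose key starts with a prefix."""
    prefixes = tuple(prefixes)
    return {k: v for k, v in state_dict.items() if not k.startswith(prefixes)}


# Stand-in for a real checkpoint. The values would normally be tensors.
pretrained = {
    "backbone.0.body.conv1.weight": "w0",
    "transformer.encoder.layers.0.linear1.weight": "w1",
    "class_embed.weight": "w2",  # shape depends on the number of classes
    "class_embed.bias": "w3",
    "query_embed.weight": "w4",  # shape depends on the number of queries
}

filtered = drop_head_weights(pretrained)
print(sorted(filtered))
```

With PyTorch, the filtered dict could then be loaded with `model.load_state_dict(filtered, strict=False)`, so the removed layers keep their fresh initialization at the new sizes.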

I also made a few modifications to the code:

  • Removed all augmentation
  • Made all layers learnable

I was able to make Resnet50 converge under certain situations, with large batch size, reduced size images and certain learning rates. However, switching to a larger Resnet or changing any of the parameters breaks the training again.

@alcinos
Contributor

alcinos commented Aug 5, 2020

Hi @ririya

Thank you for your interest in DETR.
I have a couple of questions:

I've been trying to train the network from scratch

Define "from scratch" here. Do you at least use an ImageNet pre-trained backbone? Note that without this, it is not trivial to make the network converge, even on Coco (see #157)

it reaches a training loss of ~23

Training loss is not very informative, what is the mAP and how does it compare to your EfficientDet baseline?

@ririya
Author

ririya commented Aug 5, 2020

Hi @alcinos Thx for replying!

All my backbones are pretrained on imagenet.

Whenever the training gets stuck, the mAP also gets stuck below 0.1.

As of now I was able to make it converge using Resnet101, Resnet101-DC5 and also Resnest101.

I was able to train it using the aforementioned mods. I’m always importing the trained transformer from the Resnet101 model and replacing what I need.

Using all backbones my results are comparable to EfficientDet D1 (around 0.5 mAP on my dataset)

However, it still doesn't work with the EfficientNet backbone.

@alcinos
Contributor

alcinos commented Aug 5, 2020

I haven't experimented with EfficientNet so I can't really offer you any guidance there. It might depend on the exact way it is pre-trained. You could try increasing the backbone learning rate a bit, e.g. to 5e-5, and see if it helps.
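A separate backbone learning rate is usually expressed as optimizer parameter groups. The following is a rough sketch under stated assumptions: backbone parameters are identified by the substring "backbone" in their name, the main learning rate of 1e-4 is only an illustrative value, and `build_param_groups` is a hypothetical helper. Strings stand in for parameter tensors so it runs without PyTorch.

```python
from typing import Any, Dict, Iterable, List, Tuple


def build_param_groups(named_params: Iterable[Tuple[str, Any]],
                       lr: float,
                       lr_backbone: float) -> List[Dict[str, Any]]:
    """Split parameters into non-backbone and backbone groups with separate LRs."""
    backbone: List[Any] = []
    rest: List[Any] = []
    for name, param in named_params:
        # Assumption: backbone parameters have "backbone" in their name.
        if "backbone" in name:
            backbone.append(param)
        else:
            rest.append(param)
    return [
        {"params": rest, "lr": lr},
        {"params": backbone, "lr": lr_backbone},
    ]


# Stand-in for model.named_parameters().
named = [
    ("backbone.0.body.conv1.weight", "p0"),
    ("transformer.decoder.norm.weight", "p1"),
    ("class_embed.weight", "p2"),
]

groups = build_param_groups(named, lr=1e-4, lr_backbone=5e-5)
for group in groups:
    print(group["lr"], group["params"])
```

With a real model, a list of this shape can be passed directly to an optimizer such as `torch.optim.AdamW(groups)`, which accepts per-group `lr` values.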

For the rest, I'm a bit surprised that it is so hard for you to converge with Imagenet pre-trained resnet backbone and scratch transformer. Maybe your data distribution is very different than coco and you need to think about other data-augmentations that may make sense.

Otherwise, I think it is perfectly fine to rely on fine-tuning from a coco-pretrained model as you are doing. I don't really think you need to fiddle with the number of queries though, 100 should be fine.

Best of luck

@ririya
Author

ririya commented Aug 5, 2020

Thanks @alcinos, I'll try to follow your suggestions. I'm already getting good results but just wondering why it's so hard to get this working sometimes.

@ririya
Author

ririya commented Aug 20, 2020

I was finally able to run it with EfficientNet. I think there was a problem with the imported ImageNet weights.

However resnet101-dc5 still gives me the best results. It is now beating EfficientDet. However inference time is 30 ms more. I've modified efficientnet to include dilations, as they seem to be critical. Anxious for the results.
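As a back-of-the-envelope illustration of why dilation changes both accuracy and cost: assuming a backbone with a total stride of 32, and assuming that dilating the last stage halves the stride to 16 (as in the DC5 variants), the feature map gets twice as large in each dimension, so the transformer sees roughly four times as many tokens. The sketch below works this out for the 1280 x 720 images mentioned earlier. The ceiling rounding and the helper name `feature_map_tokens` are assumptions for illustration, and the real pipeline may resize images first.

```python
import math
from typing import Tuple


def feature_map_tokens(height: int, width: int, stride: int) -> Tuple[int, int, int]:
    """Return (feature height, feature width, token count) for a given total stride."""
    h = math.ceil(height / stride)
    w = math.ceil(width / stride)
    return h, w, h * w


for label, stride in (("stride 32", 32), ("stride 16 (dilated last stage)", 16)):
    h, w, n = feature_map_tokens(720, 1280, stride)
    print(f"{label}: {h} x {w} = {n} tokens")
# stride 32: 23 x 40 = 920 tokens
# stride 16 (dilated last stage): 45 x 80 = 3600 tokens
```

The larger token count may contribute to the extra inference time reported for the dilated model, although the thread does not measure this directly.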

@fmassa
Contributor

fmassa commented Aug 21, 2020

@ririya keep us updated! We are doing some preliminary experiments with EfficientNets and they do seem to work fairly well with DETR.

@munirfarzeen

Hi,
@ririya could you share the hyperparameters you use to train with EfficientNet as a backbone?
That would be a great help.

@ririya
Author

ririya commented Dec 14, 2020

Hi,
@ririya could you share the hyperparameters you use to train with EfficientNet as a backbone?
That would be a great help.

@likui01

The only things I changed were the learning rate, set to 1e-5 for both the backbone and DETR, and the number of queries, set to 30 because my images don't have a lot of objects. One thing that helped with convergence was importing the trained transformer from one of the given models and replacing what I needed. I also tried a few different pretrained EfficientNet models; one of them did not work, so maybe there was some problem with its ImageNet weights. Hope this helps you.

@munirfarzeen

@ririya thank you for your reply. I tried changing the learning rate as you suggested, but my network is still not learning, as you can see in the figure. I am using the mobilenet_v2 backbone from PyTorch.
[Image attachment: the figure referenced above, not reproduced here]

@ririya
Author

ririya commented Dec 14, 2020

@likui01 I haven't tried mobilenet_v2. Does your training converge using the provided Resnet50 pretrained models?

@munirfarzeen

@ririya, yes, it does converge with resnet50 using pretrained weights.


4 participants