
Swin/ViT training #16

Open
JohnMBrandt opened this issue Aug 15, 2024 · 1 comment
JohnMBrandt commented Aug 15, 2024

Hello -- really appreciate your work! I was able to train a ResNet-50 model perfectly well on my custom dataset using your config files, and confirmed that Stable DINO / R50 outperforms DINO / R50 on my COCO-like dataset.

However, when switching the backbone to Swin or ViT, Stable DINO does not train properly, while DINO does.

I have tried:

  • modifying the positional encoding temperature/offset to match the MMDetection values for transformer backbones (20/0) instead of (10000/-0.5)
  • confirming that backbone model weights load exactly as they do in MMDetection and have equivalent values
  • confirming that channelmapper / neck inputs and outputs are exactly as expected
  • confirming that batch size, weight decay, optimizer, learning rate, etc are all exactly the same
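For context, the temperature/offset pair in the first bullet enters the sine positional encoding roughly as follows. This is a minimal NumPy sketch of the standard DETR-style formula, not the actual detrex `PositionEmbeddingSine` or MMDetection `SinePositionalEncoding` code; the function name and the assumption that positions are pre-normalized to [0, 1] are mine:

```python
import numpy as np

def sine_pos_embed(pos, num_feats=128, temperature=10000.0, offset=-0.5,
                   scale=2 * np.pi):
    """Sketch of a DETR-style 1-D sine positional encoding.

    pos: positions already normalized to [0, 1], shape (N,).
    """
    pos = (pos + offset) * scale
    # Geometric frequency schedule controlled by `temperature`.
    dim_t = temperature ** (2 * (np.arange(num_feats) // 2) / num_feats)
    emb = pos[:, None] / dim_t[None, :]   # (N, num_feats)
    emb[:, 0::2] = np.sin(emb[:, 0::2])   # even channels: sine
    emb[:, 1::2] = np.cos(emb[:, 1::2])   # odd channels: cosine
    return emb

# ResNet-style defaults vs. the MMDetection transformer-backbone setting:
resnet_style = sine_pos_embed(np.linspace(0, 1, 4))  # temperature 10000, offset -0.5
vit_style = sine_pos_embed(np.linspace(0, 1, 4), temperature=20.0, offset=0.0)
```

A mismatch here between the backbone checkpoint's pretraining setup and the detection config is one of the classic silent accuracy killers, which is why it was worth ruling out first.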
JohnMBrandt commented Aug 17, 2024

mAP of Stable DINO implemented in detrex with a ViT-H backbone on my custom dataset, compared to DINO with the same backbone:

| Model | AP | AP50 | AP75 | APs | APm | APl |
|---|---|---|---|---|---|---|
| Stable DINO / ViT-H | 15.9893 | 24.5232 | 18.4653 | 10.0865 | 35.2932 | 51.1695 |
| DINO / ViT-H | 24.1312 | 49.2856 | 21.1866 | 18.3770 | 38.8828 | 57.5000 |

Not quite sure what's causing the huge difference. The only differences between the cfg files are:

  1. Changing `encoder=L(DINOTransformerEncoder)` to `encoder=L(StableDINOTransformerEncoder)`
  2. Adding `multi_level_fusion="dense-fusion"` to the encoder
  3. Using the Stable DINO criterion and matcher
  4. Adjusting the classification loss weight from 1 to 6, as in this repository
  5. Commenting out the aux weight, as is done in this repository
  6. Adding the additional cfg parameters for Stable DINO as in this repository:

```python
use_ce_loss_type="stable-dino",
ta_alpha=0.0,
ta_beta=2.0,
gdn_k=2,
neg_step_type="none",
no_img_padding=False,
dn_to_matching_block=False,
```
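Put together, that set of overrides would look roughly like this in a detrex LazyConfig. This is a sketch only: the attribute paths (`model.transformer.encoder`, `model.criterion`) and the commented import location follow the usual detrex project layout and may differ in your base config.

```python
from detectron2.config import LazyCall as L

# Illustrative import path -- check the stable_dino project for the real one:
# from projects.stable_dino.modeling import (
#     StableDINOTransformerEncoder, StableDINOCriterion,
# )

model.transformer.encoder = L(StableDINOTransformerEncoder)(  # change (1)
    ...,                                   # existing encoder arguments unchanged
    multi_level_fusion="dense-fusion",     # change (2)
)
model.criterion = L(StableDINOCriterion)(  # change (3), with matching matcher
    ...,
    # change (4): classification loss weight raised from 1 to 6
    # change (5): aux weight entries commented out
    # change (6): Stable DINO extras
    use_ce_loss_type="stable-dino",
    ta_alpha=0.0,
    ta_beta=2.0,
    gdn_k=2,
    neg_step_type="none",
    no_img_padding=False,
    dn_to_matching_block=False,
)
```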

EDIT: Closing the loop on this, I found that the issue was in the StableDINO matcher.
