This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

loss Nan #46

Open
Fly-dream12 opened this issue Sep 11, 2021 · 4 comments

@Fly-dream12

When training the model on a custom dataset, with the learning rate set to 0.001 and ims_per_batch_label set to 4, this error occurs in ubteacher/modeling/proposal_generator/proposal_utils.py, in find_top_rpn_proposals:
FloatingPointError: Predicted boxes or scores contain Inf/Nan. Training has diverged.

Should any config values be altered? Thanks for your reply! @ycliu93

@ycliu93
Contributor

ycliu93 commented Sep 11, 2021

There are some tips for using Unbiased Teacher on custom datasets.

  1. The unsupervised loss weight affects the stability of semi-supervised training. The default loss weight is 4, which gives the best performance in the COCO 1% case, but I would suggest reducing it to 1 to make sure training doesn't diverge. Once it is stable, you could try increasing it for a better result.

  2. The threshold is another important hyperparameter, and you could check how many pseudo-boxes are generated in the pseudo-set. You need to make sure the number of pseudo-boxes is similar to or fewer than the number of boxes in the ground-truth labels, and that the model doesn't gradually generate too many pseudo-boxes.

  3. You could also fix the teacher model first and check whether a fixed teacher model can lead to a better student. If it helps, then you could use EMA to evolve the teacher for a better result.
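For reference, the first two suggestions correspond to a couple of config values. This is a minimal sketch; the SEMISUPNET key names below are taken from the repository's default config, so double-check them against your own config file:

```yaml
SEMISUPNET:
  UNSUP_LOSS_WEIGHT: 1.0   # default is 4.0; lower it for stability on custom data
  BBOX_THRESHOLD: 0.7      # confidence threshold for keeping pseudo-boxes
```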

Hope these tricks help your experiment. Let me know if you have other questions. :)

@Fly-dream12
Author

Fly-dream12 commented Sep 11, 2021 via email

@ycliu93
Contributor

ycliu93 commented Sep 13, 2021

  1. I used learning rate = 0.01 and it works in the COCO 1% case, though I am not sure whether it is the best setup for your custom dataset. As long as it is within a reasonable range, it should not lead to divergence.

  2. I am not sure why your model cannot converge under a low unsupervised loss weight. Could you try lowering it to 0.5 or less? Also, if possible, could you provide a brief description of your dataset? I might have some ideas after understanding your setup.

  3. Detectron2 provides a Visualizer (https://detectron2.readthedocs.io/en/latest/modules/utils.html#detectron2.utils.visualizer.Visualizer).

You could add it at the following line, where the thresholding function is applied.

pesudo_proposals_roih_unsup_k, _ = self.process_pseudo_label(

As for the number of pseudo-labels, you could also check the elements of pesudo_proposals_roih_unsup_k. Each element of the list holds the boxes of one image; for example, you could get the number of pseudo-boxes for the first image of a batch by printing len(pesudo_proposals_roih_unsup_k[0]). You could add this value to record_dict, and it will show up in the TensorBoard log for easier tracking.
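A minimal sketch of that bookkeeping (the count_pseudo_boxes helper and the toy box lists below are hypothetical; in the real code each list element would be a detectron2 Instances object, which also supports len()):

```python
def count_pseudo_boxes(pseudo_proposals):
    """Return the average number of pseudo-boxes per image in a batch."""
    counts = [len(per_image) for per_image in pseudo_proposals]
    return sum(counts) / max(len(counts), 1)

# Toy stand-in: plain lists of (x1, y1, x2, y2) boxes instead of Instances.
batch = [
    [(0, 0, 10, 10), (5, 5, 20, 20)],  # image 0: 2 pseudo-boxes
    [(1, 1, 4, 4)],                    # image 1: 1 pseudo-box
]

record_dict = {}
record_dict["num_pseudo_boxes_avg"] = count_pseudo_boxes(batch)
print(record_dict["num_pseudo_boxes_avg"])  # 1.5
```

Logging this alongside the ground-truth box count makes it easy to spot the pseudo-box count drifting upward over training.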

  4. There is an EMA_KEEP_RATE in the config file. Just set it to 1.0 and the teacher model will stay fixed. Conversely, if you set it to 0.0, the teacher's weights become identical to the student's.
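To see why those two extremes behave that way, here is a minimal sketch of an EMA update over a weight dictionary (the ema_update helper is hypothetical, not the repository's implementation):

```python
def ema_update(teacher_w, student_w, keep_rate):
    """EMA step: teacher <- keep_rate * teacher + (1 - keep_rate) * student."""
    return {
        name: keep_rate * teacher_w[name] + (1.0 - keep_rate) * student_w[name]
        for name in teacher_w
    }

teacher = {"w": 1.0}
student = {"w": 0.0}

print(ema_update(teacher, student, 1.0))  # keep_rate=1.0: teacher unchanged -> {'w': 1.0}
print(ema_update(teacher, student, 0.0))  # keep_rate=0.0: teacher copies student -> {'w': 0.0}
```

With keep_rate = 1.0 the student's contribution is zeroed out, so the teacher is effectively frozen; with keep_rate = 0.0 the teacher is overwritten by the student each step.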

Let me know whether these help you. Thanks!

@Fly-dream12
Author

I have used the self.visualize_training function; however, nothing appears in TensorBoard. @ycliu93
