Loss breaks after reaching minima #28
Comments
What do you use the two remaining output channels for? Maybe the loss is bumpy because of whatever loss function you apply to those?
Hi @javiribera, those two channels remain unbounded, i.e., I don't attach them to any loss function. I believe I need to provide a detailed report on the analysis I've done. First observation: the larger the batch size, the more stable the training. I needed a batch size of at least 16 to keep convergence stable for a longer time. Learning rate: 5e-5. The cases I tested:
Case A: only wHauss.
Case B: wHauss in a multi-task paradigm.
Case C: wHauss with pretrained stable weights.
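For reference, a minimal sketch (under my own assumptions, not this repository's API) of what a multi-task combination like Case B could look like in PyTorch; `whd_loss`, `other_task_loss`, and `lambda_whd` are hypothetical placeholders:

```python
import torch

def multi_task_loss(prob_map, gt_points, aux_pred, aux_target,
                    whd_loss, other_task_loss, lambda_whd=1.0):
    """Hypothetical Case B: weighted Hausdorff term plus a second task loss.

    `whd_loss` stands in for the weighted Hausdorff distance applied to the
    probability map; `other_task_loss` is any auxiliary objective attached to
    the remaining output channels.
    """
    loss_whd = whd_loss(prob_map, gt_points)          # localisation term
    loss_aux = other_task_loss(aux_pred, aux_target)  # auxiliary term
    return lambda_whd * loss_whd + loss_aux           # fixed weighting between the two
```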
Maybe this discussion helps: #2. I cannot help with segmentation tasks since I have never applied the WHD to that purpose and it was not the intention of the paper. This repository is the implementation of that paper and is not intended to be an all-in-one codebase for other tasks. So let's focus on your original case (0): the problem of interest is that, when using the WHD by itself, you see the loss decrease in a very noisy manner. You mention it converges within 1 epoch, which seems very fast. I do remember that the WHD is noisy, but I never found it a huge problem. Do you see the same with SGD?
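To try the SGD suggestion, a minimal sketch of swapping the optimizer while keeping everything else unchanged; `model` is just a stand-in for the U-net from the issue, and the momentum value is illustrative rather than a recommendation:

```python
import torch

# Placeholder network standing in for the U-net described in the issue.
model = torch.nn.Conv2d(1, 3, kernel_size=3, padding=1)

# Original setup from the issue:
optimizer_adam = torch.optim.Adam(model.parameters(), lr=5e-5)

# Suggested comparison: plain SGD with momentum at the same learning rate.
optimizer_sgd = torch.optim.SGD(model.parameters(), lr=5e-5, momentum=0.9)
```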
I'm using a U-net style network (with skip connections). The output has 3 channels. The centre of mass, in my case the pupil centre, is regressed from channel 1.
I apply torch.sigmoid to channel 1 before feeding it into the weighted Hausdorff loss, and use a sufficiently small learning rate (5e-5) with Adam.
I observe that the loss drops from 0.03 to 0.009 and the output of channel 1 starts to look as expected, i.e., we start seeing the expected blob. After converging to a minimum (which happens within 1 epoch), the loss jumps to its maximum (0.1 in my case) and stays there. I checked the gradient norms and found a lot of fluctuation in the norm values. Furthermore, the loss is jumpy on every iteration.
Would you have an intuition about this?
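For concreteness, a rough sketch of the training step described above, with per-iteration gradient-norm logging to inspect the fluctuation mentioned; `unet`, `whd_loss`, and the data tensors are hypothetical stand-ins, and `whd_loss` is not the exact interface of this repository's weighted Hausdorff loss:

```python
import torch

def train_step(unet, whd_loss, optimizer, images, gt_points):
    """One iteration of the setup described in the issue (assumed, not verbatim)."""
    out = unet(images)                      # (B, 3, H, W) output as described above
    prob_map = torch.sigmoid(out[:, 0])     # the channel used for the pupil-centre map
    loss = whd_loss(prob_map, gt_points)    # weighted Hausdorff term on that channel only

    optimizer.zero_grad()
    loss.backward()

    # Total gradient norm over all parameters; with max_norm=inf this only
    # measures the norm (no clipping), which is useful for spotting the jumps
    # described above.
    grad_norm = torch.nn.utils.clip_grad_norm_(unet.parameters(),
                                               max_norm=float("inf"))

    optimizer.step()
    return loss.item(), grad_norm.item()
```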