Why should we train NetL separately in train_L_step_w #14
Sorry, my bad. I saw the email and meant to reply. IIUC, your idea is correct; it is only a different implementation, though I am not sure. Also, I found that the weakly supervised training is tricky: the loss might not drop in the first several epochs. Below is part of my training log for your reference (I did not document the experiment well, so I am not 100% sure about the details):

The first column is probably the step count (sorry, I cannot remember all the details), the second column is the supervised loss, and the third column is the weakly supervised loss. You can see the second column just fluctuated while the third column dropped from 0.024 to 0.016.
In the function `train_L_step_w` (line 192 in `train.py`), the gradients of the loss are first propagated to NetL with the Doc3D dataset. Then the gradients are propagated to NetL with DIW. So NetL is updated twice in one step (`optimizer_L.step()` x2).

I have a question. We could compute `spvLoss.lloss` and `warp_diff_loss` with the corresponding datasets and a supervision mask (using the `torch.where` function to remove unrelated gradients), so that NetL is updated in one step. I read your paper, did not read the issues, and trained NetL with my idea; it does not converge.

According to optimization theory, these two training schemes lead to different NetLs. Therefore, I would like to know whether my training scheme can work, or to find out the reason it does not.
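The difference between the two update schemes can be sketched as follows. Everything here is an illustrative stand-in, not the repository's actual code: a toy linear layer plays the role of NetL, random tensors play the role of Doc3D and DIW batches, and plain `mse_loss` stands in for `spvLoss.lloss` and `warp_diff_loss` (the supervision mask is omitted for brevity):

```python
import torch
import torch.nn.functional as F

net_l = torch.nn.Linear(4, 4)  # toy stand-in for NetL
opt = torch.optim.Adam(net_l.parameters(), lr=1e-4)

# stand-ins for one Doc3D batch (dense labels) and one DIW batch (weak labels)
doc3d_x, doc3d_y = torch.randn(8, 4), torch.randn(8, 4)
diw_x, diw_y = torch.randn(8, 4), torch.randn(8, 4)

# Scheme A (as in train_L_step_w): two backward/step calls per iteration.
opt.zero_grad()
F.mse_loss(net_l(doc3d_x), doc3d_y).backward()
opt.step()  # first update, from the Doc3D loss

opt.zero_grad()
F.mse_loss(net_l(diw_x), diw_y).backward()
opt.step()  # second update, from the DIW loss, at the already-moved weights

# Scheme B (the single-step idea): sum both losses, then step once.
opt.zero_grad()
loss = (F.mse_loss(net_l(doc3d_x), doc3d_y)
        + F.mse_loss(net_l(diw_x), diw_y))
loss.backward()
opt.step()  # one combined update
```

The key difference is that in Scheme A the DIW gradient is evaluated at weights that the Doc3D step has already moved, while in Scheme B both gradients are evaluated at the same weights and summed. The two schemes therefore follow different optimization trajectories, which is consistent with the observation above that they need not converge to the same NetL.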