why squeeze here? #7
Comments
Sorry, I have another problem here. I think what NCE does is compute the loss only on the selected target classes, which should involve only part of the weights/biases of the linear layer. Say the last linear layer is the one defined at Line 60 in 862afc6.
But when I look into the gradients after back-propagation (Line 125 in 862afc6),
the number of non-zero gradient entries for the bias does not match what I expect (should be …). Do you have any idea why this happened? I am checking this because I think we should do a sparse parameter update in an advanced optimizer like …
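A minimal sketch of the kind of check described above, in a toy setting (the vocabulary size, the gather-based loss, and all names below are made up for illustration, not taken from this repo): compute a loss that touches only a few output classes of an nn.Linear, then count the non-zero entries of bias.grad after backward().

```python
import torch
import torch.nn as nn

V, D, B, K = 1000, 32, 4, 10            # hypothetical vocab size, hidden dim, batch size, touched classes per row
linear = nn.Linear(D, V)                # stands in for the model's last projection layer
hidden = torch.randn(B, D)

# pretend these are the target + sampled-noise class indices that NCE would touch
touched = torch.randint(0, V, (B, K))

scores = linear(hidden)                 # (B, V)
loss = scores.gather(1, touched).sum()  # loss depends only on the touched columns
loss.backward()

# for this toy loss the two counts agree; the comment above reports a mismatch for the real NCE loss
print("non-zero bias gradients:", (linear.bias.grad != 0).sum().item())
print("unique touched classes: ", touched.unique().numel())
```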
@chaoqing For the non-zero element mismatch, I suspect that …
@Stonesjtu The non-zero element mismatch still exists after I tried your check. Actually, at first I checked the non-zero positions and found that the non-zero gradients always fall within the true or sampled classes, which is expected. We may need to look into how the gradient is formulated to find out why part of the touched samples have zero gradient.
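A small sketch of the position check mentioned above, again with made-up tensors (the sizes, index values, and variable names are purely illustrative): collect the class indices whose bias gradient is non-zero and compare them with the union of target and noise indices.

```python
import torch

V = 1000
bias_grad = torch.zeros(V)                            # stand-in for linear.bias.grad after backward()
target_ids = torch.tensor([3, 17, 256])               # hypothetical ground-truth classes
noise_ids = torch.tensor([5, 17, 640, 999])           # hypothetical sampled noise classes
bias_grad[torch.tensor([3, 5, 17])] = torch.randn(3)  # pretend only these ended up non-zero

nonzero_ids = set(bias_grad.nonzero(as_tuple=True)[0].tolist())
touched_ids = set(torch.cat([target_ids, noise_ids]).tolist())

# expected: every non-zero gradient sits on a touched class ...
print("non-zero grads are a subset of touched:", nonzero_ids <= touched_ids)
# ... but some touched classes may still show an exactly-zero gradient, which is the open question
print("touched classes with zero gradient:", sorted(touched_ids - nonzero_ids))
```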
Hi, I think there is a bug here:
Pytorch-NCE/nce.py
Line 198 in 862afc6
For an RNN model where the last layer before softmax has shape [B * N * D] with number of time steps N > 1, I believe the squeeze does not have any effect. Maybe it is intended for batch size B = 1? If that is the case, squeeze(0) might be a better choice.
I am using your code to predict only the last state (in other words, N = 1). The squeeze here gives model_loss.shape = (B, 1) and noise_loss.shape = (B,), so the total loss.shape = (B, B), which I think should be (B, 1).
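A minimal sketch of the broadcasting effect described above, taking the shapes from the issue as given (model_loss of shape (B, 1), noise_loss of shape (B,)); the tensors are random placeholders:

```python
import torch

B = 4
model_loss = torch.randn(B, 1)   # shape reported in the issue after the squeeze
noise_loss = torch.randn(B)      # shape reported in the issue

total = model_loss + noise_loss  # broadcasting: (B, 1) + (B,) -> (B, B)
print(total.shape)               # torch.Size([4, 4])

# aligning the shapes first keeps the sum element-wise
aligned = model_loss.squeeze(1) + noise_loss   # (B,) + (B,) -> (B,)
print(aligned.shape)             # torch.Size([4])
```

Which dimension to squeeze (or whether to unsqueeze noise_loss instead) depends on the layout the repo actually uses; this only illustrates why the mismatched shapes blow up to (B, B).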