num_layers doesn't work #6

Open
EricAugust opened this issue Jul 13, 2018 · 9 comments

@EricAugust

I chose different values for the num_layers parameter, but it's still 1, and when I rewrote the code myself it gave me lots of errors.

@Stonesjtu
Owner

This code runs on Python 3 only. Could you please upload your error message?

As for num_layers, you can print the model directly to inspect it. There should be an LSTM module with num_layers=2.
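
A minimal illustration of that check (the model below is a made-up stand-in, not this repo's model class):

    import torch.nn as nn

    # Printing any nn.Module lists its submodules, so a two-layer RNN
    # shows up directly in the repr.
    model = nn.Sequential(
        nn.Embedding(10000, 200),
        nn.LSTM(200, 200, num_layers=2),
    )
    print(model)
    # Sequential(
    #   (0): Embedding(10000, 200)
    #   (1): LSTM(200, 200, num_layers=2)
    # )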

@EricAugust
Author

The error just says that the expected size of hidden[0] doesn't match what it got.
In index_gru.py:

self.rnn = nn.GRU(self.ninp, self.nhid, num_layers=1, dropout=args.dropout, batch_first=True)

I changed the 1 to args.num_layers, and also changed the view(1, -1, ...) on batched_rnn_output in get_noise_score to view(args.num_layers, -1, ...):

    def get_noise_score(self, noise_idx, rnn_output):
        """Get the score of noise given supervised context

        Args:
            - noise_idx: (B, N, N_r) the noise word index
            - rnn_output: output of rnn model

        Return:
            - noise_score: (B, N, N_r) score for noise word index
        """
        # (B * N * N_r, nhid) embeddings of the noise words
        noise_emb = self.encoder(noise_idx.view(-1))
        noise_ratio = noise_idx.size(2)

        # rnn_output of </s> is useless for sentence scoring
        # (1, B * N * N_r, nhid): each context vector becomes the initial
        # hidden state for one single-step GRU run, hence the leading 1
        batched_rnn_output = rnn_output[:, :-1].unsqueeze(2).expand(
            -1, -1, noise_ratio, -1
        ).contiguous().view(1, -1, self.nhid)

        noise_output, _last_hidden = self.rnn(
            noise_emb.view(-1, 1, self.nhid),  # (batch, seq_len=1, nhid)
            batched_rnn_output,
        )

        noise_score = self.scorer(noise_output).view_as(noise_idx)
        return noise_score

For example, it gives me an error like this:
expected hidden[0] size (2, 500, 200), but got (2, 250, 200)
I don't know where I should change the code to fix it.
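
A minimal sketch of the shape arithmetic behind that error, using the sizes from the message above:

    import torch

    flat = torch.zeros(500, 200)           # flattened (B*N*N_r, nhid)
    print(flat.view(1, -1, 200).shape)     # torch.Size([1, 500, 200])
    # Reshaping the same 500 x 200 elements into 2 "layers" steals from
    # the batch dimension, hence the mismatch against (2, 500, 200):
    print(flat.view(2, -1, 200).shape)     # torch.Size([2, 250, 200])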

@Stonesjtu
Copy link
Owner

Well, the GRU version supports only one layer. That's because cuDNN's GRU only outputs the per-timestep hidden states of the last layer, but this kind of contrasting needs the hidden states across all layers.
See

# this GRU only outputs the hidden for last layer, so only 1 layer is supported

Actually, we could stack cuDNN's built-in GRUs to get all the hidden states we need. Nice hint!
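
A rough sketch of that idea (the class name and layout are made up here, not the repo's implementation): keep each layer as its own single-layer nn.GRU so every layer's per-timestep output stays accessible.

    import torch
    import torch.nn as nn

    class StackedGRU(nn.Module):
        """Stack of single-layer GRUs exposing every layer's outputs."""

        def __init__(self, ninp, nhid, num_layers, dropout=0.0):
            super().__init__()
            self.layers = nn.ModuleList([
                nn.GRU(ninp if i == 0 else nhid, nhid, batch_first=True)
                for i in range(num_layers)
            ])
            self.drop = nn.Dropout(dropout)

        def forward(self, x, hidden=None):
            # x: (B, T, ninp); hidden: (num_layers, B, nhid) or None
            outputs, finals = [], []
            for i, gru in enumerate(self.layers):
                h0 = None if hidden is None else hidden[i:i + 1].contiguous()
                x, h_n = gru(x, h0)
                outputs.append(x)   # per-timestep hiddens of *this* layer
                finals.append(h_n)
                if i < len(self.layers) - 1:
                    x = self.drop(x)
            return outputs, torch.cat(finals, dim=0)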

@EricAugust
Author

Sorry, I didn't read the code carefully.
Thanks a lot.

@Stonesjtu
Owner

I'm going to raise a warning for this situation until the multi-layered version is ready.
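
A guess at what that might look like (args.num_layers is the flag discussed above; the exact message is made up):

    import warnings

    if args.num_layers > 1:
        warnings.warn(
            'index_gru supports only num_layers=1; '
            'the extra layers will be ignored for now.'
        )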

@EricAugust
Author

Hi, sorry to take your time again; I have another question.
After running the code with my small changes, I get a pretty high PPL: the validation loss stays at 6~7 or greater, and the PPL is in the hundreds or thousands. So I thought maybe I did something wrong.
I then ran your code without any change on wikitext-2, with the same params as pytorch/example, which reaches a perplexity of 110.44 after 6 epochs. But NCE + GRU gave a worse result: after 18 epochs the validation loss is 5.4 and the PPL is 222.15, and the validation loss basically remains the same.
Any suggestions for dealing with this?

@Stonesjtu
Owner

Well, since the actual PPL of the index GRU is hard to compute, the printed loss is simply the NCE loss, which is not comparable with the cross-entropy loss.
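
A quick sanity check on the numbers above, assuming the printed PPL is simply exp(loss) as in the pytorch/example script:

    import math

    # exp(.) turns a per-token cross-entropy into a perplexity, but the
    # NCE loss is a binary real-vs-noise objective on a different scale,
    # so exp(nce_loss) is not a true perplexity.
    print(math.exp(5.4))  # ~221.4, matching the reported 'PPL' of 222.15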

@EricAugust
Author

Yeah, index_linear works well, though still a little higher than pytorch/example: after 20 epochs the PPL is 165. But starting from that code, once I change how batches are generated and control the vocabulary size, the validation loss is hard to reduce and the model basically cannot train. Quite strange.

@Stonesjtu
Owner

Hi Eric, I failed to reproduce the PPL of 165 on my server. Could you please delete data/penn/vocab.pkl and run again to see if it happens again? I suspect this bug is caused by the vocabulary building.
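
Concretely, something like this before re-running (the path is the one mentioned above):

    from pathlib import Path

    # Drop the cached vocabulary so the next run rebuilds it from the
    # raw corpus instead of reusing a possibly stale pickle.
    Path('data/penn/vocab.pkl').unlink(missing_ok=True)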
