This repository has been archived by the owner on Oct 26, 2022. It is now read-only.

Training with fconv model converges but not with blstm #125

patrik-lambert opened this issue Mar 20, 2018 · 0 comments

Hi, I am trying to train an English-Finnish translation engine with a data set in the IT domain (about 800,000 unique sentence pairs, 13 million English words), using 32000 joint BPE operations (vocabulary is 13,500 for English and 22,700 for Finnish). The validation set (2000 sentence pairs) is randomly extracted from the training data (and removed from it).
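For readers unfamiliar with how the 32,000 joint BPE operations mentioned above are learned, here is a toy sketch of a single BPE merge step (count adjacent symbol pairs, merge the most frequent one). This is an illustration only, not the actual subword-nmt implementation; the sample vocabulary is invented.

```python
from collections import Counter

def pair_counts(vocab):
    """Count adjacent symbol pairs across the vocabulary, weighted by word frequency."""
    counts = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for pair in zip(symbols, symbols[1:]):
            counts[pair] += freq
    return counts

def apply_merge(pair, vocab):
    """Replace every occurrence of the pair with its concatenation."""
    old, new = " ".join(pair), "".join(pair)
    return {word.replace(old, new): freq for word, freq in vocab.items()}

# Words pre-split into characters, with an end-of-word marker.
vocab = {"l o w </w>": 5, "l o w e r </w>": 2, "n e w e s t </w>": 6}
best = pair_counts(vocab).most_common(1)[0][0]  # most frequent adjacent pair
merged = apply_merge(best, vocab)
```

Repeating this loop 32,000 times on the concatenated English+Finnish training text yields the joint merge table applied to both sides.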

Using the fconv model, the training completes nicely. The parameters are
-model fconv -nenclayer 10 -nlayer 8 -dropout 0.2 -optim nag -lr 0.25 -clip 0.1 -batchsize 32 -maxbatch 3200 \
-momentum 0.99 -timeavg -bptt 0 -nembed 512 -noutembed 512 -nhid 512

The training ends up with these values:
| checkpoint 018 | epoch 018 | 1004778 updates | s/checkpnt 5190 | words/s 4001 | lr 0.000025 | avg_dict_size 8692.39
| checkpoint 018 | epoch 018 | 1004778 updates | trainloss 1.09 | train ppl 2.13
| checkpoint 018 | epoch 018 | 1004778 updates | validloss 1.43 | valid ppl 2.69 | testloss 3.06 | test ppl 8.32
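As a sanity check on the log above, the reported perplexities are consistent with base-2 exponentiation of the losses (2**1.09 ≈ 2.13, 2**1.43 ≈ 2.69), which suggests the losses are logged in bits. A small hedged check:

```python
def ppl(loss, base=2.0):
    """Perplexity from a log-loss, assuming base-2 logs (inferred from the log above)."""
    return base ** loss

assert round(ppl(1.09), 2) == 2.13  # trainloss -> train ppl
assert round(ppl(1.43), 2) == 2.69  # validloss -> valid ppl
```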

With the blstm model, I haven't been able to train successfully. With the parameters suggested in the README, training stops after 2 epochs with a validation perplexity of 99,614,929. I have tried different optimizers, learning rates, and numbers of layers, and in all cases the validation perplexity is huge and the BLEU scores are very low. The lowest first-epoch validation perplexity (6500, but it increases afterwards) is obtained with the following parameters:
-model blstm -dropout 0.3 -optim sgd -lr 0.25 -clip 25 -bptt 25 -nembed 512 -noutembed 512 -nhid 512
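Note that the fconv run clips at 0.1 while this blstm run clips at 25. Assuming -clip bounds the global L2 norm of the gradients (my reading of the flag; worth confirming in the fairseq-lua source), a threshold of 25 may leave most updates unclipped, which could contribute to the divergence. A minimal sketch of that clipping rule:

```python
import math

def clip_grad_norm(grads, max_norm):
    """Rescale grads so their global L2 norm is at most max_norm; return (grads, norm)."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm > max_norm:
        scale = max_norm / norm
        grads = [g * scale for g in grads]
    return grads, norm

# Hypothetical gradient vector with norm 50, clipped down to norm 25.
clipped, norm = clip_grad_norm([30.0, 40.0], 25.0)
```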

Any idea what could be happening, or any suggestions? Thanks.
