This repository has been archived by the owner on Oct 26, 2022. It is now read-only.

Training with fconv model converges but not with blstm #125

patrik-lambert opened this issue Mar 20, 2018 · 0 comments

Hi, I am trying to train an English-Finnish translation engine with a data set in the IT domain (about 800,000 unique sentence pairs, 13 million English words), using 32000 joint BPE operations (vocabulary is 13,500 for English and 22,700 for Finnish). The validation set (2000 sentence pairs) is randomly extracted from the training data (and removed from it).
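For readers unfamiliar with how the 32,000 joint BPE operations mentioned above are learned, here is a toy sketch of a single BPE merge step (count adjacent symbol pairs, merge the most frequent one). This is an illustration only, not the actual subword-nmt implementation; the sample vocabulary is invented.

```python
from collections import Counter

def pair_counts(vocab):
    """Count adjacent symbol pairs across the vocabulary, weighted by word frequency."""
    counts = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for pair in zip(symbols, symbols[1:]):
            counts[pair] += freq
    return counts

def apply_merge(pair, vocab):
    """Replace every occurrence of the pair with its concatenation."""
    old, new = " ".join(pair), "".join(pair)
    return {word.replace(old, new): freq for word, freq in vocab.items()}

# Words pre-split into characters, with an end-of-word marker.
vocab = {"l o w </w>": 5, "l o w e r </w>": 2, "n e w e s t </w>": 6}
best = pair_counts(vocab).most_common(1)[0][0]  # most frequent adjacent pair
merged = apply_merge(best, vocab)
```

Repeating this loop 32,000 times on the concatenated English+Finnish training text yields the joint merge table applied to both sides.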

Using the fconv model, the training completes nicely. The parameters are
-model fconv -nenclayer 10 -nlayer 8 -dropout 0.2 -optim nag -lr 0.25 -clip 0.1 -batchsize 32 -maxbatch 3200 \
-momentum 0.99 -timeavg -bptt 0 -nembed 512 -noutembed 512 -nhid 512

The training ends up with these values:
| checkpoint 018 | epoch 018 | 1004778 updates | s/checkpnt 5190 | words/s 4001 | lr 0.000025 | avg_dict_size 8692.39
| checkpoint 018 | epoch 018 | 1004778 updates | trainloss 1.09 | train ppl 2.13
| checkpoint 018 | epoch 018 | 1004778 updates | validloss 1.43 | valid ppl 2.69 | testloss 3.06 | test ppl 8.32
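As a sanity check on the log above, the reported perplexities are consistent with base-2 exponentiation of the losses (2**1.09 ≈ 2.13, 2**1.43 ≈ 2.69), which suggests the losses are logged in bits. A small hedged check:

```python
def ppl(loss, base=2.0):
    """Perplexity from a log-loss, assuming base-2 logs (inferred from the log above)."""
    return base ** loss

assert round(ppl(1.09), 2) == 2.13  # trainloss -> train ppl
assert round(ppl(1.43), 2) == 2.69  # validloss -> valid ppl
```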

With the blstm model, I haven't been able to train successfully. With the parameters suggested in the README, training stops after 2 epochs with a validation perplexity of 99,614,929. I have tried different optimizers, learning rates, and numbers of layers, and in all cases the validation perplexity is huge and the BLEU scores are very low. The lowest first-epoch validation perplexity (6500, but it increases afterwards) is obtained with the following parameters:
-model blstm -dropout 0.3 -optim sgd -lr 0.25 -clip 25 -bptt 25 -nembed 512 -noutembed 512 -nhid 512
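Note that the fconv run clips at 0.1 while this blstm run clips at 25. Assuming -clip bounds the global L2 norm of the gradients (my reading of the flag; worth confirming in the fairseq-lua source), a threshold of 25 may leave most updates unclipped, which could contribute to the divergence. A minimal sketch of that clipping rule:

```python
import math

def clip_grad_norm(grads, max_norm):
    """Rescale grads so their global L2 norm is at most max_norm; return (grads, norm)."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm > max_norm:
        scale = max_norm / norm
        grads = [g * scale for g in grads]
    return grads, norm

# Hypothetical gradient vector with norm 50, clipped down to norm 25.
clipped, norm = clip_grad_norm([30.0, 40.0], 25.0)
```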

Any idea what could be happening, or any suggestions? Thanks.
