
The value of BLEU is always 0 when training the expert model #14

Open
Zachary-YL opened this issue May 26, 2019 · 5 comments

@Zachary-YL

Hello, when I use a Chinese (word-segmented) and English (tokenized) parallel corpus to train the expert model, the BLEU value is always 0, and the dev outputs are all unk.
Like this:

step 6100 lr 0.00168035 step-time 0.08s exp-los 34.4890 gN 243.20 BLEU 0.00, Sat May 25 18:23:57 2019
step 6200 lr 0.00166674 step-time 0.08s exp-los 35.5610 gN 256.42 BLEU 0.00, Sat May 25 18:24:05 2019
step 6300 lr 0.00165346 step-time 0.08s exp-los 35.6490 gN 258.22 BLEU 0.00, Sat May 25 18:24:13 2019
step 6400 lr 0.0016405 step-time 0.08s exp-los 35.6305 gN 258.66 BLEU 0.00, Sat May 25 18:24:21 2019
step 6500 lr 0.00162783 step-time 0.08s exp-los 37.1311 gN 274.27 BLEU 0.00, Sat May 25 18:24:29 2019
step 6600 lr 0.00161545 step-time 0.08s exp-los 35.8512 gN 258.59 BLEU 0.00, Sat May 25 18:24:37 2019
step 6700 lr 0.00160335 step-time 0.08s exp-los 35.5759 gN 268.07 BLEU 0.00, Sat May 25 18:24:45 2019
step 6800 lr 0.00159152 step-time 0.08s exp-los 35.2491 gN 271.55 BLEU 0.00, Sat May 25 18:24:53 2019
step 6900 lr 0.00157995 step-time 0.08s exp-los 37.3241 gN 288.19 BLEU 0.00, Sat May 25 18:25:02 2019

Do you know why this happens? Thanks!

@Zachary-YL
Author

Actually, it is not exactly unk. If the last word in the vocabulary is 'Again', the output is always that last word, 'Again'.

Like this:
Every line of the files in saved_exp_model/output_dev and saved_exp_model/output_test is:

b'Again' b'Again' b'Again' b'Again' b'Again' b'Again' b'Again' b'Again' ... (b'Again' repeated for the full length of every output line)

@lovecambi
Owner

Can you double-check your source and target files? For example, they should contain strings, not ids.
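
For anyone else debugging this, a quick sanity check is to print the first few lines of each file and flag any that parse entirely as integers. This is only a minimal sketch; the file paths are placeholders for your own data:

```python
# Sanity check: the parallel corpus should contain token strings, not integer ids.
# "data/train.zh" / "data/train.en" are placeholder paths; use your own files.
def looks_like_ids(line):
    tokens = line.split()
    # If every token parses as an integer, the file is probably id-encoded.
    return len(tokens) > 0 and all(t.isdigit() for t in tokens)

for path in ["data/train.zh", "data/train.en"]:
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f):
            print(path, repr(line.rstrip("\n")))
            if looks_like_ids(line):
                print("  WARNING: this line looks like ids, not strings")
            if i >= 2:  # inspect only the first few lines
                break
```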

@maohbao

maohbao commented Jul 17, 2020

Hi Zachary,

Did you solve this problem? I get the same result as you: the BLEU value is always 0, and the dev and test outputs are all b'Again' b'Again' b'Again'...

@maohbao

maohbao commented Jul 17, 2020

> Hello, when I use a Chinese (word-segmented) and English (tokenized) parallel corpus to train the expert model, the BLEU value is always 0, and the dev outputs are all unk. ... Do you know why this happens? Thanks!

Thank you very much if you can reply!

@yidaxing

yidaxing commented Aug 4, 2020

There may be a problem in how the vocabulary is created. I used onmt-build-vocab --size max_vocab_size --save_vocab $TEXT/src-vocab.txt $TEXT/train.en to create the vocabulary, and it worked.
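
For reference, something roughly equivalent can be done by hand: count token frequencies and keep the most common ones, one token per line. This is only a sketch; the special tokens below are a guess, and the exact tokens and their order depend on what this repo's data loader actually expects:

```python
from collections import Counter

# Minimal vocabulary builder: frequency-sorted tokens truncated to max_vocab_size,
# written one token per line. Check which special tokens (and in which order)
# this codebase actually expects before using the output.
def build_vocab(corpus_path, vocab_path, max_vocab_size):
    counts = Counter()
    with open(corpus_path, encoding="utf-8") as f:
        for line in f:
            counts.update(line.split())
    specials = ["<unk>", "<s>", "</s>"]  # hypothetical placeholders
    keep = max_vocab_size - len(specials)
    tokens = specials + [w for w, _ in counts.most_common(keep)]
    with open(vocab_path, "w", encoding="utf-8") as f:
        f.write("\n".join(tokens) + "\n")

build_vocab("train.en", "src-vocab.txt", 30000)
```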
