
Failed to run GNMT #5

Open
skyw opened this issue Jul 13, 2017 · 14 comments


skyw commented Jul 13, 2017

It complains with a KeyError:
"KeyError: num_residual_layers"

Here is my script

python -m nmt.nmt \
    --src=en --tgt=de \
    --vocab_prefix=${DATA_DIR}/vocab \
    --train_prefix=${DATA_DIR}/train \
    --dev_prefix=${DATA_DIR}/newstest2014 \
    --test_prefix=${DATA_DIR}/newstest2015 \
    --out_dir=${OUT_DIR}/test \
    --hparams_path=nmt/standard_hparams/wmt16_en_de_gnmt.json
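A quick way to see which keys that hparams file actually defines (a hypothetical debugging snippet, not from the thread; it assumes the repo-relative path used in the command above):

import json

# If a key the code looks up, such as num_residual_layers, is absent from
# the file, loading it fails with exactly the KeyError reported above.
with open('nmt/standard_hparams/wmt16_en_de_gnmt.json') as f:
    print(sorted(json.load(f).keys()))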

oahziur self-assigned this Jul 13, 2017

oahziur commented Jul 13, 2017

Thanks, I will need to update the nmt/standard_hparams/wmt16_en_de_gnmt.json.


oahziur commented Jul 13, 2017

I am also adding instructions on how to train and load the gnmt model from scratch.

vince62s commented

I ran a standard attention / scaled_luong / uni system and got the expected results. With the gnmt architecture / scaled_luong / enc_type gnmt, the results are completely off.
Is there something special to do for the GNMT attention architecture?


oahziur commented Jul 21, 2017

@vince62s Did you check with the standard_hparams for GNMT? There are also pre-trained models available for download on the README page.


ndvbd commented Jan 2, 2018

Same problem here. After training, when doing inference I get:

KeyError: 'num_encoder_residual_layers'

It only works when I delete all of these keys from the hparams file and set --hparams_path to the best_bleu directory, but then after one run, for some reason, it rewrites the hparams file and adds these problematic key/values back... It's not clear how this mechanism works.

My guess is that when the code saves hparams, it writes key/values that it isn't supposed to.
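A minimal sketch of that guess (a hypothetical simplification assuming tf.contrib.training.HParams; this is not the actual repo code):

from tensorflow.contrib.training import HParams

def merge_with_defaults(loaded, defaults):
    # Copy every default key the loaded hparams is missing. If the merged
    # result is then written back to out_dir/hparams on each run, keys you
    # deleted by hand (e.g. num_encoder_residual_layers) reappear.
    loaded_keys = loaded.values()
    for key, value in defaults.values().items():
        if key not in loaded_keys:
            loaded.add_hparam(key, value)
    return loaded

defaults = HParams(num_encoder_residual_layers=1, num_units=128)
loaded = HParams(num_units=512)  # older hparams file, missing newer keys
print(merge_with_defaults(loaded, defaults).values())

If saving happens after a merge like this, hand-deleting keys from the file can never stick, which would explain the behavior above.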


oahziur commented Jan 2, 2018

@NadavB Can you share the command that produces the error? Were you using the standard_hparams file from the repo for inference?

There have been some updates to the hparams recently, so I think the standard_hparams may be out of date.


ndvbd commented Jan 4, 2018

@oahziur I did not use the standard hparams. I used the params as shown in the tutorial.

So for training:

python nmt/nmt.py \
    --attention=scaled_luong \
    --src=vi --tgt=en \
    --vocab_prefix=tmp/nmt_data/vocab  \
    --train_prefix=tmp/nmt_data/train \
    --dev_prefix=tmp/nmt_data/tst2012  \
    --test_prefix=tmp/nmt_data/tst2013 \
    --out_dir=/tmp/nmt_attention_model \
    --num_train_steps=5000 \
    --steps_per_stats=20 \
    --num_layers=2 \
    --num_units=128 \
    --dropout=0.2 \
    --metrics=bleu

And for inference:

python nmt/nmt.py \
    --out_dir=/tmp/nmt_attention_model \
    --inference_input_file=/tmp/nmt_data/source_infer.vi \
    --inference_output_file=/tmp/nmt_attention_model/output_infer

LimWoohyun commented

@NadavB Hello, I'm studying NMT. I want to run a test file, so I just ran nmt.py, but it failed.

How do I run your script? Please let me know the basics.


ndvbd commented Jan 10, 2018

@LimWoohyun Look at https://github.com/tensorflow/nmt and search for "Hands-on – building an attention-based NMT model"; the command is written there.


bquast commented Feb 22, 2018

@oahziur I get a key error using the standard_hparams (tf 1.6rc1; will try on my other machine with tf 1.5-cuda).

NotFoundError (see above for traceback): Key dynamic_seq2seq/encoder/rnn/basic_lstm_cell/bias not found in checkpoint
	 [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_INT32, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

using the command:

[bquast@UX370UA ~]$ cd nmt
[bquast@UX370UA nmt]$ python -m nmt.nmt \
>     --src=de --tgt=en \
>     --ckpt=deen_gnmt_model_4_layer/translate.ckpt \
>     --hparams_path=nmt/standard_hparams/wmt16_gnmt_4_layer.json \
>     --out_dir=/tmp/deen_gnmt \
>     --vocab_prefix=/home/bquast/en_de_data/vocab.bpe.32000 \
>     --inference_input_file=/home/bquast/en_de_data/newstest2014.tok.bpe.32000.de \
>     --inference_output_file=/home/bquast/deen_gnmt_model_4_layer/output_infer \

full output here:

https://gist.github.com/bquast/30ba7630d2bf32b59dd8349889fc7638

EDIT: confirmed, same error on tf 1.5-cuda

https://gist.github.com/bquast/0ddbf8eda363d312dd57b51aebb11f5d


tiberiu92 commented Mar 8, 2018

@bquast I recently got the error using the same configuration.

Key dynamic_seq2seq/encoder/rnn/basic_lstm_cell/bias not found in checkpoint
[[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_INT32, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

I tried this with tf 1.4 too, but no luck. Are there any updates on this?

Thank you.


bquast commented Mar 9, 2018

Hey, no news yet. Any progress on your side?


oahziur commented Mar 10, 2018

@bquast I think this is related to #264 and there is a PR that fixes it: #265. Maybe you can try patching in the PR and see if you still get the issue. Make sure you clear the model directory.


xiaohaoliang commented Jul 3, 2018

@bquast @tiberiu92 @oahziur
I got the same error using the same configuration. (tf-1.8, python-2.7)

python -m nmt.nmt \
    --src=de --tgt=en \
    --ckpt=/home/xiaohao/nmt/models/deen_gnmt_model_4_layer/translate.ckpt \
    --hparams_path=nmt/standard_hparams/wmt16_gnmt_4_layer.json \
    --out_dir=/home/xiaohao/data/deen_gnmt \
    --vocab_prefix=/home/xiaohao/data/wmt16/vocab.bpe.32000 \
    --inference_input_file=/home/xiaohao/data/wmt16/newstest2015.tok.bpe.32000.de \
    --inference_output_file=/home/xiaohao/data/deen_gnmt/output_infer \
    --inference_ref_file=/home/xiaohao/data/wmt16/newstest2015.tok.bpe.32000.en

NotFoundError (see above for traceback): Key dynamic_seq2seq/encoder/rnn/basic_lstm_cell/bias not found in checkpoint

I printed the keys of deen_gnmt_model_4_layer/translate.ckpt and could not find .../rnn/basic_lstm_cell/bias.
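(ckpt_print.py isn't shown in the thread; a minimal sketch that would produce output like the below, assuming Python 2 and tf.train.NewCheckpointReader:)

# ckpt_print.py -- hypothetical reconstruction, not the actual script.
import sys
import tensorflow as tf

ckpt = sys.argv[1]
print('CHECKPOINT_FILE: ', ckpt)

# NewCheckpointReader lists every variable stored in a checkpoint without
# rebuilding the graph, which is handy for debugging restore errors.
reader = tf.train.NewCheckpointReader(ckpt)
for name in reader.get_variable_to_shape_map():
    print('tensor_name: ', name)

Under Python 2 those print calls emit tuples, which matches the output: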

xiaohao@ubuntu:~/nmt$ python ckpt_print.py models/deen_gnmt_model_4_layer/translate.ckpt
('CHECKPOINT_FILE: ', 'models/deen_gnmt_model_4_layer/translate.ckpt')
('tensor_name: ', 'embeddings/encoder/embedding_encoder')
('tensor_name: ', 'dynamic_seq2seq/decoder/memory_layer/kernel')
('tensor_name: ', 'dynamic_seq2seq/decoder/multi_rnn_cell/cell_3/basic_lstm_cell/kernel')
('tensor_name: ', 'dynamic_seq2seq/encoder/bidirectional_rnn/fw/basic_lstm_cell/kernel')
('tensor_name: ', 'dynamic_seq2seq/decoder/multi_rnn_cell/cell_3/basic_lstm_cell/bias')
('tensor_name: ', 'dynamic_seq2seq/decoder/output_projection/kernel')
('tensor_name: ', 'dynamic_seq2seq/decoder/multi_rnn_cell/cell_0_attention/attention/bahdanau_attention/query_layer/kernel')
('tensor_name: ', 'dynamic_seq2seq/decoder/multi_rnn_cell/cell_0_attention/attention/basic_lstm_cell/kernel')
('tensor_name: ', 'dynamic_seq2seq/encoder/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/kernel')
('tensor_name: ', 'dynamic_seq2seq/decoder/multi_rnn_cell/cell_1/basic_lstm_cell/kernel')
('tensor_name: ', 'dynamic_seq2seq/decoder/multi_rnn_cell/cell_0_attention/attention/bahdanau_attention/attention_v')
('tensor_name: ', 'dynamic_seq2seq/encoder/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/bias')
('tensor_name: ', 'dynamic_seq2seq/decoder/multi_rnn_cell/cell_0_attention/attention/bahdanau_attention/attention_b')
('tensor_name: ', 'dynamic_seq2seq/decoder/multi_rnn_cell/cell_0_attention/attention/bahdanau_attention/attention_g')
('tensor_name: ', 'dynamic_seq2seq/decoder/multi_rnn_cell/cell_1/basic_lstm_cell/bias')
('tensor_name: ', 'Variable')
('tensor_name: ', 'dynamic_seq2seq/decoder/multi_rnn_cell/cell_0_attention/attention/basic_lstm_cell/bias')
('tensor_name: ', 'embeddings/decoder/embedding_decoder')
('tensor_name: ', 'dynamic_seq2seq/decoder/multi_rnn_cell/cell_2/basic_lstm_cell/bias')
('tensor_name: ', 'dynamic_seq2seq/encoder/rnn/multi_rnn_cell/cell_1/basic_lstm_cell/bias')
('tensor_name: ', 'dynamic_seq2seq/encoder/bidirectional_rnn/bw/basic_lstm_cell/kernel')
('tensor_name: ', 'dynamic_seq2seq/encoder/bidirectional_rnn/bw/basic_lstm_cell/bias')
('tensor_name: ', 'dynamic_seq2seq/encoder/rnn/multi_rnn_cell/cell_2/basic_lstm_cell/kernel')
('tensor_name: ', 'dynamic_seq2seq/encoder/bidirectional_rnn/fw/basic_lstm_cell/bias')
('tensor_name: ', 'dynamic_seq2seq/encoder/rnn/multi_rnn_cell/cell_2/basic_lstm_cell/bias')
('tensor_name: ', 'dynamic_seq2seq/decoder/multi_rnn_cell/cell_2/basic_lstm_cell/kernel')
('tensor_name: ', 'dynamic_seq2seq/encoder/rnn/multi_rnn_cell/cell_1/basic_lstm_cell/kernel')
xiaohao@ubuntu:~/nmt$

I tried the PR (#265) and ran rm -rf /home/xiaohao/data/deen_gnmt/*. The problem is solved!

Thanks~ @oahziur
