make train #26

Open
cjy-cc opened this issue Apr 2, 2020 · 0 comments
cjy-cc commented Apr 2, 2020

Hi, I find your code very useful and would like to reproduce your results, but I ran into a problem. When I run `make train`, training fails on the first step with the following error:

`Initialize the graph with random parameters.
bucket 0: (10, 23) (3463)
bucket 1: (14, 23) (3396)
bucket 2: (28, 23) (2954)
Epoch 1
0%| | 0/4000 [00:00<?, ?it/s]2020-04-02 09:59:41.480441: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.485770: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.498355: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.502850: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.507510: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.510871: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.514412: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.517675: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.520942: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.523802: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.527851: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.530873: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.534375: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.538748: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.543058: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.546131: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.549882: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.553088: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.556465: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.559658: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.563791: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.566951: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.570187: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.573230: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.576664: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.580121: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.583457: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.586287: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.589678: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.592683: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.596938: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.599845: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.604331: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.607419: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.610917: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.613959: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.617808: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.620678: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.624038: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.627022: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.630438: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.634057: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.638341: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.642178: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.645559: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-02 09:59:41.648711: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Traceback (most recent call last):
File "/home/cc/anaconda3/envs/a/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
return fn(*args)
File "/home/cc/anaconda3/envs/a/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
target_list, run_metadata)
File "/home/cc/anaconda3/envs/a/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
(0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node token_decoder_decoder_rnn_2/Attention_0/Conv2D}}]]
[[add_5/_849]]
(1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node token_decoder_decoder_rnn_2/Attention_0/Conv2D}}]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/cc/anaconda3/envs/a/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/cc/anaconda3/envs/a/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/cc/下载/2 shiyan/nl2bash-master/nl2bash-master/encoder_decoder/translate.py", line 378, in <module>
tf.compat.v1.app.run()
File "/home/cc/anaconda3/envs/a/lib/python3.7/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/home/cc/anaconda3/envs/a/lib/python3.7/site-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/home/cc/anaconda3/envs/a/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "/home/cc/下载/2 shiyan/nl2bash-master/nl2bash-master/encoder_decoder/translate.py", line 353, in main
train(train_set, dataset)
File "/home/cc/下载/2 shiyan/nl2bash-master/nl2bash-master/encoder_decoder/translate.py", line 95, in train
sess, formatted_example, bucket_id, forward_only=False)
File "/home/cc/下载/2 shiyan/nl2bash-master/nl2bash-master/encoder_decoder/framework.py", line 631, in step
outputs = session.run(output_feed, input_feed)
File "/home/cc/anaconda3/envs/a/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 956, in run
run_metadata_ptr)
File "/home/cc/anaconda3/envs/a/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
feed_dict_tensor, options, run_metadata)
File "/home/cc/anaconda3/envs/a/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
run_metadata)
File "/home/cc/anaconda3/envs/a/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
(0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node token_decoder_decoder_rnn_2/Attention_0/Conv2D (defined at /anaconda3/envs/a/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1751) ]]
[[add_5/_849]]
(1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node token_decoder_decoder_rnn_2/Attention_0/Conv2D (defined at /anaconda3/envs/a/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1751) ]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'token_decoder_decoder_rnn_2/Attention_0/Conv2D':
File "/anaconda3/envs/a/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/anaconda3/envs/a/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/下载/2 shiyan/nl2bash-master/nl2bash-master/encoder_decoder/translate.py", line 378, in <module>
tf.compat.v1.app.run()
File "/anaconda3/envs/a/lib/python3.7/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/anaconda3/envs/a/lib/python3.7/site-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/anaconda3/envs/a/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "/下载/2 shiyan/nl2bash-master/nl2bash-master/encoder_decoder/translate.py", line 353, in main
train(train_set, dataset)
File "/下载/2 shiyan/nl2bash-master/nl2bash-master/encoder_decoder/translate.py", line 64, in train
model = define_model(sess, forward_only=False, buckets=train_set.buckets)
File "/下载/2 shiyan/nl2bash-master/nl2bash-master/encoder_decoder/translate.py", line 53, in define_model
FLAGS, session, Seq2SeqModel, buckets, forward_only)
File "/下载/2 shiyan/nl2bash-master/nl2bash-master/encoder_decoder/graph_utils.py", line 142, in define_model
model = model_constructor(params, buckets)
File "/下载/2 shiyan/nl2bash-master/nl2bash-master/encoder_decoder/seq2seq/seq2seq_model.py", line 28, in __init__
super(Seq2SeqModel, self).__init__(hyperparams, buckets)
File "/下载/2 shiyan/nl2bash-master/nl2bash-master/encoder_decoder/framework.py", line 71, in __init__
self.define_graph()
File "/下载/2 shiyan/nl2bash-master/nl2bash-master/encoder_decoder/framework.py", line 140, in define_graph
encoder_copy_inputs=self.encoder_copy_inputs[:bucket[0]]
File "/下载/2 shiyan/nl2bash-master/nl2bash-master/encoder_decoder/framework.py", line 256, in encode_decode
encoder_copy_inputs=encoder_copy_inputs)
File "/下载/2 shiyan/nl2bash-master/nl2bash-master/encoder_decoder/seq2seq/rnn_decoder.py", line 199, in define_graph
decoder_cell(input_embedding, state)
File "/下载/2 shiyan/nl2bash-master/nl2bash-master/encoder_decoder/decoder.py", line 240, in __call__
attns, alignments = self.attention(cell_output)
File "/下载/2 shiyan/nl2bash-master/nl2bash-master/encoder_decoder/decoder.py", line 211, in attention
input_tensor=l * tf.tanh(tf.nn.conv2d(input=v, filters=k, strides=[1,1,1,1], padding="SAME")), axis=[2, 3])
File "/anaconda3/envs/a/lib/python3.7/site-packages/tensorflow_core/python/ops/nn_ops.py", line 1913, in conv2d_v2
name=name)
File "/anaconda3/envs/a/lib/python3.7/site-packages/tensorflow_core/python/ops/nn_ops.py", line 2010, in conv2d
name=name)
File "/anaconda3/envs/a/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_nn_ops.py", line 1071, in conv2d
data_format=data_format, dilations=dilations, name=name)
File "/anaconda3/envs/a/lib/python3.7/site-packages/tensorflow_core/python/framework/op_def_library.py", line 793, in _apply_op_helper
op_def=op_def)
File "/anaconda3/envs/a/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/anaconda3/envs/a/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3360, in create_op
attrs, op_def, compute_device)
File "/anaconda3/envs/a/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3429, in _create_op_internal
op_def=op_def)
File "/anaconda3/envs/a/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1751, in __init__
self._traceback = tf_stack.extract_stack()

0%| | 0/4000 [00:25<?, ?it/s]
Makefile:41: recipe for target 'train' failed
make: *** [train] Error 1
`
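
A possible workaround, not confirmed for this repository: `Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR` is often reported when TensorFlow cannot get enough GPU memory to initialize cuDNN, for example because another process already holds the GPU or because TensorFlow reserved nearly all of it up front. The sketch below asks TensorFlow to allocate GPU memory on demand by setting the `TF_FORCE_GPU_ALLOW_GROWTH` environment variable before TensorFlow is imported. The helper name `enable_gpu_memory_growth` is hypothetical and is not part of nl2bash; whether this fixes the failure above is an assumption. A mismatch between the installed CUDA/cuDNN versions and the TensorFlow build is another possible cause.

```python
import os


def enable_gpu_memory_growth(env=None):
    """Request on-demand GPU memory allocation from TensorFlow.

    Hypothetical helper (not part of nl2bash). It only sets an environment
    variable, so it must run before `import tensorflow` to take effect.
    """
    env = os.environ if env is None else env
    env["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"
    return env


# Call this at the very top of the entry script, before TensorFlow is imported.
enable_gpu_memory_growth()

# import tensorflow as tf   # import only after the variable is set
#
# Equivalent session-level setting in TF1-style code (an assumption about
# where the session is created in this code base):
#   config = tf.compat.v1.ConfigProto()
#   config.gpu_options.allow_growth = True
#   sess = tf.compat.v1.Session(config=config)
```

The same effect can be had without editing any code by exporting the variable in the shell before running `make train`. Checking `nvidia-smi` for other processes holding GPU memory is also worth doing first.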

todpole3 self-assigned this on Apr 3, 2020