Hi,
The Vanilla Seq2Seq and HRED models fail with a NaN tensor error at the first training step.
The failing line in hred_model.py is:
clipped_grads, grad_norm = tf.clip_by_global_norm(self.gradients, params.max_gradient_norm)
How can I solve this problem?
P.S.
embedding: random300
tensorflow-gpu: 1.12.1
3-turn dataset
THRED and TA-Seq2Seq train without this error
Full traceback:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/local/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/data/HRED/thred/main.py", line 6, in <module>
tf.app.run(main=thred_main)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "/data/HRED/thred/main.py", line 45, in main
model.train()
File "/data/HRED/thred/models/hierarchical_base.py", line 132, in train
step_result = loaded_train_model.train(train_sess)
File "/data/HRED/thred/models/hred/hred_model.py", line 446, in train
self.learning_rate])
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
run_metadata)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Found Inf or NaN global norm. : Tensor had NaN values
[[node hred_graph/VerifyFinite/CheckNumerics (defined at /data/HRED/thred/models/hred/hred_model.py:131) = CheckNumericsT=DT_FLOAT, _class=["loc:@hred_graph/VerifyFinite/control_dependency"], message="Found Inf or NaN global norm.", _device="/job:localhost/replica:0/task:0/device:GPU:0"]]
[[{{node hred_graph/clip_by_global_norm/mul/_187}} = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_3642_hred_graph/clip_by_global_norm/mul", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
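To narrow this down, it may help to see why a single bad gradient is enough to trigger the "Found Inf or NaN global norm" check. The sketch below is plain Python, not code from this repository: it assumes the global norm is the square root of the sum of squares over all gradient values, and the gradient names and numbers are hypothetical. It shows that one NaN entry makes the whole norm NaN, and how the offending gradient could be identified by name.

```python
import math

def global_norm(grads):
    """Square root of the sum of squares over every value in every gradient."""
    return math.sqrt(sum(v * v for g in grads for v in g))

def non_finite_gradients(named_grads):
    """Return the names of gradients that contain at least one NaN or Inf value."""
    return [
        name
        for name, g in named_grads.items()
        if not all(math.isfinite(v) for v in g)
    ]

# Hypothetical gradient values, for illustration only.
named_grads = {
    "encoder/kernel": [0.1, -0.2, 0.3],
    "decoder/kernel": [0.5, float("nan")],
    "embedding": [0.0, 0.4],
}

norm = global_norm(named_grads.values())
bad = non_finite_gradients(named_grads)

print("global norm is NaN:", math.isnan(norm))
print("non-finite gradients:", bad)
```

If the same per-gradient check were applied to the real gradients before clipping, it would point at the variable (and therefore the layer) where the NaN first appears, which seems more useful than the aggregate error above.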