INFO:tensorflow:Using config: {'_model_dir': '/home/yzh/v3plus/tensorflow-deeplab-v3-plus/dataset/test2/model/new', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 1000000000.0, '_session_config': allow_soft_placement: true
graph_options {
rewrite_options {
meta_optimizer_iterations: ONE
}
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7ff57ace0c88>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
INFO:tensorflow:Start training.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
2021-11-02 23:17:44.611893: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2021-11-02 23:17:44.828950: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: NVIDIA GeForce RTX 3090 major: 8 minor: 6 memoryClockRate(GHz): 1.755
pciBusID: 0000:02:00.0
totalMemory: 23.70GiB freeMemory: 23.44GiB
2021-11-02 23:17:44.967547: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 1 with properties:
name: NVIDIA GeForce RTX 3090 major: 8 minor: 6 memoryClockRate(GHz): 1.755
pciBusID: 0000:81:00.0
totalMemory: 23.69GiB freeMemory: 23.27GiB
2021-11-02 23:17:44.967602: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0, 1
2021-11-02 23:21:49.062821: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-11-02 23:21:49.062859: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0 1
2021-11-02 23:21:49.062867: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N N
2021-11-02 23:21:49.062871: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 1: N N
2021-11-02 23:21:49.063044: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 22724 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:02:00.0, compute capability: 8.6)
2021-11-02 23:21:49.063425: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 22555 MB memory) -> physical GPU (device: 1, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:81:00.0, compute capability: 8.6)
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into /home/yzh/v3plus/tensorflow-deeplab-v3-plus/dataset/test2/model/new/model.ckpt.
INFO:tensorflow:cross_entropy = 1.9338539, learning_rate = 0.007, train_mean_iou = 0.014417753, train_px_accuracy = 0.086506516
INFO:tensorflow:loss = 24.278753, step = 0
ERROR:tensorflow:Model diverged with loss = NaN.
Traceback (most recent call last):
File "train.py", line 285, in <module>
tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
File "/home/anaconda3/envs/tf-dpv3plus/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "train.py", line 267, in main
hooks=train_hooks,
File "/home/anaconda3/envs/tf-dpv3plus/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 354, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/home/anaconda3/envs/tf-dpv3plus/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1207, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/home/anaconda3/envs/tf-dpv3plus/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1241, in _train_model_default
saving_listeners)
File "/home/anaconda3/envs/tf-dpv3plus/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1471, in _train_with_estimator_spec
_, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
File "/home/anaconda3/envs/tf-dpv3plus/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 671, in run
run_metadata=run_metadata)
File "/home/anaconda3/envs/tf-dpv3plus/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1156, in run
run_metadata=run_metadata)
File "/home/anaconda3/envs/tf-dpv3plus/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1255, in run
raise six.reraise(*original_exc_info)
File "/home/anaconda3/envs/tf-dpv3plus/lib/python3.6/site-packages/six.py", line 719, in reraise
raise value
File "/home/anaconda3/envs/tf-dpv3plus/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1240, in run
return self._sess.run(*args, **kwargs)
File "/home/anaconda3/envs/tf-dpv3plus/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1320, in run
run_metadata=run_metadata))
File "/home/anaconda3/envs/tf-dpv3plus/lib/python3.6/site-packages/tensorflow/python/training/basic_session_run_hooks.py", line 753, in after_run
raise NanLossDuringTrainingError
tensorflow.python.training.basic_session_run_hooks.NanLossDuringTrainingError: NaN loss during training.