SystemError: (Fatal) Blocking queue is killed because the data reader raises an exception. #12784
Unanswered: LoveSimons asked this question in Q&A
Replies: 3 comments
-
I've run into the same problem.
-
I can't tell whether this is a data problem or a problem caused by multithreading.
-
Please provide the following information to quickly locate the problem
...
[2023/08/18 09:14:58] ppocr INFO: epoch: [1/100], global_step: 280, lr: 0.000500, acc: 0.000000, norm_edit_dis: 0.042185, loss: 20161.078125, avg_reader_cost: 0.00043 s, avg_batch_cost: 0.84911 s, avg_samples: 128.0, ips: 150.74659 samples/s, eta: 1 day, 9:25:50
[2023/08/18 09:15:06] ppocr INFO: epoch: [1/100], global_step: 290, lr: 0.000500, acc: 0.000000, norm_edit_dis: 0.043604, loss: 20468.968750, avg_reader_cost: 0.00030 s, avg_batch_cost: 0.84909 s, avg_samples: 128.0, ips: 150.74880 samples/s, eta: 1 day, 9:24:34
[2023/08/18 09:15:15] ppocr INFO: epoch: [1/100], global_step: 300, lr: 0.000500, acc: 0.000000, norm_edit_dis: 0.043811, loss: 20012.769531, avg_reader_cost: 0.00031 s, avg_batch_cost: 0.84916 s, avg_samples: 128.0, ips: 150.73771 samples/s, eta: 1 day, 9:23:22
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dataloader/dataloader_iter.py", line 536, in _thread_loop
    batch = self._get_data()
  File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dataloader/dataloader_iter.py", line 674, in _get_data
    batch.reraise()
  File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dataloader/worker.py", line 172, in reraise
    raise self.exc_type(msg)
ValueError: DataLoader worker(1) caught ValueError with message:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dataloader/worker.py", line 339, in _worker_loop
    batch = fetcher.fetch(indices)
  File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dataloader/fetcher.py", line 138, in fetch
    data = self.collate_fn(data)
  File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dataloader/collate.py", line 77, in default_collate_fn
    return [default_collate_fn(fields) for fields in zip(*batch)]
  File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dataloader/collate.py", line 77, in <listcomp>
    return [default_collate_fn(fields) for fields in zip(*batch)]
  File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dataloader/collate.py", line 58, in default_collate_fn
    batch = np.stack(batch, axis=0)
  File "<array_function internals>", line 6, in stack
  File "/usr/local/lib/python3.7/dist-packages/numpy/core/shape_base.py", line 426, in stack
    raise ValueError('all input arrays must have the same shape')
ValueError: all input arrays must have the same shape
Traceback (most recent call last):
  File "tools/train.py", line 227, in <module>
    main(config, device, logger, vdl_writer)
  File "tools/train.py", line 202, in main
    amp_dtype)
  File "/home/project_glf/PaddleOCR-release-2.7/tools/program.py", line 269, in train
    for idx, batch in enumerate(train_dataloader):
  File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dataloader/dataloader_iter.py", line 745, in __next__
    self.reader.read_next_list()[0])
SystemError: (Fatal) Blocking queue is killed because the data reader raises an exception.
[Hint: Expected killed != true, but received killed_:1 == true:1.] (at /paddle/paddle/fluid/operators/reader/blocking_queue.h:175)
I0818 09:15:22.301784 929 tcp_store.cc:257] receive shutdown event and so quit from MasterDaemon run loop
[2023-08-18 09:15:26,667] [ INFO] launch_utils.py:329 - terminate process group gid:838
INFO 2023-08-18 09:15:26,667 launch_utils.py:329] terminate process group gid:838
[2023-08-18 09:15:26,668] [ INFO] launch_utils.py:329 - terminate process group gid:844
INFO 2023-08-18 09:15:26,668 launch_utils.py:329] terminate process group gid:844
[2023-08-18 09:15:26,668] [ INFO] launch_utils.py:329 - terminate process group gid:850
INFO 2023-08-18 09:15:26,668 launch_utils.py:329] terminate process group gid:850
[2023-08-18 09:15:33,677] [ INFO] launch_utils.py:350 - terminate all the procs
INFO 2023-08-18 09:15:33,677 launch_utils.py:350] terminate all the procs
[2023-08-18 09:15:33,677] [ ERROR] launch_utils.py:659 - ABORT!!! Out of all 4 trainers, the trainer process with rank=[0] was aborted. Please check its log.
ERROR 2023-08-18 09:15:33,677 launch_utils.py:659] ABORT!!! Out of all 4 trainers, the trainer process with rank=[0] was aborted. Please check its log.
[2023-08-18 09:15:37,682] [ INFO] launch_utils.py:350 - terminate all the procs
INFO 2023-08-18 09:15:37,682 launch_utils.py:350] terminate all the procs
[2023-08-18 09:15:37,682] [ WARNING] launch.py:424 - Terminating... exit
WARNING 2023-08-18 09:15:37,682 launch.py:424] Terminating... exit
[2023-08-18 09:15:41,686] [ INFO] launch_utils.py:350 - terminate all the procs
INFO 2023-08-18 09:15:41,686 launch_utils.py:350] terminate all the procs
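The worker traceback above ends in `np.stack` inside `default_collate_fn`, which suggests that at least one sample in the failing batch has a different shape from the others. The SystemError about the blocking queue looks like a follow-on from that ValueError. Below is a minimal sketch for narrowing this down, assuming each sample is a tuple of array-like fields. The helper `group_indices_by_shape` and the stand-in sample data are hypothetical and not part of Paddle or PaddleOCR.

```python
from collections import defaultdict

import numpy as np


def group_indices_by_shape(samples):
    """Map each per-field shape signature to the sample indices that have it."""
    groups = defaultdict(list)
    for idx, sample in enumerate(samples):
        # One shape per field, e.g. ((3, 32, 100), (25,)) for (image, label).
        signature = tuple(np.asarray(field).shape for field in sample)
        groups[signature].append(idx)
    return dict(groups)


# Hypothetical stand-in for a dataset: (image, label) pairs.
# The third image is wider than the other two.
samples = [
    (np.zeros((3, 32, 100), dtype=np.float32), np.zeros(25, dtype=np.int64)),
    (np.zeros((3, 32, 100), dtype=np.float32), np.zeros(25, dtype=np.int64)),
    (np.zeros((3, 32, 120), dtype=np.float32), np.zeros(25, dtype=np.int64)),
]

# The same kind of call that fails in the traceback: stacking one field
# across the batch requires every array to have an identical shape.
try:
    np.stack([image for image, _ in samples], axis=0)
    stack_error = None
except ValueError as exc:
    stack_error = str(exc)
print("np.stack error:", stack_error)

# Group samples by shape; any signature held by only a few indices is a
# candidate for the sample that breaks batching.
shape_groups = group_indices_by_shape(samples)
for signature, indices in shape_groups.items():
    print(signature, "->", indices)
```

Running the same grouping over the real training set (in a single process, outside the DataLoader) may help separate a data problem from a multi-worker problem: if more than one shape signature shows up, the mismatch is in the data or its preprocessing rather than in the worker threads.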