
I have a problem with training and don't know how to solve it #1

Open
jhfujvffg opened this issue Dec 13, 2024 · 0 comments

```
2024-12-12 17:26:15,375 INFO Start training nuscenes_models/bevfusion_graph(bevfusion_graph_deformable_result_scenes_K_graph8)
epochs: 0%| | 0/6 [00:00<?, ?it/s]Error in collate_batch: key=img_process_infos
Error in collate_batch: key=img_process_infos
Error in collate_batch: key=img_process_infos
Error in collate_batch: key=img_process_infos
Error in collate_batch: key=img_process_infos
Error in collate_batch: key=img_process_infos
Error in collate_batch: key=img_process_infos
Error in collate_batch: key=img_process_infos

Error in collate_batch: key=img_process_infos | 0/123580 [00:00<?, ?it/s]
Error in collate_batch: key=img_process_infos
Error in collate_batch: key=img_process_infos
Error in collate_batch: key=img_process_infos
epochs: 0%| | 0/6 [01:06<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 232, in <module>
    main()
  File "train.py", line 177, in main
    train_model(
  File "/home/huangjie/Projects/GraphBEV/tools/train_utils/train_utils.py", line 180, in train_model
    accumulated_iter = train_one_epoch(
  File "/home/huangjie/Projects/GraphBEV/tools/train_utils/train_utils.py", line 33, in train_one_epoch
    batch = next(dataloader_iter)
  File "/home/huangjie/miniconda3/envs/GraphBEV/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 681, in __next__
    data = self._next_data()
  File "/home/huangjie/miniconda3/envs/GraphBEV/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1376, in _next_data
    return self._process_data(data)
  File "/home/huangjie/miniconda3/envs/GraphBEV/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1402, in _process_data
    data.reraise()
  File "/home/huangjie/miniconda3/envs/GraphBEV/lib/python3.8/site-packages/torch/_utils.py", line 461, in reraise
    raise exception
TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/huangjie/Projects/GraphBEV/tools/../pcdet/datasets/dataset.py", line 322, in collate_batch
    ret[key] = np.stack(val, axis=0)
  File "<__array_function__ internals>", line 200, in stack
  File "/home/huangjie/miniconda3/envs/GraphBEV/lib/python3.8/site-packages/numpy/core/shape_base.py", line 458, in stack
    arrays = [asanyarray(arr) for arr in arrays]
  File "/home/huangjie/miniconda3/envs/GraphBEV/lib/python3.8/site-packages/numpy/core/shape_base.py", line 458, in <listcomp>
    arrays = [asanyarray(arr) for arr in arrays]
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 2 dimensions. The detected shape was (6, 4) + inhomogeneous part.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/huangjie/miniconda3/envs/GraphBEV/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/huangjie/miniconda3/envs/GraphBEV/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
    return self.collate_fn(data)
  File "/home/huangjie/Projects/GraphBEV/tools/../pcdet/datasets/dataset.py", line 325, in collate_batch
    raise TypeError
TypeError

ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3904030) of binary: /home/huangjie/miniconda3/envs/GraphBEV/bin/python
Traceback (most recent call last):
  File "/home/huangjie/miniconda3/envs/GraphBEV/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/huangjie/miniconda3/envs/GraphBEV/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/huangjie/miniconda3/envs/GraphBEV/lib/python3.8/site-packages/torch/distributed/launch.py", line 193, in <module>
    main()
  File "/home/huangjie/miniconda3/envs/GraphBEV/lib/python3.8/site-packages/torch/distributed/launch.py", line 189, in main
    launch(args)
  File "/home/huangjie/miniconda3/envs/GraphBEV/lib/python3.8/site-packages/torch/distributed/launch.py", line 174, in launch
    run(args)
  File "/home/huangjie/miniconda3/envs/GraphBEV/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run
    elastic_launch(
  File "/home/huangjie/miniconda3/envs/GraphBEV/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/huangjie/miniconda3/envs/GraphBEV/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
    raise ChildFailedError(
```
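A likely reading of the traceback, offered as a hedged sketch rather than a confirmed fix: `collate_batch` in `pcdet/datasets/dataset.py` calls `np.stack` on the `img_process_infos` key. The reported shape `(6, 4) + inhomogeneous part` is consistent with each sample holding 6 camera entries of 4 items each, where at least one item is itself a sequence (assumed here to be a crop-box tuple). Since NumPy 1.24, building an array from ragged nested sequences raises `ValueError` instead of producing an object array with a deprecation warning, so two workarounds seem plausible: pin `numpy<1.24`, or stop stacking that key. The snippet below illustrates the second option with a simplified, hypothetical collate function; the `[resize, crop, flip, rotate]` layout, the key `example_field`, and the sample values are assumptions, not taken from the GraphBEV code.

```python
import numpy as np


def collate_batch_sketch(batch_list):
    """Hypothetical, simplified collate function (not the GraphBEV one).

    Stacks regular array fields along a new batch axis, but keeps ragged
    per-camera metadata such as `img_process_infos` as a plain list.
    """
    # Assumption: these keys hold ragged metadata and must not be stacked.
    passthrough_keys = {"img_process_infos"}

    # Group each key's values across all samples in the batch.
    data_dict = {}
    for sample in batch_list:
        for key, val in sample.items():
            data_dict.setdefault(key, []).append(val)

    ret = {}
    for key, val in data_dict.items():
        if key in passthrough_keys:
            # Keep one entry per sample; no np.stack, so no ragged-array error.
            ret[key] = val
        else:
            # Regular fields must share the same shape across samples.
            ret[key] = np.stack(val, axis=0)
    ret["batch_size"] = len(batch_list)
    return ret


# Illustrative sample: 6 cameras, each [resize, crop, flip, rotate] (assumed layout).
cam_info = [0.48, (32, 176, 736, 432), False, 0]
sample = {
    "img_process_infos": [list(cam_info) for _ in range(6)],
    "example_field": np.zeros((4, 5), dtype=np.float32),  # hypothetical key
}

batch = collate_batch_sketch([sample, sample])
print(batch["example_field"].shape)    # (2, 4, 5)
print(len(batch["img_process_infos"]))  # 2
```

In the real `collate_batch`, the equivalent change would presumably be a branch that assigns `val` unchanged for `img_process_infos` before the generic `np.stack` fallback; any downstream code that reads this key then has to accept a per-sample list rather than an ndarray.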
