Unable to resize error #39
Comments
Hi! Thanks for reporting. This sounds like a very weird issue that I have not seen before. Does it occur consistently, i.e. is it reproducible? Can you try to isolate it? From your trace and a quick Google search, it seems related to a zero-dimensional tensor being pickled. Maybe you can investigate if and why this happens?
Hi! Thanks for responding quickly. It occurs randomly: sometimes training runs through all 100 epochs, and sometimes it fails with this error. The error seems to occur right before line 84 of train.py.
I haven't made any changes to the existing code. Could you tell me which exact versions of PyTorch (>1.7.0) and Python (>3.8) this code is based on? I'm using PyTorch 1.7.1 and Python 3.8.3, by the way.
Did you find a fix? I'm running into the same error, and it seems to be at the same line. I don't understand where it would even come from: I don't see where in this part of the code zero-dimensional tensors could appear, or what gets pickled there. I'm on Python 3.8.3 and PyTorch 1.8.1, by the way. Edit: I figured out that it happens when enumerate(training_dataloader) is called, and that it can be avoided by setting the data loader's number of workers to 0. I am unsure why that happens, however.
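For anyone who wants to try the workaround from the comment above, here is a minimal sketch. It assumes the failure is tied to spawned worker processes on Windows, which the traceback (spawn.py) suggests but nobody in this thread has confirmed. The helper names `pick_num_workers` and `make_training_loader` are hypothetical and not part of the repository; only `torch.utils.data.DataLoader` and its `num_workers` argument are real PyTorch API.

```python
import sys


def pick_num_workers(requested: int, platform: str = sys.platform) -> int:
    """Return 0 on Windows, otherwise the requested worker count.

    Assumption: the error only shows up when DataLoader workers are
    spawned as separate processes, so loading in the main process
    (num_workers=0) sidesteps it. The root cause is not established.
    """
    if requested < 0:
        raise ValueError("requested must be non-negative")
    return 0 if platform.startswith("win") else requested


def make_training_loader(dataset, batch_size: int, requested_workers: int = 1):
    """Build a DataLoader whose worker count is chosen by pick_num_workers."""
    # Imported lazily so pick_num_workers can be used without PyTorch installed.
    from torch.utils.data import DataLoader

    return DataLoader(
        dataset,
        batch_size=batch_size,
        num_workers=pick_num_workers(requested_workers),
    )
```

With `num_workers=0` the batches are produced in the main process, so nothing has to be pickled and sent to a child process. The trade-off is that data loading no longer overlaps with training, which may slow things down for datasets that are expensive to load.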
Hi. Many thanks for making your work public. It's been a pleasure reading your paper.
I tried running the code in Spyder. It works fine until, at some point, it hits the following runtime error.
Start train epoch 12, lr=0.0001 for run run_20210510T145253
Evaluating baseline on dataset...
100%|██████████| 10/10 [00:00<00:00, 22.94it/s]
100%|██████████| 10/10 [00:03<00:00, 3.04it/s]
100%|██████████| 1/1 [00:00<00:00, 23.44it/s]
100%|██████████| 1/1 [00:00<00:00, 22.40it/s]
Finished epoch 12, took 00:00:03 s
Saving model and state...
Validating...
Validation overall avg_cost: -7.61328125 +- 0.06633966416120529
Evaluating candidate model on evaluation dataset
Epoch 12 candidate mean -7.60546875, baseline epoch 11 mean -7.64453125, difference 0.0390625
Start train epoch 13, lr=0.0001 for run run_20210510T145253
30%|███ | 3/10 [00:00<00:00, 22.74it/s]Evaluating baseline on dataset...
100%|██████████| 10/10 [00:00<00:00, 22.78it/s]
0%| | 0/10 [00:00<?, ?it/s]
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\user\anaconda3\envs\attentionVRP\lib\multiprocessing\spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "C:\Users\user\anaconda3\envs\attentionVRP\lib\multiprocessing\spawn.py", line 126, in _main
self = reduction.pickle.load(from_parent)
File "C:\Users\user\anaconda3\envs\attentionVRP\lib\site-packages\torch\multiprocessing\reductions.py", line 88, in rebuild_tensor
t = torch._utils._rebuild_tensor(storage, storage_offset, size, stride)
File "C:\Users\user\anaconda3\envs\attentionVRP\lib\site-packages\torch\_utils.py", line 133, in _rebuild_tensor
return t.set_(storage, storage_offset, size, stride)
RuntimeError: Trying to resize storage that is not resizable at ..\aten\src\TH\THStorageFunctions.cpp:87
The problem is op with the const data distribution. To keep things simple, I set graph_size to 20, batch_size to 512, epoch_size to 5120, and eval_batch_size to 512, and trained for 100 epochs. All other parameters are unchanged.
Any idea how to tackle this problem?
Thanks in advance!