polygon_train.py - Expected all tensors to be on the same device #31
Comments
I also wanted to clarify that I ran the setup.py script, which threw no errors, but the error output above still says I am missing two functions:
I am running this on Google Colab. |
Please read the README.md carefully. It guides you through installing the functions `polygon_inter_union_cuda` and `polygon_b_inter_union_cuda` (which are CUDA code). You are getting these errors because you haven't installed them. |
I did follow the README.md. I completed the setup.py step, following the instructions step by step, and still received the error above, as noted in my comments. |
On the example Colab that is provided, these are the instructions at the top:
`### For colab, run the following codes
from google.colab import drive
drive.mount('/content/gdrive')
# cd to your directory
%cd /content/gdrive/MyDrive/Your_Dir
# cd to polygon-yolov5
%cd polygon-yolov5
# install requirements
!pip install -r requirements.txt
# install cuda extensions for polygon box iou computation
%cd utils/iou_cuda
!python setup.py install
# cd back
%cd ..
%cd ..`
Following those steps and then running the subsequent code blocks ends in the error that was originally posted. It seems setup.py is not installing the required packages. |
I have used the Colab to run the code, and there was no error related to the installation of the CUDA extension functions. In any case, all the problems mentioned are caused by the warning that `polygon_inter_union_cuda` and `polygon_b_inter_union_cuda` are not installed. Make sure that running setup.py raises no errors. If errors are reported while the extension is being installed, check the compatibility of your environment and resolve them. I don't think the errors mentioned will persist once the installation succeeds. |
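One way to confirm whether the extension is usable, independent of the training script, is to import it directly and look at the exception. The following is a minimal sketch, not code from the repository: `check_extension` is a hypothetical helper, and it assumes that both functions are exposed by a module named `polygon_inter_union_cuda` (the name that appears in the egg path of the warning). PyTorch C++/CUDA extensions generally need `torch` to be imported first, so the helper accepts a list of modules to preload.

```python
import importlib


def check_extension(module_name, attr_names, preload=()):
    """Return a list of human-readable problems; an empty list means everything imported."""
    # Import prerequisites first (for a PyTorch extension this would be "torch").
    for dep in preload:
        try:
            importlib.import_module(dep)
        except ImportError as exc:
            return [f"cannot import prerequisite {dep}: {exc!r}"]
    try:
        module = importlib.import_module(module_name)
    except Exception as exc:  # an undefined-symbol failure also surfaces here
        return [f"cannot import {module_name}: {exc!r}"]
    # The module loaded; now verify that each expected function is present.
    return [
        f"{module_name} has no attribute {name}"
        for name in attr_names
        if not hasattr(module, name)
    ]


if __name__ == "__main__":
    # Assumed names, taken from the warning text in this thread.
    problems = check_extension(
        "polygon_inter_union_cuda",
        ["polygon_inter_union_cuda", "polygon_b_inter_union_cuda"],
        preload=("torch",),
    )
    print(problems or "extension imported fine")
```

If the list is non-empty, the message carries the underlying exception, which is more informative than the shortened warning printed at the start of training.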
I solved this issue. In my case, it happened because the module was being installed in a different path.
|
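For anyone who wants to check the install location themselves, here is a minimal sketch using only the standard library. `locate_module` is a hypothetical helper name, and the module name passed to it is assumed from the egg path in the warning. It reports the file the current interpreter would load the module from, or lists the search path if the module cannot be found.

```python
import importlib.util
import sys


def locate_module(module_name):
    """Return the file Python would load `module_name` from, or None if it is not importable."""
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # Module name assumed from the egg path shown in the warning.
    origin = locate_module("polygon_inter_union_cuda")
    if origin is None:
        print("not found on sys.path; current entries are:")
        for entry in sys.path:
            print("  ", entry)
    else:
        print("would be loaded from:", origin)
```

Run this with the same interpreter that runs `polygon_train.py`. If the reported location differs from where setup.py says it installed the package, the two are using different environments.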
@pocca2048 I tried that solution and unfortunately I am still getting the same error. Even though I receive a message saying that it was successfully installed, this is what the setup.py installation prints (I don't see any issues):
`running sdist
warning: check: missing meta-data: either (author and author_email) or (maintainer and maintainer_email) must be supplied
creating polygon_inter_union_cuda-0.0.0` |
@scocke How about checking where it is installed and if that path is in |
@XinzeLee '-gencode=arch=compute_80,code=sm_80',
'-gencode=arch=compute_86,code=sm_86',
'-gencode=arch=compute_86,code=compute_86' Then I proceeded to train a small model and everything worked fine. However, I also tried to run it on an RTX 3060-equipped machine and faced another problem with this installation.
It was looking for cuda version '-gencode=arch=compute_37,code=sm_37',
'-gencode=arch=compute_60,code=sm_60', '-gencode=arch=compute_61,code=sm_61',
'-gencode=arch=compute_70,code=sm_70', '-gencode=arch=compute_72,code=sm_72',
'-gencode=arch=compute_75,code=sm_75', '-gencode=arch=compute_80,code=sm_80',
'-gencode=arch=compute_86,code=compute_86' I only kept >> import torch
>> torch.cuda.get_device_capability()
(8, 6)
>> sm = torch.cuda.get_device_capability()
>> arg = '-gencode=arch=compute_{0}{1},code=sm_{0}{1}'
>> arg.format(*sm)
'-gencode=arch=compute_86,code=sm_86' |
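The interpreter session above can be wrapped into a small script. This is a sketch under assumptions rather than part of the repository's setup.py: `gencode_flag` and `detect_capability` are hypothetical helper names. The flag is built from a `(major, minor)` tuple such as the one returned by `torch.cuda.get_device_capability()`, and the detection step returns `None` when PyTorch or a GPU is not available.

```python
def gencode_flag(capability):
    """Build the nvcc -gencode flag for a (major, minor) compute capability."""
    major, minor = capability
    return f"-gencode=arch=compute_{major}{minor},code=sm_{major}{minor}"


def detect_capability():
    """Return the (major, minor) capability of the current GPU, or None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability()


if __name__ == "__main__":
    capability = detect_capability()
    if capability is None:
        print("no CUDA device detected; pass the capability by hand, e.g. (8, 6)")
    else:
        print(gencode_flag(capability))
```

The printed flag is the one to keep in the list of `-gencode` arguments when building for a single GPU.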
@nsabir2011 |
When following the tutorial, I ran the following line of code:
!python polygon_train.py --weights polygon-yolov5s-ucas.pt --cfg polygon_yolov5s_ucas.yaml \
    --data polygon_ucas.yaml --hyp hyp.ucas.yaml --img-size 1024 \
    --epochs 3 --batch-size 12 --noautoanchor --polygon --cache
I then received the below error while training was starting:
`Warning: "polygon_inter_union_cuda" and "polygon_b_inter_union_cuda" are not installed.
The Exception is: /usr/local/lib/python3.7/dist-packages/polygon_inter_union_cuda-0.0.0-py3.7-linux-x86_64.egg/polygon_inter_union_cuda.so: undefined symbol: _ZN2at4_ops5zeros4callEN3c108ArrayRefIlEENS2_8optionalINS2_10ScalarTypeEEENS5_INS2_6LayoutEEENS5_INS2_6DeviceEEENS5_IbEE.
YOLOv5 🚀 v1.0-27-g42d6884 torch 1.9.0+cu111 CUDA:0 (Tesla P100-PCIE-16GB, 16280.875MB)
Namespace(adam=False, artifact_alias='latest', batch_size=12, bbox_interval=-1, bucket='', cache_images=True, cfg='./models/polygon_yolov5s_ucas.yaml', data='./data/polygon_ucas.yaml', device='', entity=None, epochs=3, evolve=False, exist_ok=False, global_rank=-1, hyp='./data/hyp.ucas.yaml', image_weights=False, img_size=[1024, 1024], label_smoothing=0.0, linear_lr=False, local_rank=-1, multi_scale=False, name='exp', noautoanchor=True, nosave=False, notest=False, polygon=True, project='runs/train', quad=False, rect=False, resume=False, save_dir='runs/train/exp', save_period=-1, single_cls=False, sync_bn=False, total_batch_size=12, upload_dataset=False, weights='polygon-yolov5s-ucas.pt', workers=8, world_size=1)
tensorboard: Start with 'tensorboard --logdir runs/train', view at http://localhost:6006/
hyperparameters: lr0=0.01, lrf=0.2, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.1, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=30.0, translate=0.1, scale=0.0, shear=5.0, perspective=0.0005, flipud=0.5, fliplr=0.5, mosaic=0.0, mixup=0.0
wandb: Install Weights & Biases for YOLOv5 logging with 'pip install wandb' (recommended)
0 -1 1 3520 models.common.Focus [3, 32, 3]
1 -1 1 18560 models.common.Conv [32, 64, 3, 2]
2 -1 1 18816 models.common.C3 [64, 64, 1]
3 -1 1 73984 models.common.Conv [64, 128, 3, 2]
4 -1 1 156928 models.common.C3 [128, 128, 3]
5 -1 1 295424 models.common.Conv [128, 256, 3, 2]
6 -1 1 625152 models.common.C3 [256, 256, 3]
7 -1 1 1180672 models.common.Conv [256, 512, 3, 2]
8 -1 1 656896 models.common.SPP [512, 512, [5, 9, 13]]
9 -1 1 1182720 models.common.C3 [512, 512, 1, False]
10 -1 1 131584 models.common.Conv [512, 256, 1, 1]
11 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
12 [-1, 6] 1 0 models.common.Concat [1]
13 -1 1 361984 models.common.C3 [512, 256, 1, False]
14 -1 1 33024 models.common.Conv [256, 128, 1, 1]
15 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
16 [-1, 4] 1 0 models.common.Concat [1]
17 -1 1 90880 models.common.C3 [256, 128, 1, False]
18 -1 1 147712 models.common.Conv [128, 128, 3, 2]
19 [-1, 14] 1 0 models.common.Concat [1]
20 -1 1 296448 models.common.C3 [256, 256, 1, False]
21 -1 1 590336 models.common.Conv [256, 256, 3, 2]
22 [-1, 10] 1 0 models.common.Concat [1]
23 -1 1 1182720 models.common.C3 [512, 512, 1, False]
24 [17, 20, 23] 1 29667 models.yolo.Polygon_Detect [2, [[31, 30, 28, 49, 50, 31], [46, 45, 58, 58, 74, 74], [94, 94, 115, 115, 151, 151]], [128, 256, 512]]
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
Model Summary: 283 layers, 7077027 parameters, 7077027 gradients, 16.5 GFLOPs
Transferred 360/362 items from polygon-yolov5s-ucas.pt
Scaled weight_decay = 0.00046875
Optimizer groups: 62 .bias, 62 conv.weight, 59 other
albumentations: MedianBlur(always_apply=False, p=0.05, blur_limit=(3, 7)), ToGray(always_apply=False, p=0.1), RandomBrightnessContrast(always_apply=False, p=0.35, brightness_limit=(-0.2, 0.2), contrast_limit=(-0.2, 0.2), brightness_by_max=True), CLAHE(always_apply=False, p=0.2, clip_limit=(1, 4.0), tile_grid_size=(8, 8)), InvertImg(always_apply=False, p=0.3)
train: Scanning '../UCAS50/train' images and labels...40 found, 0 missing, 0 empty, 0 corrupted: 100% 40/40 [00:00<00:00, 158.38it/s]
train: New cache created: ../UCAS50/train.cache
train: Caching images (0.1GB): 100% 40/40 [00:00<00:00, 78.13it/s]
val: Scanning '../UCAS50/val.cache' images and labels... 9 found, 0 missing, 0 empty, 0 corrupted: 100% 9/9 [00:00<?, ?it/s]
val: Caching images (0.0GB): 100% 9/9 [00:00<00:00, 20.73it/s]
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
Plotting labels...
Image sizes 1024 train, 1024 test
Using 4 dataloader workers
Logging results to runs/train/exp
Starting training for 3 epochs...
0% 0/4 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "polygon_train.py", line 551, in <module>
    train(hyp, opt, device, tb_writer, polygon=opt.polygon)
  File "polygon_train.py", line 312, in train
    loss, loss_items = compute_loss(pred, targets.to(device))  # loss scaled by batch_size
  File "/content/PolygonObjectDetection/polygon-yolov5/utils/loss.py", line 274, in __call__
    iou = polygon_bbox_iou(pbox, tbox[i], CIoU=True, device=device, ordered=True)  # iou(prediction, target)
  File "/content/PolygonObjectDetection/polygon-yolov5/utils/general.py", line 961, in polygon_bbox_iou
    alpha = v / (v - iou + (1 + eps))
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!`