Dear author, I am using the YOLOX-x model to train on my own COCO-style dataset, which has 18 classes. Here is my environment:
GPU: 8 × NVIDIA A40
CUDA: 11.2
cuDNN: 8.0.5
PyTorch: 1.7
torchvision: 0.8.0
apex: 0.1
My training command is as follows; I did not use fp16 for training:
python tools/train.py -n yolox-x -d 8 -b 8 -o -c /home/shaom/pretrained/yolox_x.pth.tar --opts data_num_workers 4 num_classes 18 output_dir /home/shaom/YOLOX/outputs max_epoch 100
Meanwhile, I changed basic_lr_per_img to 0.0001/8.0 (a sketch of an Exp-file override with these settings is shown after the log below). The training log is as follows, and the total loss does not converge:
epoch: 1/100, iter: 10/3018, mem: 39233Mb, iter_time: 2.277s, data_time: 0.001s, total_loss: 4.7, iou_loss: 0.0, l1_loss: 0.0, conf_loss: 4.7, cls_loss: 0.0, lr: 4.392e-11, size: 640,
epoch: 1/100, iter: 20/3018, mem: 39233Mb, iter_time: 2.782s, data_time: 0.001s, total_loss: 13.1, iou_loss: 2.5, l1_loss: 0.0, conf_loss: 7.5, cls_loss: 3.2, lr: 1.757e-10, size: 832, ETA: 8 days, 20:01:58
epoch: 1/100, iter: 30/3018, mem: 39233Mb, iter_time: 2.176s, data_time: 0.001s, total_loss: 16.2, iou_loss: 3.7, l1_loss: 0.0, conf_loss: 10.2, cls_loss: 2.3, lr: 3.952e-10, size: 544,
epoch: 1/100, iter: 40/3018, mem: 39233Mb, iter_time: 2.142s, data_time: 0.001s, total_loss: 14.2, iou_loss: 2.2, l1_loss: 0.0, conf_loss: 8.1, cls_loss: 3.9, lr: 7.027e-10, size: 736,
epoch: 1/100, iter: 50/3018, mem: 39233Mb, iter_time: 0.834s, data_time: 0.001s, total_loss: 11.7, iou_loss: 1.6, l1_loss: 0.0, conf_loss: 6.3, cls_loss: 3.8, lr: 1.098e-09, size: 544,
epoch: 1/100, iter: 60/3018, mem: 39233Mb, iter_time: 3.109s, data_time: 0.001s, total_loss: 13.5, iou_loss: 3.2, l1_loss: 0.0, conf_loss: 6.9, cls_loss: 3.4, lr: 1.581e-09, size: 672,
epoch: 1/100, iter: 70/3018, mem: 39233Mb, iter_time: 2.119s, data_time: 0.001s, total_loss: 17.2, iou_loss: 2.1, l1_loss: 0.0, conf_loss: 10.2, cls_loss: 4.9, lr: 2.152e-09, size: 704,
epoch: 1/100, iter: 80/3018, mem: 39233Mb, iter_time: 1.880s, data_time: 0.001s, total_loss: 13.0, iou_loss: 2.5, l1_loss: 0.0, conf_loss: 6.6, cls_loss: 4.0, lr: 2.811e-09, size: 800,
epoch: 1/100, iter: 90/3018, mem: 39233Mb, iter_time: 0.976s, data_time: 0.001s, total_loss: 13.4, iou_loss: 3.4, l1_loss: 0.0, conf_loss: 7.6, cls_loss: 2.4, lr: 3.577e-09, size: 544,
...
epoch: 1/100, iter: 2050/3018, mem: 39233Mb, iter_time: 0.969s, data_time: 0.001s, total_loss: 12.5, iou_loss: 3.6, l1_loss: 0.0, conf_loss: 6.2, cls_loss: 2.7, lr: 1.846e-06, size: 480,
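For reference, a minimal custom Exp file carrying the overrides above might look like the sketch below. This assumes the standard YOLOX custom-exp layout, where defaults from yolox_base.py are overridden in __init__; the file path exps/custom/yolox_x_18cls.py is only illustrative.

# exps/custom/yolox_x_18cls.py -- hypothetical path; a sketch of an Exp override.
import os

from yolox.exp import Exp as MyExp


class Exp(MyExp):
    def __init__(self):
        super(Exp, self).__init__()
        # YOLOX-x model scaling factors
        self.depth = 1.33
        self.width = 1.25
        self.exp_name = os.path.split(os.path.realpath(__file__))[1].split(".")[0]

        # dataset / training settings from this issue
        self.num_classes = 18
        self.max_epoch = 100
        self.data_num_workers = 4
        self.output_dir = "/home/shaom/YOLOX/outputs"

        # per-image learning rate; YOLOX typically derives the optimizer lr as
        # basic_lr_per_img * batch_size, so with -b 8 this gives 1e-4 overall
        self.basic_lr_per_img = 0.0001 / 8.0

Such a file could then be passed to tools/train.py via the -f/--exp_file option instead of pushing every setting through --opts.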
Please provide more details, for example your exp file, your training log, etc.