This repository contains the first-place solution of our team (randomTeamName) to the Global Wheat Challenge 2021.
Our solution is based on a customized version of this excellent YOLOv5 repo. We also use test-time augmentation, pseudo labeling, model ensembling, and an out-of-domain validation set to boost performance.
Please first make sure you have Python==3.7.10 installed to reproduce our results, since this is the Python version on Google Colab Pro, where we ran most of our experiments. If you want to experiment on your own, you will need Python>=3.6.0.
- Clone our customized YOLOv5 repo into GWC_YOLOv5.
  $ git clone https://github.com/ksnxr/GWC_YOLOv5.git
- Install the required dependencies.
  - Please refer to this for installing YOLOv5 dependencies. If you are using Google Colab, most of the dependencies should already be in place, except PyYAML and ensemble_boxes. You can install them with:
    !pip uninstall -y PyYAML
    !pip install PyYAML==5.3.1
    !pip install ensemble_boxes
  - We require specific versions of the following packages to reproduce our results:
    - PyTorch: 1.9.0+cu102
    - TorchVision: 0.10.0+cu102
    - numpy: 1.19.5
    - scipy: 1.4.1
- Run the notebooks in this repo under the parent directory of GWC_YOLOv5.
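Before running the notebooks, it can help to confirm that the environment matches the pinned versions listed above. The following is a minimal sketch, not part of our notebooks; the helper names `installed_version` and `find_mismatches` are hypothetical and only illustrate the check.

```python
import importlib
import sys

# Versions pinned in this README; keys are the import names of the packages.
REQUIRED_PYTHON = (3, 7, 10)
REQUIRED_PACKAGES = {
    "torch": "1.9.0+cu102",
    "torchvision": "0.10.0+cu102",
    "numpy": "1.19.5",
    "scipy": "1.4.1",
}


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def find_mismatches(required, lookup=installed_version):
    """Map each package whose version differs to a (wanted, found) tuple."""
    mismatches = {}
    for name, wanted in required.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    if sys.version_info[:3] != REQUIRED_PYTHON:
        print("Python:", sys.version.split()[0], "(this README pins 3.7.10)")
    for name, (wanted, found) in find_mismatches(REQUIRED_PACKAGES).items():
        print(f"{name}: wanted {wanted}, found {found}")
```

An empty output means every pinned package matches. A mismatch does not necessarily break training, but it may change the final numbers.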
To run our notebooks (for training), you will need
- roughly 20GB of RAM
- a GPU with 16GB VRAM (such as NVIDIA Tesla V100 and P100).
RAM size is a soft requirement: you can remove the --cache_images or --cache argument to reduce RAM usage, at the cost of a much longer training time.
The following training time estimates were observed on a V100 with --cache_images on:
- For training the base models, each epoch takes roughly 2 minutes.
- For fine-tuning models with pseudo labels, each epoch takes about 4 minutes.
- Use data-cleaning.ipynb to generate clean_train.csv.
- Use KFold.ipynb to generate the 4-fold train and validation indices.
- Use basis-0.ipynb, basis-1.ipynb, basis-2.ipynb and basis-3.ipynb to train on the four folds.
- Use pseudo-original-0.ipynb, pseudo-original-1.ipynb, pseudo-original-2.ipynb, pseudo-original-3.ipynb and pseudo-master-3.ipynb to train 5 pseudo models.
- Use get-labels.ipynb to ensemble the 5 pseudo-model weights from the previous step and generate labels for subsequent training.
- Use pseudo-master-final.ipynb to train the final model.
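To illustrate the idea behind the label-generation step, here is a rough sketch of turning ensembled predictions into YOLO-format pseudo labels. It is not the code in get-labels.ipynb. The helper names, the `iou_thr` and `conf_thres` values, and the single class id 0 are assumptions made for illustration; only `weighted_boxes_fusion` comes from the ensemble_boxes package.

```python
def normalize_boxes(boxes, img_w, img_h):
    """Scale pixel [x1, y1, x2, y2] boxes to the [0, 1] range that WBF expects."""
    return [[x1 / img_w, y1 / img_h, x2 / img_w, y2 / img_h]
            for x1, y1, x2, y2 in boxes]


def fuse_predictions(boxes_per_model, scores_per_model, iou_thr=0.6, skip_box_thr=0.0):
    """Fuse normalized boxes from several models with weighted boxes fusion."""
    # Imported lazily so the pure-Python helpers work without ensemble_boxes.
    from ensemble_boxes import weighted_boxes_fusion

    # Assumption: a single "wheat head" class, so every box gets label 0.
    labels_per_model = [[0] * len(boxes) for boxes in boxes_per_model]
    boxes, scores, _ = weighted_boxes_fusion(
        boxes_per_model, scores_per_model, labels_per_model,
        iou_thr=iou_thr, skip_box_thr=skip_box_thr,
    )
    return boxes.tolist(), scores.tolist()


def to_yolo_lines(boxes, scores, conf_thres=0.5, class_id=0):
    """Turn confident normalized [x1, y1, x2, y2] boxes into YOLO label lines."""
    lines = []
    for (x1, y1, x2, y2), score in zip(boxes, scores):
        if score < conf_thres:
            continue  # drop low-confidence boxes instead of training on them
        x_center, y_center = (x1 + x2) / 2, (y1 + y2) / 2
        width, height = x2 - x1, y2 - y1
        lines.append(
            f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"
        )
    return lines
```

In this sketch, each image's per-model boxes would go through `normalize_boxes`, then `fuse_predictions`, and the lines from `to_yolo_lines` would be written to that image's label file.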
The final single model yields 0.700 on the final private leaderboard.
If you do not want to train these models, we also provide the weights of our final model so that you can simply run the inference code. You can access them from here.
Then you can reproduce our results on Google Colab:
# clone source codes
!git clone https://github.com/ksnxr/GWC_YOLOv5.git
# retrieve test images
from google.colab import drive
drive.mount("/content/drive")
!unzip -q drive/MyDrive/test.zip -d ./
# inference
!cd GWC_YOLOv5 && python detect.py --img-size 800 --name best --weights /path/to/best.pt --source ../test --augment --nosave --save-csv --conf-thres 0.5
The detection results can be found in GWC_YOLOv5/runs/detect/best/submission.csv.
If you wish to do some experiments on our solution, we also provide our general training and inference notebooks.
- For general training, see general-train.ipynb. This notebook provides a complete pipeline for training, pseudo labeling, fine-tuning with pseudo labels, detection, and saving the submission.
- For general inference, see general-inference.ipynb. This notebook contains code for direct inference and for ensembling models with weighted boxes fusion.
Note:
- You need to run KFold.ipynb first to generate the 4 training folds with an out-of-domain (OOD) validation set. If you wish to experiment on another domain-shift dataset, you can also modify this notebook to generate your own folds.
- Also ensure that you are using the master branch of GWC_YOLOv5.
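If you adapt the fold generation to another dataset, the key property of an OOD validation set is that no domain appears in both the training and the validation split of a fold. The sketch below shows one plausible way to build such folds. It assumes each image has a domain label and assigns domains to folds round-robin. The function name `ood_folds` is hypothetical, and the actual logic in KFold.ipynb may differ.

```python
def ood_folds(domains, n_folds=4):
    """Return (train_idx, val_idx) pairs in which no domain is on both sides.

    `domains` holds one domain label per image (an assumed input format).
    """
    unique = sorted(set(domains))
    # Round-robin assignment of whole domains to folds.
    fold_of = {domain: i % n_folds for i, domain in enumerate(unique)}
    folds = []
    for fold in range(n_folds):
        val_idx = [i for i, d in enumerate(domains) if fold_of[d] == fold]
        train_idx = [i for i, d in enumerate(domains) if fold_of[d] != fold]
        folds.append((train_idx, val_idx))
    return folds
```

For example, `ood_folds(["a", "a", "b", "c", "d", "e", "e"])` holds out domains "a" and "e" in the first fold and trains on "b", "c" and "d".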
We highly recommend using fold index 3 (the last fold) for experiments, as we found that this fold gives the best performance. According to our training records, we once achieved a final ADA of 0.698 with fold 3 only.
We would like to thank those who inspired us with open source solutions, including:
- Ultralytics for their YOLOv5 repository
- ZFTurbo for his weighted boxes fusion repository
- nvnn for his pseudo labeling notebook