(3rd place solution so far), PyTorch implementation
├── LICENSE
├── README.md <- The top-level README for developers using this project.
├── config <- yaml configuration files
│ ├── config.yaml <- config for training
│ ├── config_eval.yaml <- config for inference
│ ├── dataset.yaml <- config for train data extraction
├── models
│ ├── get_models.sh <- will download models
├── requirements.txt <- The requirements file for reproducing the analysis environment, e.g.
│ generated with `pip freeze > requirements.txt`, filtered to a minimal set
├── combine_images.py <- final ensemble by averaging model outputs
├── dataset.py <- data loading / preprocessing
├── find_hyperparameters.py <- find hyper-parameters for given model
├── inference.py <- inference script for single model
├── model.py <- model definition
├── optim.py <- optimizers definition
├── train.py <- training script
├── space_net_data.py <- utility script for GT extraction from SpaceNet dataset (#TODO - hardcoded paths)
Simple approach: end-to-end segmentation with a fully convolutional neural network.
Step 1:
- Net architecture: FPN [1] with an EfficientNet-B1 [2] backbone
- Identify hyper-parameters (weight decay, learning rate)
- Train (one-cycle learning rate policy; optimizer: AdamW; losses: Focal Loss and Dice Loss in separate heads, so the model is trained only once; see what works best and try to use all outputs in the final ensemble)
- The network has 2 additional heads: a direct classification head (is there a building or not; later used for negative-example mining) and a scale regression head (the dataset samples are at different scales, so this was a failed attempt to make inference easier and better performing); a sketch of this three-head network follows this list
- Datasets: tier-1 plus some data from the SpaceNet dataset (7 folds)
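A minimal sketch of the three-head idea, using segmentation_models_pytorch for brevity (the repo itself depends on efficientnet-pytorch, so this is an illustration of the architecture, not the actual model.py; the scale-head layout in particular is my assumption):

```python
import torch
import torch.nn as nn
import segmentation_models_pytorch as smp  # illustration only; not a repo dependency

class ThreeHeadFPN(nn.Module):
    """FPN segmentation head + building/no-building classification head + scale regression head."""

    def __init__(self, encoder_name="efficientnet-b1"):
        super().__init__()
        # smp provides the FPN decoder and an auxiliary classification head out of the box
        self.fpn = smp.FPN(
            encoder_name=encoder_name,
            classes=1,                                    # binary building mask
            aux_params={"classes": 1, "pooling": "avg"},  # "is there a building?" head
        )
        # hypothetical scale regression head on the deepest encoder feature map
        enc_channels = self.fpn.encoder.out_channels[-1]
        self.scale_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(enc_channels, 1)
        )

    def forward(self, x):
        # with aux_params set, smp returns (mask logits, classification logits)
        mask, building_logit = self.fpn(x)
        deepest = self.fpn.encoder(x)[-1]  # re-runs the encoder; acceptable for a sketch
        scale = self.scale_head(deepest)
        return mask, building_logit, scale
```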
Step 2: try "Self-training with Noisy Student" [3] and negative mining
- Label test data and data from tier 2 with the model from step 1 (soft labels, with TTA)
- Mine negative samples
- Repeat training with an EfficientNet-B2 backbone; for the soft labels, a KL divergence loss is used (sketched below)
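For the soft labels, a per-pixel Bernoulli KL divergence can be written as follows (a sketch of my reading of the loss, not necessarily the exact train.py implementation):

```python
import torch

def soft_label_kl_loss(student_logits, teacher_probs, eps=1e-6):
    """KL(teacher || student) for per-pixel Bernoulli masks.

    student_logits: (N, 1, H, W) raw network outputs
    teacher_probs:  (N, 1, H, W) soft labels in [0, 1]
    """
    p = teacher_probs.clamp(eps, 1 - eps)
    q = torch.sigmoid(student_logits).clamp(eps, 1 - eps)
    kl = p * torch.log(p / q) + (1 - p) * torch.log((1 - p) / (1 - q))
    return kl.mean()
```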
Step 3: standard "competition madness":
- Average ensemble of models (from steps 1 and 2) with TTA augmentation (scaling, flipping, transposing)
Step 2 comes with a big question mark: I think that just repeating step 1 with the b2 backbone would lead to the same results.
- python >= 3.6
- torch >= 1.2.0
- torchvision>=0.3.0
- albumentations>=0.4.5
- efficientnet-pytorch==0.6.3
- geopandas
- pandas
- rasterio
- opencv-python
- hydra-core
- numpy
- scikit-learn
- optuna (optional)
Download models (the following script will download all models used in the final submission):
cd models
sh get_models.sh
and run inference:
python inference.py data_dir=<path to test data> model_name=<efficientnet-b1|efficientnet-b2> model=<path to the model name>
For inference with the 'b1' model:
python inference.py data_dir=<path to test data> model_name=efficientnet-b1 model=../../models/m40000.pth
*Note: the run folder is generated, so use relative paths like `../../models/...` or absolute paths.*
For inference with the 'b2' model:
python inference.py data_dir=<path to test data> model_name=efficientnet-b2 model=../../models/ave.pth use_context_block=True use_mish=True
| Backbone | Model | Inference Resolution | Public Jaccard |
|---|---|---|---|
| b1 | models/m40000.pth | 672 | 0.8167 |
| b2 | models/ave.pth | 1024 | 0.8255 |
| b2 | models/ave2.pth | 896 | 0.8203 |
| b2 | models/model-b2-1.pth | 672 | 0.8209 |
| b2 | models/mb2-m20000.pth | x | x |
| b2 | models/mb2-m35000.pth | x | x |
| b2 | models/model-b2-2.pth | x | x |
For training, the data are converted to a standard format: an image and its corresponding GT mask, with the naming convention img.(tif|jpg) / img_mask.png. The data can be generated by the dataset.py script:
python dataset.py base_dir=<path to dataset> out_dir=<output directory>
Data used in step 1:
python dataset.py base_dir=<path to dataset> out_dir=<output directory> scale=1
python dataset.py base_dir=<path to dataset> out_dir=<output directory> scale=2
python dataset.py base_dir=<path to dataset> out_dir=<output directory> scale=0.5
First mistake: over-sampling. This way almost all buildings from the dataset are sampled, so there is no 'valid' validation dataset left for later use.
Some parts of the SpaceNet dataset are also used (extracted with space_net_data.py).
Parameters / hyper-parameters are tuned for 1 CPU / 2 threads and a 1060 GPU. For better performance, set the num_workers parameter to CPUs * threads - 1 and:
- test the maximum batch size for your GPU (run train.py for a few steps, increasing the batch_size parameter until you get a CUDA out-of-memory error; the provided learning rate and weight decay are for batch size 5 and input width 512), as sketched below
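A minimal sketch of that batch-size probe (a hypothetical helper, not part of the repo):

```python
import torch

def find_max_batch_size(model, width=512, device="cuda"):
    """Grow the batch size until CUDA runs out of memory; return the last size that fit."""
    model = model.to(device).train()
    bs = 1
    while True:
        try:
            x = torch.randn(bs, 3, width, width, device=device)
            out = model(x)
            outs = out if isinstance(out, tuple) else (out,)
            sum(o.float().sum() for o in outs).backward()  # include backward-pass memory
            model.zero_grad()
            bs += 1
        except RuntimeError as e:
            if "out of memory" not in str(e):
                raise
            torch.cuda.empty_cache()
            return bs - 1
```

Then run the hyper-parameter search with the batch size you found: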
python find_hyperparameters.py batch_size=<?> data_dir=<path to train data>
For tracking the training process, the https://neptune.ai/ platform is used. (If you want to enable web logging, fill in the 'neptune..' fields in the config.yaml file.)
For 'b1' backbone:
python train.py batch_size=<your batch size / mine is 5> data_dir=<path to train data> max_lr=<lr found by find_hyperparameters / ~0.0002> weight_decay=<found by find_hyperparameters / ~6.322983921368948e-7> fold=0 model_name=efficientnet-b1
For 'b2' backbone:
python train.py batch_size=<your batch size / mine is 5> data_dir=<path to train data> max_lr=<lr found by find_hyperparameters / ~0.0002> weight_decay=<found by find_hyperparameters / ~6.322983921368948e-7> fold=0 model_name=efficientnet-b2 use_mish=True use_context_block=True
Mine negative samples:
python inference.py model=<path to the project>/models/m40000.pth model_name=efficientnet-b1 width=512 debug=False data_dir=<path to src(test) images> mine_empty=True output_dir_empty=<output directory for images>
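The mine_empty run keeps tiles the model considers building-free as negative training examples; conceptually something like this (my reading of the flag, not the exact inference.py logic):

```python
import numpy as np

def is_negative(prob_mask, threshold=0.56, min_pixels=10):
    """Treat a tile as a negative example if (almost) no pixel crosses the mask threshold."""
    return int((prob_mask > threshold).sum()) < min_pixels
```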
(optional) Label test data and tier_2 data with soft labels using the best performing model / model ensemble.
Just run inference on test data:
python inference.py model=<path to the project>/models/m40000.pth model_name=efficientnet-b1 width=512 debug=False data_dir=<path to tier_2/test images> output_dir=outputs_masks
#the next steps are with TTA
python inference.py model=<path to the project>/models/m40000.pth model_name=efficientnet-b1 width=672 debug=False data_dir=<path to tier_2/test images> output_dir=outputs_masks
python inference.py model=<path to the project>/models/m40000.pth model_name=efficientnet-b1 width=672 debug=False data_dir=<path to tier_2/test images> flip=2 output_dir=outputs_masks
...
python combine_images.py base_dir=../../outputs_masks add_mask_suffix=True
and train the model with soft masks:
python train.py batch_size=<your batch size / mine is 5> data_dir=<path to train data> max_lr=<lr found by find_hyperparameters / ~0.0002> weight_decay=<found by find_hyperparameters / ~6.322983921368948e-7> fold=0 model_name=efficientnet-b2 use_mish=True use_context_block=True drop_connect_rate=0.5 soft_labels_dir=<path to dir with generated soft labels>
Run training on a different fold for a few epochs using an already trained model ... add it to the final ensemble ... mine more negative samples ...
Nothing clever here, just averaging the outputs: TTA with image flipping, transposing, and scaling (see the sketch below).
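Conceptually, the flip/transpose TTA produced by the runs below is equivalent to this in-memory computation (a sketch assuming `model` maps an image batch to mask logits; the actual pipeline saves each run's masks and averages them with combine_images.py):

```python
import torch

@torch.no_grad()
def tta_predict(model, x):
    """Average sigmoid masks over identity, flip=2, flip=3, flip=23 and transpose."""
    preds = []
    for dims in ((), (2,), (3,), (2, 3)):  # no flip / flip H / flip W / flip both
        y = torch.sigmoid(model(torch.flip(x, dims) if dims else x))
        preds.append(torch.flip(y, dims) if dims else y)
    y = torch.sigmoid(model(x.transpose(2, 3)))  # transpose=True
    preds.append(y.transpose(2, 3))
    return torch.stack(preds).mean(dim=0)
```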
#efficientnet-b1
python inference.py data_dir=<path to test data> model_name=efficientnet-b1 model=../../models/m40000.pth width=672 threshold=0.56 output_dir=outputs_final
python inference.py data_dir=<path to test data> model_name=efficientnet-b1 model=../../models/m40000.pth width=672 threshold=0.56 flip=3 output_dir=outputs_final #TTA flip input over dim 3
python inference.py data_dir=<path to test data> model_name=efficientnet-b1 model=../../models/m40000.pth width=672 threshold=0.56 flip=2 output_dir=outputs_final #TTA flip input over dim 2
python inference.py data_dir=<path to test data> model_name=efficientnet-b1 model=../../models/m40000.pth width=672 threshold=0.56 flip=23 output_dir=outputs_final #TTA flip input over dim 2 and 3
python inference.py data_dir=<path to test data> model_name=efficientnet-b1 model=../../models/m40000.pth width=672 threshold=0.56 flip=0 transpose=True output_dir=outputs_final #TTA transpose image
#efficientnet-b2
python inference.py data_dir=<path to test data> model_name=efficientnet-b2 model=../../models/ave.pth width=1024 threshold=0.56 flip=0 transpose=True use_mish=True use_context_block=True output_dir=outputs_final
python inference.py data_dir=<path to test data> model_name=efficientnet-b2 model=../../models/ave.pth width=1024 threshold=0.56 flip=3 transpose=True use_mish=True use_context_block=True output_dir=outputs_final #TTA flip 3
python inference.py data_dir=<path to test data> model_name=efficientnet-b2 model=../../models/ave.pth width=1024 threshold=0.56 flip=2 transpose=True use_mish=True use_context_block=True output_dir=outputs_final #TTA flip 2
python inference.py data_dir=<path to test data> model_name=efficientnet-b2 model=../../models/ave.pth width=1024 threshold=0.56 flip=23 transpose=True use_mish=True use_context_block=True output_dir=outputs_final #TTA flip 23
#Up to this point, the public LB should be: 0.8393
#Meta-algorithm for the next part (ideal for people with a sleeping disorder):
#1. run inference with a random TTA and a random checkpoint
#2. -> do something meaningful until the results are generated (optional)
#3. add the results to the ensemble
#4. submit; if the LB score improves, keep the results
#5. go to step 1
#efficientnet-b2: some scales / different checkpoints ...
python inference.py data_dir=<path to test data> model_name=efficientnet-b2 model=../../models/ave2.pth width=512 threshold=0.56 use_mish=True use_context_block=True output_dir=outputs_final
python inference.py data_dir=<path to test data> model_name=efficientnet-b2 model=../../models/ave2.pth width=896 threshold=0.56 use_mish=True use_context_block=True output_dir=outputs_final
python inference.py data_dir=<path to test data> model_name=efficientnet-b2 model=../../models/ave2.pth width=960 threshold=0.56 use_mish=True use_context_block=True output_dir=outputs_final
python inference.py data_dir=<path to test data> model_name=efficientnet-b2 model=../../models/mb2-m20000.pth width=960 threshold=0.56 use_mish=True use_context_block=True output_dir=outputs_final
python inference.py data_dir=<path to test data> model_name=efficientnet-b2 model=../../models/mb2-m20000.pth width=832 threshold=0.56 use_mish=True use_context_block=True output_dir=outputs_final
python inference.py data_dir=<path to test data> model_name=efficientnet-b2 model=../../models/mb2-m20000.pth width=1024 threshold=0.56 use_mish=True use_context_block=True output_dir=outputs_final
python inference.py data_dir=<path to test data> model_name=efficientnet-b2 model=../../models/model-b2-1.pth width=896 threshold=0.56 use_mish=True use_context_block=True output_dir=outputs_final
python inference.py data_dir=<path to test data> model_name=efficientnet-b2 model=../../models/mb2-m35000.pth width=640 threshold=0.56 use_mish=True use_context_block=True output_dir=outputs_final
python inference.py data_dir=<path to test data> model_name=efficientnet-b2 model=../../models/model-b2-2.pth width=1024 threshold=0.56 use_mish=True use_context_block=True output_dir=outputs_final
python inference.py data_dir=<path to test data> model_name=efficientnet-b2 model=../../models/model-b2-2.pth width=800 threshold=0.56 use_mish=True use_context_block=True flip=23 output_dir=outputs_final
python inference.py data_dir=<path to test data> model_name=efficientnet-b2 model=../../models/model-b2-2.pth width=992 threshold=0.56 use_mish=True use_context_block=True flip=3 output_dir=outputs_final
python inference.py data_dir=<path to test data> model_name=efficientnet-b2 model=../../models/model-b2-2.pth width=640 threshold=0.56 use_mish=True use_context_block=True flip=3 output_dir=outputs_final
#Great improvement: public LB + 0.002 :)
python combine_images.py base_dir=../../outputs_final
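The assumed behavior of combine_images.py, sketched below (a hypothetical layout where each run writes same-named masks into its own subfolder of base_dir; the actual script may differ):

```python
import glob
import os

import cv2
import numpy as np

def combine(base_dir, out_dir, threshold=0.5):
    """Average same-named probability masks across run folders and binarize the result."""
    os.makedirs(out_dir, exist_ok=True)
    runs = [d for d in sorted(glob.glob(os.path.join(base_dir, "*"))) if os.path.isdir(d)]
    names = {os.path.basename(p) for r in runs for p in glob.glob(os.path.join(r, "*.png"))}
    for name in sorted(names):
        masks = [cv2.imread(os.path.join(r, name), cv2.IMREAD_GRAYSCALE) / 255.0
                 for r in runs if os.path.exists(os.path.join(r, name))]
        avg = np.mean(masks, axis=0)
        cv2.imwrite(os.path.join(out_dir, name), ((avg > threshold) * 255).astype(np.uint8))
```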
The final score is probably reachable using just 2 models, models/m40000.pth and models/ave.pth, with TTA augmentation.
- [1] Feature Pyramid Networks for Object Detection
- [2] EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. ICML 2019
- [3] Self-training with Noisy Student improves ImageNet classification
Possible reasons why step 2 did not work (so well):
- the noise in the network was increased only in the backbone part
- the amount of data with soft labels was small / soft labels may not work well with the segmentation task