AI CUP 2024 Spring Official Competition Website
Generative-AI Navigation Information Competition for UAV Reconnaissance in Natural Environments I: Image Data Generation
以生成式AI建構無人機於自然環境偵察時所需之導航資訊競賽 I - 影像資料生成競賽
🚀 Check workshop.ipynb to reproduce our results.
🤗 Or follow the Usage to customize your workflow!
📈 Check Result or refer to the Submission History section for more details.
Placement:
- Public: 18th
- Private: 13th
Members:
- Chen-Yang Yu, NCKU (Team Leader)
- Yuan-Chun Chiang, NTU
- Yu-Hao Chiang, NCKU
- Xin-Xian Lin, NCKU
Collaboration:
- Team 5574: Sherry2580/AI-cup-2024-spring
Our task is to translate the black-and-white draft imagery into drone imagery.
Domain Type | Draft Imagery | Drone Imagery |
---|---|---|
Road | (example image) | (example image) |
River | (example image) | (example image) |
The dataset contains 2 domains:
- `label_img`: black-and-white draft imagery.
- `img`: drone imagery.
Training Dataset Folder Structure:
```
training_dataset
├── label_img (trainA)
│   ├── TRA_RI_1000000.png
│   ├── TRA_RI_1000001.png
│   └── ...
└── img (trainB)
    ├── TRA_RO_1000000.jpg
    ├── TRA_RO_1000001.jpg
    └── ...
```
We provide several preprocessing methods in our code, including:

- Data Filtering: remove low-quality images from `img`.
- Data Augmentation: we apply horizontal and vertical flips to the dataset (the example table of raw, horizontally flipped, and vertically flipped images is omitted here); see the sketch below.
- Dataset Split (Enhanced model's architecture): split the dataset into `RIVER` and `ROAD` domains.

```
dataset
├── train_ROAD
│   ├── trainA (Draft Images)
│   └── trainB (Drone Images)
└── train_RIVER
    ├── trainA (Draft Images)
    └── trainB (Drone Images)
```
Note: using all of the above methods together did not give us the best result.
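A minimal sketch of the flip-based augmentation, assuming Pillow and that flipped copies are written next to the originals (the actual code lives in `dataset/preprocess_dataset.ipynb`):

```python
# Minimal sketch of the flip augmentation; paths and helper names are
# illustrative, not the exact code in dataset/preprocess_dataset.ipynb.
from pathlib import Path
from PIL import Image

def augment_with_flips(folder: str) -> None:
    """Write horizontally and vertically flipped copies next to each image."""
    for path in Path(folder).glob("*.*"):
        img = Image.open(path)
        img.transpose(Image.Transpose.FLIP_LEFT_RIGHT).save(
            path.with_stem(path.stem + "_hflip"))
        img.transpose(Image.Transpose.FLIP_TOP_BOTTOM).save(
            path.with_stem(path.stem + "_vflip"))

# Apply the same flips to both domains so draft/drone pairs stay aligned.
for folder in ("training_dataset/label_img", "training_dataset/img"):
    augment_with_flips(folder)
```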
The testing dataset contains only the `label_img` folder, which is the black-and-white draft imagery.
Testing Dataset Folder Structure:
```
testing_dataset
└── label_img (testA)
    ├── PRI_RI_1000000.png
    ├── PRI_RI_1000001.png
    └── ...
```
After Dataset Split:
```
dataset
├── test_ROAD
│   └── testA (Draft Images)
│       ├── PRI_RO_1000000.png
│       ├── PRI_RO_1000001.png
│       └── ...
└── test_RIVER
    └── testA (Draft Images)
        ├── PRI_RI_1000000.png
        ├── PRI_RI_1000001.png
        └── ...
```
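The ROAD/RIVER split itself can be reproduced with something like the sketch below; we assume the `_RI_`/`_RO_` token in each filename marks the domain, and the folder names are illustrative (the actual logic lives in `dataset/preprocess_dataset.ipynb`):

```python
# Hedged sketch of the ROAD/RIVER split, assuming the _RI_/_RO_ filename token
# (e.g. PRI_RI_1000000.png) marks the domain; not the exact notebook code.
import shutil
from pathlib import Path

def split_by_domain(src: str, dst_pattern: str, subfolder: str) -> None:
    """Copy each image into dataset/<split>_<DOMAIN>/<subfolder>/ by its RI/RO token."""
    for path in Path(src).iterdir():
        domain = "RIVER" if "_RI_" in path.name else "ROAD"
        dst = Path(dst_pattern.format(domain=domain)) / subfolder
        dst.mkdir(parents=True, exist_ok=True)
        shutil.copy(path, dst / path.name)

# Example: split the testing drafts; repeat analogously for the training folders.
split_by_domain("testing_dataset/label_img", "dataset/test_{domain}", "testA")
```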
We propose 2 methods to train the model:
- Baseline: train on ROAD and RIVER data at the same time.
- Enhanced: train 2 domain-specific models.

At first, we trained a single conditional GAN on all of the ROAD and RIVER data. However, the results were not good enough, so we trained two domain-specific models, one for the ROAD data and one for the RIVER data.
We tuned several hyperparameters, including `n_epochs`, `n_epochs_decay`, `batch_size`, and `netG`. Our best result came from training with:

```
n_epochs = 200
n_epochs_decay = 200
batch_size = 1
netG = unet_256
```
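These settings map onto the standard pix2pix training options; below is a hedged sketch of the equivalent launch (assuming the pytorch-CycleGAN-and-pix2pix `train.py` CLI; the experiment name and data root are illustrative):

```python
# Sketch of launching pix2pix training with the best hyperparameters, assuming
# the pytorch-CycleGAN-and-pix2pix train.py CLI; --name/--dataroot are illustrative.
import subprocess

subprocess.run([
    "python", "train.py",
    "--dataroot", "./dataset/train_ROAD",  # or ./dataset/train_RIVER
    "--name", "road_pix2pix",              # illustrative experiment name
    "--model", "pix2pix",
    "--direction", "AtoB",                 # draft (A) -> drone (B)
    "--n_epochs", "200",
    "--n_epochs_decay", "200",
    "--batch_size", "1",
    "--netG", "unet_256",
], check=True)
```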
Since the pix2pix model outputs images in 256x256 format, we tried a super-resolution method to upscale them to 428x240. However, the result did not improve much.
(You can check the super-resolution code in other/super_resolution.ipynb)
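For reference, the plain resize path (without the SR model) is just cubic interpolation to the target resolution; a minimal sketch assuming OpenCV and illustrative paths:

```python
# Minimal sketch of plain cubic upscaling to the competition's 428x240 target
# (the SR model itself lives in other/super_resolution.ipynb); paths illustrative.
import glob
import os
import cv2

os.makedirs("upscaled", exist_ok=True)
for path in glob.glob("results/*.png"):
    img = cv2.imread(path)  # 256x256 pix2pix output
    up = cv2.resize(img, (428, 240), interpolation=cv2.INTER_CUBIC)  # (width, height)
    cv2.imwrite(os.path.join("upscaled", os.path.basename(path)), up)
```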
We believe the result could be improved by using pix2pixHD or img2img-turbo.
However, due to limited hardware resources and the competition's time limit, we did not try these approaches.
We show the results of the baseline and enhanced models in the following table, using FID (Fréchet Inception Distance) as the evaluation metric.
The lower the score, the better the result.
Model | Public FID | Private FID |
---|---|---|
Baseline | 141.6813 | x |
Enhanced | 129.4026 | 128.060178996 |
Enhanced + data filtering + data augmentation | 206.5882 | 206.667928949 |
Unfortunately, when we added more data preprocessing, the result got worse. We trained that model with a batch size of 64, which made GAN training unstable. With more time, we would retrain it with a smaller batch size.
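To sanity-check results locally, FID can be estimated with, e.g., the `pytorch-fid` package (an assumption on our part: the competition server uses its own scorer, so numbers may differ); the folder paths below are illustrative:

```python
# Hedged sketch of a local FID check using the pytorch-fid package; the
# competition's official scorer may compute FID differently.
import torch
from pytorch_fid.fid_score import calculate_fid_given_paths

device = "cuda" if torch.cuda.is_available() else "cpu"
fid = calculate_fid_given_paths(
    ["dataset/train_ROAD/trainB", "results/road_fake"],  # real vs. generated
    batch_size=50,
    device=device,
    dims=2048,  # default Inception-v3 pool3 features
)
print(f"FID: {fid:.4f}")  # lower is better
```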
```bash
git clone https://github.com/LittleFish-Coder/gen-ai-uav
cd gen-ai-uav
pip install -r requirements.txt
```
Make sure you download the dataset from the AI CUP website and put it in the `gen-ai-uav/dataset` folder.
Run workshop.ipynb to directly reproduce our results.
Before you start, make sure you have finished the Setup section.
In this section, there are 3 steps to follow (you can customize your own workflow around them):
- Prepare The Dataset
- Train The Model (optional)
- Test The Model
In each notebook, we provide both the baseline and the enhanced method. (You can finish just the baseline part for quick testing.)
Run `dataset/preprocess_dataset.ipynb` to download and preprocess the dataset.
We have provided a pre-trained model, so you can move directly to the next step.
If you want to train the model yourself, run `train_model.ipynb`.
With the pre-trained model provided, you can directly run `test_model.ipynb` for baseline dataset testing.
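Testing is roughly equivalent to the following call (a sketch assuming the pytorch-CycleGAN-and-pix2pix `test.py` CLI; checkpoint name and data root are illustrative):

```python
# Sketch of running inference with the pre-trained weights, assuming the
# pytorch-CycleGAN-and-pix2pix test.py CLI; --name/--dataroot are illustrative.
import subprocess

subprocess.run([
    "python", "test.py",
    "--dataroot", "./dataset/test_ROAD",
    "--name", "road_pix2pix",      # illustrative checkpoint name
    "--model", "pix2pix",
    "--direction", "AtoB",
    "--netG", "unet_256",
    "--num_test", "999999",        # run on every test image
], check=True)
```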
Unfold the details below to see the full submission history.
Time | Filename | Public Score | Private Score | Description |
---|---|---|---|---|
4/24 | submission.zip | Format Error | x | Inference with the AI CUP pre-trained weights |
5/04 | submission1.zip | 178.4705 | x | 1. Inference with pre-trained weights 2. Preprocess: invert black and white |
5/04 | submission2.zip | 182.4264 | x | Test the model with epoch-40 trained weights |
5/04 | submission3.zip | 181.2201 | x | Test the model with epoch-170 trained weights |
5/05 | submission400.zip | 172.6293 | x | Test the model with epoch-400 trained weights |
5/05 | submission200.zip | 142.2167 | x | Retrain the model for 200 epochs (the previous run misused the training set) |
5/06 | submission_road_river.zip | 134.3143 | x | Train 2 domain-specific models (road and river) for 200 epochs |
5/17 | submission_retrain200.zip | 142.1900 | x | 1. Use the retrained weights for the full dataset (200 epochs) 2. Test the images in single_test_mode |
5/17 | submission_road_river_80epochs.zip | 144.3565 | x | Train 2 domain-specific models for 80 epochs; test in single mode |
5/17 | submission_all_load_size_256.zip | 141.6813 | x | Test the images in single_test_mode with load_size 256 |
5/18 | submission_road_river_400epochs.zip | 124.7482 | x | Train 2 domain-specific models for 400 epochs; test in single mode |
5/21 | submission_retrain200_resnet.zip | 172.1164 | 1000.0 | Retrain the model with ResNet blocks |
5/21 | submission_private_resnet.zip | 1000.0 | 173.808621769 | Use the ResNet-trained model for inference on the private testing dataset |
5/21 | submission_private_unet256.zip | 1000.0 | 138.084645591 | Use the unet_256-trained model for inference on the private testing dataset |
5/25 | submission_road_river_400.zip | 129.4026 | 128.060178996 | 1. Test on the public and private datasets 2. Train 2 domain-specific models for 400 epochs |
5/26 | upscaled_images.zip | 126.9314 | 128.301406203 | Use super resolution to upscale images from 256x256 to 420x240 |
5/26 | submission_428_240.zip | 133.3959 | 132.658006179 | Upscale both domains' data to 428x240 (previously mis-resized to 420x240) |
5/26 | upscaled_images_428_240.zip | 127.3133 | 129.260890304 | Super-resolution upscale to 428x240 |
5/27 | submission_1000epoch.zip | 132.2360 | 131.510869954 | Retrain the model on the filtered data for 1000 epochs; resize with cubic interpolation |
5/27 | submission_400_interpolation.zip | 133.2471 | 130.429431557 | Use the pre-trained 400-epoch netG; resize with cubic interpolation |
5/28 | submission_400_1000.zip | 206.5882 | 206.667928949 | Continue training from the 400-epoch checkpoint to 1000 epochs |
5/28 | upscaled_images.zip | 1000.0 | 156.563343145 | Upscale images (private only), 400-epoch model |
5/28 | submission_20.zip | 147.1295 | 147.788939653 | Refine the dataset and train for 20 epochs |
```bibtex
@inproceedings{CycleGAN2017,
  title={Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks},
  author={Zhu, Jun-Yan and Park, Taesung and Isola, Phillip and Efros, Alexei A},
  booktitle={Computer Vision (ICCV), 2017 IEEE International Conference on},
  year={2017}
}

@inproceedings{isola2017image,
  title={Image-to-Image Translation with Conditional Adversarial Networks},
  author={Isola, Phillip and Zhu, Jun-Yan and Zhou, Tinghui and Efros, Alexei A},
  booktitle={Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on},
  year={2017}
}
```