This project demonstrates a pipeline for detecting country flags in aerial views using a pre-trained YOLO model. The model is trained in multiple stages on a combination of synthetic datasets, generated with Python and Blender, and real-life flag images. The final model detects flags placed on the ground from a drone's perspective.
The pipeline for detecting country flags consists of the following stages:
- Pre-training:
  - A synthetic dataset is generated using Python scripts that simulate the flags in a variety of environments and lighting conditions.
  - A pre-trained YOLO model is trained on this synthetic dataset, giving it an initial ability to detect flags.
- Intermediate Training:
  - After the initial training, transfer learning is applied by retraining the model on a real dataset collected from RoboFlow. This step adapts the model to the variations present in real-world images.
- Fine-tuning:
  - A synthetic dataset is created using Blender, which simulates the flags from various angles and lighting conditions that may be encountered by drones in aerial footage.
  - The model is fine-tuned on this dataset to enhance its performance for the specific task of detecting flags in aerial views.
- Final Model:
  - The final model is the result of these pre-training, intermediate training, and fine-tuning stages: a detector for country flags tailored to aerial views.
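The staged training above can be sketched as a chain in which each stage starts from the best weights of the previous one. This is a minimal sketch, not the project's actual training script: it assumes the Ultralytics `YOLO` Python API, a `yolov8n.pt` starting checkpoint, and the default output layout `<project>/<name>/weights/best.pt`. The stage names, dataset YAML paths, `project` directory, and epoch count are all hypothetical placeholders.

```python
from pathlib import Path

# (stage name, dataset YAML) in training order.
# All names and paths here are hypothetical placeholders.
STAGES = [
    ("pretrain_python_synthetic", "data/synthetic_python.yaml"),
    ("intermediate_roboflow_real", "data/roboflow_real.yaml"),
    ("finetune_blender_aerial", "data/blender_aerial.yaml"),
]


def build_plan(stages, initial_weights="yolov8n.pt", project="runs/flags"):
    """Chain the stages so each one starts from the previous stage's best weights."""
    plan = []
    weights = initial_weights
    for name, data_yaml in stages:
        plan.append({"name": name, "data": data_yaml, "weights": weights})
        # Assumed output location of the best checkpoint for this stage.
        weights = (Path(project) / name / "weights" / "best.pt").as_posix()
    return plan


def run_plan(plan, project="runs/flags", epochs=50, imgsz=640):
    """Train every stage in order (requires the `ultralytics` package)."""
    from ultralytics import YOLO  # imported lazily so the plan can be built without it

    for stage in plan:
        model = YOLO(stage["weights"])
        model.train(
            data=stage["data"],
            epochs=epochs,      # assumed value, tune per stage
            imgsz=imgsz,
            project=project,
            name=stage["name"],
            exist_ok=True,      # keep the output path predictable across reruns
        )


if __name__ == "__main__":
    for stage in build_plan(STAGES):
        print(f'{stage["name"]}: {stage["weights"]} -> {stage["data"]}')
```

Calling `run_plan(build_plan(STAGES))` would run the three stages back to back. In practice each stage may need its own epoch count and learning rate.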
The synthetic dataset is generated in Blender by building realistic 3D environments in which flags are placed on the ground. This dataset is crucial for training the model to detect flags under varied conditions such as different lighting, angles, and backgrounds.
- Automated Dataset Generation:
  - Synthetic images of 196 different flags.
  - Over 15,000 images generated with various augmentations.
- Data Augmentation Techniques:
  - Random Positioning: Flags are positioned randomly in the scene.
  - Random Rotation: Each flag is rotated by a random angle to introduce variability.
  - Random Camera Heights: The camera angle and height are varied to produce a more diverse dataset.
  - Random Backgrounds: Varied and complex backgrounds simulate real-world environments.
- Auto-Annotation:
  - Bounding boxes are generated automatically for object detection.
  - Annotations are written in the YOLO label format, ready for training.
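To illustrate how random placement and auto-annotation fit together, the sketch below samples a flag position, rotation, and camera height, then derives the axis-aligned bounding box as a YOLO label line (`class x_center y_center width height`, all normalized to the image size). It is a simplified 2D stand-in, not the project's Blender script: it assumes a camera looking straight down with a pinhole model, and the names `FlagPlacement`, `sample_placement`, `yolo_label`, the flag dimensions, the height range, and the focal length are all hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class FlagPlacement:
    cx: float         # flag centre in pixels
    cy: float
    width: float      # flag size in pixels before rotation
    height: float
    angle_deg: float  # in-plane rotation


def sample_placement(rng, img_w=640, img_h=640, flag_w_m=1.5, flag_h_m=1.0,
                     cam_height_range=(5.0, 30.0), focal_px=600.0):
    """Sample a random camera height, flag position, and flag rotation."""
    cam_h = rng.uniform(*cam_height_range)
    # Pinhole model for a downward-looking camera: apparent size shrinks with height.
    w = focal_px * flag_w_m / cam_h
    h = focal_px * flag_h_m / cam_h
    # Keep the centre far enough from the border that any rotation stays in frame.
    margin = 0.5 * math.hypot(w, h)
    cx = rng.uniform(margin, img_w - margin)
    cy = rng.uniform(margin, img_h - margin)
    return FlagPlacement(cx, cy, w, h, rng.uniform(0.0, 360.0))


def yolo_label(class_id, p, img_w=640, img_h=640):
    """Return the YOLO label line for the box enclosing the rotated flag."""
    t = math.radians(p.angle_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    xs, ys = [], []
    for dx, dy in ((-0.5, -0.5), (0.5, -0.5), (0.5, 0.5), (-0.5, 0.5)):
        x, y = dx * p.width, dy * p.height
        # Rotate each corner about the flag centre.
        xs.append(p.cx + x * cos_t - y * sin_t)
        ys.append(p.cy + x * sin_t + y * cos_t)
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    xc = (x_min + x_max) / 2 / img_w
    yc = (y_min + y_max) / 2 / img_h
    bw = (x_max - x_min) / img_w
    bh = (y_max - y_min) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {bw:.6f} {bh:.6f}"


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducible samples
    for class_id in range(3):
        print(yolo_label(class_id, sample_placement(rng)))
```

In a Blender-based generator the corners would instead come from projecting the flag mesh's vertices through the scene camera, but the final min/max and normalization step is the same.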