Convert the format of the Caltech Pedestrian dataset to the format that YOLO uses.
This repo is adapted from
- https://github.com/mitmul/caltech-pedestrian-dataset-converter
- https://pjreddie.com/media/files/voc_label.py

Requirements:
- opencv
- numpy
- scipy
- Convert the `.seq` video files to `.png` frames by running `$ python generate-images.py`. They will end up in the `images` folder.
- Square images work better, which is why you can convert the 640x480 frames to 640x640 frames by running `$ python squarify-images.py`.
- Convert the `.vbb` annotation files to `.txt` files by running `$ python generate-annotation.py`. It will create the `labels` folder that contains the `.txt` files named like the frames, and the `train.txt` and `test.txt` files that contain the paths to the images.
- Adjust the `.data` YOLO file.
- Adjust the `.cfg` YOLO file: take e.g. `yolo-voc.2.0.cfg` and set `height = 640`, `width = 640`, `classes = 2`, and in the final layer `filters = 35` (= `(classes + 5) * 5`).
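The label conversion follows the usual YOLO convention: each line of a `.txt` label file is `class x_center y_center width height`, with all four box values normalized by the image dimensions. A minimal sketch of that conversion, assuming 640x640 frames (the function name and signature are illustrative, not taken from `generate-annotation.py`):

```python
def bbox_to_yolo(left, top, width, height, img_w=640, img_h=640, cls=0):
    """Convert a pixel bounding box (left, top, width, height)
    to the normalized YOLO tuple (class, x_center, y_center, w, h)."""
    x_center = (left + width / 2) / img_w
    y_center = (top + height / 2) / img_h
    return cls, x_center, y_center, width / img_w, height / img_h

# Example: a 64x128 pedestrian box with top-left corner at (288, 196)
print(bbox_to_yolo(288, 196, 64, 128))  # (0, 0.5, 0.40625, 0.1, 0.2)
```

Note that YOLO expects the box *center*, not the top-left corner, which is why the half-width and half-height are added before normalizing.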
Expected folder structure:

```
|- caltech
|-- annotations
|-- test06
|--- V000.seq
|--- ...
|-- ...
|-- train00
|-- ...
|- caltech-for-yolo (this repo, cd)
|-- generate-images.py
|-- generate-annotation.py
|-- images
|-- labels
|-- test.txt
|-- train.txt
```
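The squaring step above turns each 640x480 frame into a 640x640 one; the simplest way to do that is to pad the short side with black rows. A minimal numpy sketch of the idea (even top/bottom padding is an assumption here — check `squarify-images.py` for the actual placement):

```python
import numpy as np

def squarify(frame):
    """Pad an HxWx3 frame (W >= H) with black rows to make it WxWx3."""
    h, w = frame.shape[:2]
    pad = w - h
    # split the padding between top and bottom; extra row goes to the bottom
    return np.pad(frame, ((pad // 2, pad - pad // 2), (0, 0), (0, 0)))

frame = np.zeros((480, 640, 3), dtype=np.uint8)
print(squarify(frame).shape)  # (640, 640, 3)
```

If labels are derived from the original 480-row frames, the vertical padding offset (80 px in this sketch) has to be added to the box y-coordinates before normalizing.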