Adding a Dataset

Andrés Solís Montero edited this page Apr 4, 2016 · 9 revisions

Overview

Before continuing with this part of the documentation, make sure you understand the project's folder structure.

Annotated datasets are not part of the git repository. They are stored on external servers as compressed files to facilitate download, and they are downloaded while configuring and generating your vivaTracker project. The tar.gz format is used for cmake cross-platform compatibility. Once downloaded, each dataset file is checked against its md5 hash value and extracted to the sequences folder.
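The project performs this step in its cmake scripts. Purely as an illustration, the check-then-extract logic could be sketched in Python as follows. The function names `md5_of` and `verify_and_extract` are hypothetical and are not part of vivaTracker.

```python
import hashlib
import tarfile
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_and_extract(archive, expected_md5, sequences_dir):
    """Extract `archive` into `sequences_dir` only if its md5 matches.

    Assumes the archive comes from a trusted server.
    """
    actual = md5_of(archive)
    if actual != expected_md5.lower():
        raise ValueError(f"md5 mismatch for {archive}: got {actual}")
    Path(sequences_dir).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(sequences_dir)
```

A mismatch raises an error before anything is extracted, so a corrupted or partial download never reaches the sequences folder.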

A sequences.txt file is generated containing the full path to the sequences folder on your system. This file is propagated to the Debug and Release folders of your built project, which allows the user to store the datasets in a different path.

Inside the macros.txt file you will find the dataset URLs and their corresponding md5 hashes. For example, the vot2013 dataset appears as:

SET(vot2013  http://www.site.uottawa.ca/research/viva/datasets/tracking/vot2013.tar.gz)
SET(md5_vot2013  7467447b0d533efcb458ab106c66497a)
...
SET(_DATASETS vot2013 vot2014 vot2015)

Adding a Dataset from URL

Let us assume we have a new dataset named dataset2016, compressed and stored on our public server at the URL "http://mypublicserver.com/dataset2016.tar.gz". Its corresponding md5 checksum is 8567134b0d632eafb492ab107c69213e.

To include dataset2016 in your project build, add the following two lines to the macros.txt file:

SET(dataset2016  http://mypublicserver.com/dataset2016.tar.gz)
SET(md5_dataset2016  8567134b0d632eafb492ab107c69213e)

Then modify the following line by appending the new dataset variable name:

SET(_DATASETS vot2013 vot2014 vot2015 dataset2016)
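If you do not yet know the checksum of your archive, a small helper can compute it and print the two entries in the form shown above. This is a minimal sketch; `macros_entries` is a hypothetical helper, not something shipped with vivaTracker, and it assumes you have a local copy of the archive.

```python
import hashlib


def macros_entries(name, url, archive_path):
    """Build the two SET(...) lines for macros.txt from a local archive copy."""
    with open(archive_path, "rb") as fh:
        checksum = hashlib.md5(fh.read()).hexdigest()
    return [
        f"SET({name}  {url})",
        f"SET(md5_{name}  {checksum})",
    ]


if __name__ == "__main__":
    import sys

    # Usage: python macros_entries.py <name> <url> <path-to-archive>
    if len(sys.argv) == 4:
        print("\n".join(macros_entries(*sys.argv[1:4])))
```

The dataset name still has to be appended to the _DATASETS list by hand.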

The dataset structure

Each tar.gz dataset file contains a root folder with the name of the dataset and, inside it, one folder for each sequence belonging to the dataset. Inside each sequence folder you can optionally include a "groundtruth.txt" file annotating the sequence.

For example the vot2013 dataset containing 16 sequences stored in the compressed file vot2013.tar.gz will generate the following folder structure when extracted:

vot2013/      (root folder with dataset name)
    bicycle/         (subfolder with sequence name)
        000001.jpg   (sequence files....)
        000002.jpg
        ...
        groundtruth.txt (optional annotated groundtruth file)
    bolt/
        ...
    car/
    cup/
    david/
    diving/
    face/
    gymnastics/
    hand/
    iceskater/
    juice/
    jump/
    singer/
    sunshade/
    torus/
    woman/

Each sequence folder contains a list of alphabetically ordered image files and their ground truth annotations.
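When preparing your own archive, the important detail is that the dataset name is the single root folder inside the tar.gz file. One way to produce such an archive is sketched below; `package_dataset` is a hypothetical helper name and the output location is an assumption.

```python
import tarfile
from pathlib import Path


def package_dataset(dataset_dir, output_dir="."):
    """Create <name>.tar.gz whose single root folder is the dataset name."""
    dataset_dir = Path(dataset_dir).resolve()
    archive = Path(output_dir) / f"{dataset_dir.name}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps only the dataset name as root and drops parent folders
        tar.add(dataset_dir, arcname=dataset_dir.name)
    return archive
```

For instance, `package_dataset("/data/dataset2016")` would write dataset2016.tar.gz to the current directory, with entries such as dataset2016/<sequence>/000001.jpg.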

Ground truth file format

The ground truth file can use one of two formats, as in the VOT Challenge datasets.

  1. The compressed archive contains directories of images for each sequence and per-frame annotations of the axis-aligned bounding box marking the object. The annotations are stored in a text file with the format:

x, y, width, height
where x and y are the pixel coordinates of the top-left corner of the bounding box marking the object.

  2. The compressed archive contains directories of images for each sequence and per-frame annotations of the rotated bounding box marking the object. The annotations are stored in a text file with the format:

x1, y1, x2, y2, x3, y3, x4, y4
where xi and yi are the coordinates of corner i of the bounding box in frame N, in clockwise order.

In both formats, the annotation for frame N is on the N-th row of the text file.

The following example shows the first 10 lines of a groundtruth.txt file in the axis-aligned format, corresponding to the first 10 frames.

154.00,94.00,18.00,48.00
153.00,92.00,18.00,48.00
153.00,90.00,19.00,49.00
152.00,89.00,19.00,50.00
152.00,87.00,20.00,51.00
151.00,86.00,20.00,52.00
151.00,84.00,21.00,53.00
151.00,83.00,22.00,54.00
153.00,83.00,22.00,54.00
153.00,82.00,22.00,54.00
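Since a row holds either four or eight comma-separated values, a reader can tell the two formats apart by counting fields. The sketch below shows one possible way to load such a file; it is not the parser used by vivaTracker. The function names are hypothetical, and expanding the axis-aligned box into four corners is a choice made here so that both formats yield the same structure.

```python
def parse_groundtruth_line(line):
    """Parse one row into a list of four (x, y) corners.

    Accepts either "x, y, width, height" or "x1, y1, ..., x4, y4".
    """
    values = [float(v) for v in line.strip().split(",")]
    if len(values) == 4:
        x, y, w, h = values
        # Expand the box into corners: top-left, top-right,
        # bottom-right, bottom-left (clockwise in image coordinates)
        return [(x, y), (x + w, y), (x + w, y + h), (x, y + h)]
    if len(values) == 8:
        return list(zip(values[0::2], values[1::2]))
    raise ValueError(f"expected 4 or 8 values, got {len(values)}")


def load_groundtruth(path):
    """Return one corner list per frame; row N of the file is frame N."""
    with open(path) as fh:
        return [parse_groundtruth_line(row) for row in fh if row.strip()]
```

With this layout, `load_groundtruth(path)[0]` holds the corners for the first frame.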

Using your algorithm with a sequence from an installed dataset

Once you have compiled your vivaTracker project, you can run the executable on sequences from the datasets that are already in place. For example, in a terminal you could run

./vivaTracker vot2013/bolt

The software will look inside the sequences folder for a dataset named vot2013 and a sequence named bolt. If a groundtruth.txt file is found inside that folder, it will be displayed while the tracker executes.

For more details about command line arguments, please refer to the command line arguments documentation.

Moving the sequences folder

Each time the vivaTracker software is executed, it checks for a sequences.txt file in the current working directory. This file contains the full path to the folder with the available datasets. If you wish to move the sequences folder to a particular path (e.g., /tmp/datasets/), just make sure the file contains the full path to the folder containing the datasets.

Content of the file sequences.txt pointing to /tmp/datasets/:

/tmp/datasets/
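To make the lookup concrete, the sketch below shows how an argument such as vot2013/bolt could be resolved against sequences.txt. It only mirrors the behaviour described on this page and is not the vivaTracker implementation; `resolve_sequence` is a hypothetical name.

```python
from pathlib import Path


def resolve_sequence(sequence, cwd="."):
    """Map e.g. "vot2013/bolt" to its folder using sequences.txt in `cwd`."""
    root = Path(cwd, "sequences.txt").read_text().strip()
    folder = Path(root) / sequence
    if not folder.is_dir():
        raise FileNotFoundError(f"no sequence folder at {folder}")
    groundtruth = folder / "groundtruth.txt"
    # The annotation file is optional, so report None when it is absent
    return folder, (groundtruth if groundtruth.is_file() else None)
```

With the sequences.txt content shown above, `resolve_sequence("vot2013/bolt")` would look for /tmp/datasets/vot2013/bolt.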