Working on custom datasets.
If you need to work with your own dataset in the TSN codebase, you will need to extend the code, but this is easy. The steps are summarized as follows.
Datasets and their annotations come in all kinds of formats. To keep the following stages uniform, we ask each dataset to provide a unified annotation data structure, similar to the one used to represent UCF101 and HMDB51 in TSN.
The basic information for a video is represented as a Python tuple of (filename, class_label). Here class_label should be an integer. Usually, a dataset is separated into two sets of videos, "train" and "test". Each set is represented by a list of video tuples. One combination of a "train" and a "test" set forms a "split", which is again a tuple of the two corresponding lists.
A dataset can provide multiple splits. For example, the UCF101 and HMDB51 datasets both have 3 standard splits to conduct cross-validation in reporting system performance. Thus the parsers for these datasets return a list of 3 splits.
Here is an abstract illustration of the data structure:
[  # dataset XX
    (
        [(filename_1, label_1), (filename_2, label_2), ...],  # train subset
        [(filename_10, label_10), (filename_20, label_20), ...]  # test subset
    ),  # split 1
    (
        [(filename_3, label_3), (filename_7, label_7), ...],  # train subset
        [(filename_10, label_10), (filename_15, label_15), ...]  # test subset
    )  # split 2
]
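As a concrete illustration, the same structure can be written out with plain Python lists and tuples. The video names and labels below are invented, and the variable name `splits` is only chosen for this sketch.

```python
# A toy dataset with two splits; all names and labels are invented.
# Each video is a (filename, class_label) tuple with an integer label.
splits = [
    (
        [("video_a", 0), ("video_b", 1)],  # train subset of split 1
        [("video_c", 0)],                  # test subset of split 1
    ),
    (
        [("video_a", 0), ("video_c", 0)],  # train subset of split 2
        [("video_b", 1)],                  # test subset of split 2
    ),
]

# Each split unpacks into its train and test lists.
for i, (train, test) in enumerate(splits, start=1):
    print("split %d: %d train videos, %d test videos" % (i, len(train), len(test)))
```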
Given the above data structure, the task is to write a parser. Example parsers can be found at https://github.com/yjxiong/temporal-segment-networks/blob/master/pyActionRecog/benchmark_db.py#L64 and https://github.com/yjxiong/temporal-segment-networks/blob/master/pyActionRecog/benchmark_db.py#L82 .
In general, any function that translates your dataset annotation files to the above data structure will work in the TSN codebase.
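A minimal sketch of such a parser follows. It assumes a hypothetical annotation layout in which every split has one train file and one test file, and each line holds a video name followed by an integer label. The function names, file names, and default directory are assumptions for illustration and are not part of the TSN codebase.

```python
import os


def parse_list_file(path):
    """Read one annotation file into a list of (filename, class_label) tuples."""
    videos = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # The label is the last whitespace-separated field on the line.
            name, label = line.rsplit(None, 1)
            videos.append((name, int(label)))
    return videos


def parse_my_dataset_splits(root="data/my_dataset_splits", num_splits=1):
    """Return a list of (train_list, test_list) tuples, one per split."""
    splits = []
    for i in range(1, num_splits + 1):
        train = parse_list_file(os.path.join(root, "train_split%d.txt" % i))
        test = parse_list_file(os.path.join(root, "test_split%d.txt" % i))
        splits.append((train, test))
    return splits
```

Whatever the real annotation format looks like, only the return value matters: a list of splits, each a tuple of a train list and a test list.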
With a well-written parser that conforms to the above requirements, it is time to make the framework aware of it. To do this, we add the parse function to a dict of parsers. Its key will be the "name" of the dataset used by the framework, such as ucf101 and hmdb51. See here for how to do it.
https://github.com/yjxiong/temporal-segment-networks/blob/master/pyActionRecog/__init__.py#L5
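The registration idea can be sketched as follows. This is a simplified stand-in rather than a copy of the linked file: the dict name `parsers`, the lookup helper `load_splits`, the dataset name `my_dataset`, and the toy parser are all assumptions for illustration.

```python
def parse_my_dataset_splits():
    """Toy parser returning one split; a real one would read annotation files."""
    train = [("video_a", 0), ("video_b", 1)]
    test = [("video_c", 0)]
    return [(train, test)]


# Map from dataset name to a zero-argument parser function (hypothetical name).
parsers = {}
parsers["my_dataset"] = parse_my_dataset_splits


def load_splits(dataset_name):
    """Look up the parser registered under dataset_name and run it."""
    if dataset_name not in parsers:
        raise ValueError("unknown dataset: %s" % dataset_name)
    return parsers[dataset_name]()


splits = load_splits("my_dataset")
print(len(splits))
```

Keeping the parsers in a dict means that adding a dataset only requires one new entry; the rest of the framework refers to the dataset by its key.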
The training of TSN models relies on a set of file lists. Once we have added the parser to the framework, we can use it to generate the file lists. The command for generating the list files is as simple as
bash scripts/build_file_list.sh DATASET_NAME FRAME_PATH
where DATASET_NAME is the name we used to register the parser in step 2.
Of course one has to extract the frames and optical flow images before this (see here).
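To give a feel for how a split turns into a list file, here is a rough sketch that writes one such file. The line layout (frame directory, frame count, label), the helper name `write_file_list`, and the `frame_counts` lookup are assumptions made for this example; the script in the repository is the reference for the real format.

```python
import os


def write_file_list(videos, frame_counts, frame_path, out_path):
    """Write one line per video: <frame_dir> <num_frames> <label> (assumed layout).

    videos:       list of (filename, class_label) tuples from one subset
    frame_counts: dict mapping filename to its number of extracted frames
    """
    lines = []
    for name, label in videos:
        if name not in frame_counts:
            continue  # no extracted frames were found for this video
        frame_dir = os.path.join(frame_path, name)
        lines.append("%s %d %d\n" % (frame_dir, frame_counts[name], label))
    with open(out_path, "w") as f:
        f.writelines(lines)
    return len(lines)  # number of videos written
```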
With all the steps above completed, we can use the custom dataset just as we do the provided ones (UCF101 and HMDB51). Training and testing work the same way.
For example, the ActivityNet v1.2 and v1.3 datasets both have 2 "splits". The first is to train on the training subset and test on the validation subset. The second is to train on the training+validation subset and test on the testing subset. One can submit the test results of the second split to the official test server to get the performance metrics. We have implemented an example parser for the ActivityNet dataset at
https://github.com/yjxiong/temporal-segment-networks/blob/master/pyActionRecog/benchmark_db.py#L117
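The two ActivityNet-style splits described above can be assembled along these lines. The three subset lists are toy data, and the placeholder label for the testing subset is an assumption made for this sketch; see the linked parser for how the codebase actually handles it.

```python
# Toy subsets; names and labels are invented for illustration.
training = [("v_train_1", 3), ("v_train_2", 7)]
validation = [("v_val_1", 3)]

PLACEHOLDER_LABEL = 0  # assumed dummy label for test videos without annotations
testing = [("v_test_1", PLACEHOLDER_LABEL)]

splits = [
    (training, validation),            # split 1: train on training, test on validation
    (training + validation, testing),  # split 2: train on training+validation, test on testing
]
```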