This task consists of two main steps:
- Loading the video, splitting it into consecutive frames, and computing the displacements between them with a pre-trained optical flow model
- Building a CNN that predicts speed from the displacements
We first load the video, split it into frames, and save the frames to disk. This is done separately for the train and test videos.
There are 20400 frames for train and 10798 frames for test.
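The frame extraction step can be sketched roughly as follows. This is a minimal illustration that assumes OpenCV (`opencv-python`) is used for decoding; the function names, the `frame_00000.jpg` naming scheme, and the JPEG format are assumptions made for the example, not necessarily what this project does.

```python
from pathlib import Path


def frame_path(out_dir, index):
    """Build a zero-padded file name so frames sort in temporal order."""
    return Path(out_dir) / f"frame_{index:05d}.jpg"


def extract_frames(video_path, out_dir):
    """Save every frame of the video as a JPEG and return the frame count."""
    import cv2  # imported lazily; requires the opencv-python package

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    capture = cv2.VideoCapture(str(video_path))
    count = 0
    while True:
        ok, frame = capture.read()
        if not ok:  # end of the video (or a read error)
            break
        cv2.imwrite(str(frame_path(out_dir, count)), frame)
        count += 1
    capture.release()
    return count
```

Calling `extract_frames` once per split (for example with hypothetical paths such as `train.mp4` and `frames/train`) keeps the train and test frames in separate directories.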
Frame sample
Now that the frames are ready, we estimate the flow between consecutive frames with an optical flow estimation model.
The approach we have chosen is RAFT: Recurrent All-Pairs Field Transforms for Optical Flow.
There are two reasons that have led to this choice:
- RAFT was trained on the KITTI dataset, which includes scenes captured from a car driving through urban environments, making it well suited to our task.
- It is robust to the noise and brightness changes mentioned earlier.
Official GitHub implementation
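As a rough illustration of how flow between two consecutive frames can be computed, the sketch below uses the RAFT port that ships with torchvision rather than the official implementation. The choice of `Raft_Large_Weights.DEFAULT`, the helper names, and the replicate padding are assumptions made for the example. RAFT expects frame sizes divisible by 8, so the frames are padded first and the flow is cropped back afterwards.

```python
def padded_size(height, width, multiple=8):
    """Round a frame size up to the next multiple that RAFT can process."""
    def pad(n):
        return (n + multiple - 1) // multiple * multiple
    return pad(height), pad(width)


def load_raft(device="cpu"):
    """Load torchvision's RAFT port and its matching input transforms."""
    from torchvision.models.optical_flow import Raft_Large_Weights, raft_large

    weights = Raft_Large_Weights.DEFAULT  # assumption: any RAFT checkpoint works here
    model = raft_large(weights=weights).to(device).eval()
    return model, weights.transforms()


def estimate_flow(model, transforms, frame1, frame2, device="cpu"):
    """Return a (2, H, W) array of per-pixel (dx, dy) displacements.

    frame1 and frame2 are uint8 RGB arrays of shape (H, W, 3).
    """
    import torch
    import torch.nn.functional as F

    def to_batch(frame):
        # (H, W, 3) uint8 array -> (1, 3, H, W) tensor
        return torch.from_numpy(frame).permute(2, 0, 1).unsqueeze(0)

    img1, img2 = transforms(to_batch(frame1), to_batch(frame2))

    h, w = frame1.shape[:2]
    new_h, new_w = padded_size(h, w)
    padding = (0, new_w - w, 0, new_h - h)  # left, right, top, bottom
    img1 = F.pad(img1, padding, mode="replicate")
    img2 = F.pad(img2, padding, mode="replicate")

    with torch.no_grad():
        # RAFT returns one flow estimate per refinement iteration; keep the last.
        flow = model(img1.to(device), img2.to(device))[-1]
    return flow[0, :, :h, :w].cpu().numpy()
```

Loading the model once with `load_raft` and reusing it for every frame pair avoids reloading the weights for each of the thousands of pairs.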
Displacements sample
Because the dataset is large, we preprocess it before creating the model and save the images as .npy files. Every image is resized and normalized.
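A minimal sketch of this preprocessing step is shown below. The 224x224 target size, the scaling to [0, 1], and the function names are assumptions for illustration; the nearest-neighbour resize is a NumPy-only stand-in for a proper interpolating resize such as `cv2.resize`.

```python
import numpy as np


def resize_nearest(image, height, width):
    """Nearest-neighbour resize of an (H, W, C) array using index lookup."""
    rows = np.arange(height) * image.shape[0] // height
    cols = np.arange(width) * image.shape[1] // width
    return image[rows[:, None], cols]


def preprocess(images, height=224, width=224):
    """Resize uint8 images and scale them to float32 values in [0, 1]."""
    resized = [resize_nearest(img, height, width) for img in images]
    return np.stack(resized).astype(np.float32) / 255.0
```

The resulting array can be written with `np.save("train_flow.npy", preprocess(images))` (the file name is hypothetical) and later opened with `np.load(..., mmap_mode="r")` so that the whole file does not have to fit in memory at once.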
Here we load the dataset and try several well-known pre-trained models, such as:
- ResNet18
- EfficientNet