PyTorch implementation for the paper "Composed Query-Based Event Retrieval in Video Corpus with Multimodal Episodic Perceptron"
We propose a novel event retrieval framework, termed Composed Query-Based Event Retrieval (CQBER), which simulates the multimodal perception ability of humans to improve retrieval accuracy. Specifically, we first construct two CQBER benchmark datasets, namely ActivityNet-CQ and TVR-CQ, which cover open-world and TV-show scenarios, respectively. Additionally, we propose an initial CQBER method, termed Multimodal Episodic Perceptron (MEP), which excavates complete query semantics from both observed static visual cues and various descriptions. Extensive experiments demonstrate that our proposed framework significantly boosts event retrieval accuracy across different existing methods.
Figure 1 (Supp.): Comparison of our proposed TVR-CQ and ActivityNet-CQ datasets with the original TVR and ActivityNet-Captions datasets in detail.
Figure 2: An overview of the CQBER framework based on our proposed Multimodal Episodic Perceptron.
Figure 3 (Supp.): Additional ablation studies regarding the key model components on the TVR-CQ dataset.
Figure 4: Visualizations of event retrieval results using our MEP method on the TVR-CQ dataset.
Figure 5: Visualizations of episodic perception in composed queries. Here we adopt the attention from the last VLCU layer.
Figure 6: More visualizations of episodic perception and event retrieval results using our MEP method on the TVR-CQ dataset.
The code is modified from ReLoCLNet.
- python 3.x with pytorch (1.7.0), torchvision, transformers, tensorboard, tqdm, h5py, easydict
- cuda, cudnn
If you have Anaconda installed, the conda environment can be built as follows (take Python 3.7 as an example):
conda create --name CQBER python=3.7
conda activate CQBER
conda install -c anaconda cudatoolkit cudnn
conda install pytorch==1.7.0 torchvision==0.8.0 torchaudio==0.7.0 cudatoolkit=11.0 -c pytorch
conda install -c anaconda h5py=2.9.0
conda install -c conda-forge transformers tensorboard tqdm easydict
The conda environment of TVRetrieval also works.
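Once the environment is activated, a quick stdlib-only sanity check can confirm that all required packages are importable (the list below simply mirrors the conda commands above):

```python
import importlib.util

# Packages this codebase expects (mirrors the conda installs above)
required = ["torch", "torchvision", "transformers", "h5py", "easydict", "tqdm"]

# find_spec returns None for packages that are not installed
missing = [name for name in required if importlib.util.find_spec(name) is None]
print("Missing packages:", ", ".join(missing) if missing else "none")
```

If anything is reported missing, re-run the corresponding conda command before proceeding.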
- Clone this repository
- Download features
For the features of the TVR dataset, please download them from features and extract them to the features directory:
$ tar -xjf video_feats.tar.bz2 -C features
This link may be useful for directly downloading Google Drive files using wget.
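As a minimal, self-contained illustration of the `tar -xjf ... -C` pattern used above (using a throwaway archive rather than the real `video_feats.tar.bz2`):

```shell
# Build a dummy bz2 tarball, then extract it into a target directory with -C
mkdir -p demo_src features_demo
echo "dummy" > demo_src/feat.h5
tar -cjf video_feats_demo.tar.bz2 -C demo_src feat.h5

# -C switches tar's working directory before extracting,
# so the contents land inside features_demo/
tar -xjf video_feats_demo.tar.bz2 -C features_demo
ls features_demo
```

The real archive extracts the same way; only the archive name and the `features` target directory differ.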
- Add the project root to PYTHONPATH (note that you need to do this each time you start a new session):
$ source setup.sh
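The contents of `setup.sh` are not reproduced here; a sketch of what such a script typically does (an assumption, so check the actual script in the repo) is to prepend the repository root to `PYTHONPATH` for the current shell session:

```shell
# Assumed behavior of setup.sh -- verify against the actual script
export PYTHONPATH="$PWD:$PYTHONPATH"
echo "PYTHONPATH=$PYTHONPATH"
```

This is why the script must be `source`d rather than executed: the exported variable has to persist in the current shell.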
TVR dataset
# train, refer to `method_tvr/scripts/TVR_CQ_train.sh` and `method_tvr/config.py` for more details about hyper-parameters
$ bash method_tvr/scripts/TVR_CQ_train.sh tvr video_sub_tef resnet_i3d --exp_id CQBER
# inference
# the trained model directory is placed at method_tvr/results/tvr-video_sub_tef-CQBER-*
# set MODEL_DIR_NAME to tvr-video_sub_tef-CQBER-*
# SPLIT_NAME: [val | test]
$ bash method_tvr/scripts/inference.sh MODEL_DIR_NAME SPLIT_NAME
- Upload code for the ActivityNet Captions dataset