MHN

This is the PyTorch implementation of our paper "Multilevel Hierarchical Network with Multiscale Sampling for Video Question Answering", accepted by IJCAI’22.


Platform and dependencies

Ubuntu 14.04
Python 3.7
CUDA 10.1
cuDNN 7.5+
PyTorch >= 1.7.0
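
As a quick sanity check against the versions listed above, the following is a minimal sketch (not part of this repository) that compares an installed PyTorch version with the 1.7.0 minimum. The helper names `parse_version` and `meets_minimum` are hypothetical, and the check on `torch` is simply skipped if PyTorch cannot be imported.

```python
import re


def parse_version(version: str) -> tuple:
    """Turn a string such as '1.7.0+cu101' into (1, 7, 0).

    The local build suffix after '+' is ignored (hypothetical helper).
    """
    core = version.split("+", 1)[0]
    parts = []
    for piece in core.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "1.7.0") -> bool:
    """Compare versions numerically, so that '1.10.0' counts as newer than '1.7.0'."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '1.7' and '1.7.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        status = "OK" if meets_minimum(torch.__version__) else "older than 1.7.0"
        print("PyTorch", torch.__version__, status)
        print("CUDA available:", torch.cuda.is_available())
```

Comparing parsed integer tuples rather than raw strings avoids the common pitfall where "1.10.0" sorts before "1.7.0".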

Data Preparation

  • Download the dataset
    MSVD-QA: link
    MSRVTT-QA: link
    TGIF-QA: link
  • Preprocessing
    1. To extract GloVe embeddings for the questions and answers, please refer here.
      Taking the action task in the TGIF-QA dataset as an example, we have the following features under the path /QAfeatures:
      TGIF/word/action/TGIF_action_train_questions.pt
      TGIF/word/action/TGIF_action_val_questions.pt
      TGIF/word/action/TGIF_action_test_questions.pt
      TGIF/word/action/TGIF_action_vocab.json
    2. To extract appearance and motion features, use the pretrained models here.
      For the action task, we have the following features under the path /Vfeatures:
      TGIF/SpatialFeatures/tumblr_nd24xaX8d11qkb1azo1_250/Features.pkl (shape is 2^level-1,16,2048)
      TGIF/SpatialFeatures/tumblr_no00ddSlG31t34v14o1_250/Features.pkl
      ...
      TGIF/TemporalFeatures/tumblr_nd24xaX8d11qkb1azo1_250/Features.pkl (shape is 2^level-1,2048)
      TGIF/TemporalFeatures/tumblr_no00ddSlG31t34v14o1_250/Features.pkl
      ...
      In our paper, the number of levels is set to 3 by default.
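
The directory layout described above can be captured with a few small path helpers. This is only a sketch under stated assumptions: the function names are hypothetical, the file-name pattern is generalized from the action-task example (other tasks may differ), and the contents of each `Features.pkl` are not specified here, so `load_features` just returns whatever object was pickled.

```python
import pickle
from pathlib import PurePosixPath

SPLITS = ("train", "val", "test")


def question_files(qa_root: str, task: str = "action") -> dict:
    """Expected question/vocab files for one TGIF-QA task.

    Pattern assumed from the action-task example in this README.
    """
    base = PurePosixPath(qa_root) / "TGIF" / "word" / task
    files = {split: base / f"TGIF_{task}_{split}_questions.pt" for split in SPLITS}
    files["vocab"] = base / f"TGIF_{task}_vocab.json"
    return files


def video_feature_files(v_root: str, video_id: str) -> dict:
    """Expected spatial and temporal feature files for one video."""
    base = PurePosixPath(v_root) / "TGIF"
    return {
        "spatial": base / "SpatialFeatures" / video_id / "Features.pkl",
        "temporal": base / "TemporalFeatures" / video_id / "Features.pkl",
    }


def load_features(path):
    """Unpickle one feature file; only use this on files you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


if __name__ == "__main__":
    for name, path in question_files("/QAfeatures").items():
        print(name, path)
    for name, path in video_feature_files("/Vfeatures", "tumblr_nd24xaX8d11qkb1azo1_250").items():
        print(name, path)
```

Checking that these paths exist before training is a cheap way to catch a misplaced feature directory early.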

Train and test

The trained models for the action task can be downloaded from here.

Reference

@inproceedings{peng2022MHN,
     title={Multilevel Hierarchical Network with Multiscale Sampling for Video Question Answering},
     author={Peng, Min and Wang, Chongyang and Gao, Yuan and Shi, Yu and Zhou, Xiang-Dong},
     booktitle={Proceedings of the 31st International Joint Conference on Artificial Intelligence (IJCAI)},
     year={2022}}