Code for the paper "Knowledge Graph Extraction from Videos".
Steps to reproduce:
- Download the logical caption datasets from here. Alternatively, create them from scratch as follows:
  - Download the word2vec vectors and place them at ../data/w2v_vecs.bin.
  - Run, in order, semantic_parser.py, w2v_wn_links.py and make_new_dset.py. Respectively, these convert the natural-language captions to logical captions, link the components of the logical captions to WordNet, and form a new dataset from the linked logical captions (i.e., format the dataset properly and exclude predicates and individuals appearing fewer than 50 times).
- Prepare the video tensors:
  - Download the video files for MSVD and MSRVTT.
  - Preprocess each dataset using preprocess_.py to obtain video tensors of the right shape, and match each video with its set of captions using a numerical video ID.
  - Run the VGG and I3D networks using make_vgg_vecs.py and make_i3d_vecs.py to get feature vectors for the videos.
- Train and validate the model using main.py.
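
To illustrate the frequency cut-off mentioned for make_new_dset.py, here is a minimal sketch of dropping predicates and individuals that appear fewer than 50 times. It is not the repository's code: the function name `filter_rare_symbols`, the caption layout (a dict from video ID to a list of `(predicate, individual, ...)` tuples), and the choice to drop single atoms rather than whole captions are all assumptions made for illustration.

```python
from collections import Counter

MIN_COUNT = 50  # threshold stated in the steps above


def filter_rare_symbols(captions, min_count=MIN_COUNT):
    """Drop atoms that use a rare predicate or a rare individual.

    `captions` maps a video ID to a list of atoms, where each atom is a
    tuple (predicate, individual_1, individual_2, ...). This layout is
    hypothetical and may differ from the real dataset format.
    """
    pred_counts = Counter()
    indiv_counts = Counter()
    # First pass: count every predicate and individual over the whole dataset.
    for atoms in captions.values():
        for pred, *args in atoms:
            pred_counts[pred] += 1
            indiv_counts.update(args)

    # Second pass: keep only atoms whose symbols all meet the threshold.
    filtered = {}
    for vid, atoms in captions.items():
        kept = [
            (pred, *args)
            for pred, *args in atoms
            if pred_counts[pred] >= min_count
            and all(indiv_counts[a] >= min_count for a in args)
        ]
        if kept:  # videos left with no atoms are dropped entirely
            filtered[vid] = kept
    return filtered


# Toy usage with a lower threshold so the effect is visible.
toy = {
    0: [("run", "dog"), ("chase", "dog", "cat")],
    1: [("run", "dog"), ("sleep", "cat")],
}
result = filter_rare_symbols(toy, min_count=2)
```

Counts are taken once, before filtering, so this is a single-pass cut-off. Whether the actual script recounts after removal is not stated in the steps above.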