PyTorch code for our BigMM 2022 paper "Exploring adaptive attention in memory transformer applied to coherent video paragraph captioning" by Leonardo Vilela Cardoso, Silvio Jamil F. Guimarães, and Zenilton K. G. Patrocínio Jr.
A coherent multi-sentence description is the ultimate goal of video paragraph captioning, since coherence directly affects the consistency and intelligibility of the output. In this context, the paragraph describing a video is shaped by the events used either to build its detailed narrative or to provide clues that help reduce textual repetition. This work proposes a model named Adaptive Transformer, which uses attention mechanisms to enhance a memory-augmented transformer. The approach increases coherence among the generated sentences by assessing the importance of the information about the video segments contained in the self-attention and using it to improve readability. The results show the potential of this approach: it increases coherence across video segments, reduces repetition in the generated sentences, and improves description diversity on the ActivityNet Captions dataset.
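To make the idea concrete, the sketch below shows one way an adaptive gate over memory-augmented self-attention could be written in PyTorch. It is only an illustrative sketch under simplifying assumptions, not the implementation in this repository: the module name `AdaptiveMemoryAttention`, the mean-pooled memory summary, and the sigmoid gate are all hypothetical choices.

```python
import torch
import torch.nn as nn


class AdaptiveMemoryAttention(nn.Module):
    """Illustrative sketch: blend self-attention output with a recurrent
    memory summary through a learned gate. Names and details here are
    hypothetical, not the modules actually used in this repository."""

    def __init__(self, d_model, n_heads):
        super(AdaptiveMemoryAttention, self).__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads)
        # The gate looks at both sources and outputs a per-dimension weight.
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, x, memory):
        # x:      (seq_len, batch, d_model)  hidden states of the current segment
        # memory: (mem_len, batch, d_model)  state carried over from earlier segments
        attn_out, _ = self.self_attn(x, x, x)            # standard self-attention
        mem_summary = memory.mean(dim=0, keepdim=True)   # crude summary of the memory
        mem_summary = mem_summary.expand(x.size(0), -1, -1)
        beta = torch.sigmoid(self.gate(torch.cat([attn_out, mem_summary], dim=-1)))
        # beta near 1 keeps the current segment's attention; near 0 leans on the memory.
        return beta * attn_out + (1.0 - beta) * mem_summary
```

The gate lets each position decide how much to rely on the current segment's self-attention versus the state accumulated from earlier segments, which is the kind of trade-off the paper exploits to keep consecutive sentences coherent while avoiding repetition.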
- Clone this repository
```bash
# No need to add --recursive, as all dependencies are copied into this repo.
git clone https://github.com/IMScience-PPGINF-PucMinas/Adaptive-Transformer.git
cd Adaptive-Transformer
```
- Prepare feature files
Download the features from Google Drive: `rt_anet_feat.tar.gz` (39 GB) and `rt_yc2_feat.tar.gz` (12 GB). These features are repacked from the features provided by densecap. An optional sanity check for the unpacked files follows the commands below.

```bash
mkdir video_feature && cd video_feature
tar -xf path/to/rt_anet_feat.tar.gz
tar -xf path/to/rt_yc2_feat.tar.gz
```
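After unpacking, you can optionally sanity-check the feature directory. The snippet below simply walks `video_feature/` and prints the shapes of any `.npy` files it finds; the exact file layout inside the archives is an assumption here, so adapt the paths if they differ.

```python
from __future__ import print_function

import os

import numpy as np

root = "video_feature"  # directory created in the step above
paths = []
for dirpath, _, filenames in os.walk(root):
    paths.extend(os.path.join(dirpath, name) for name in filenames)

# Peek at the first few files only.
for path in sorted(paths)[:10]:
    if path.endswith(".npy"):
        arr = np.load(path)
        print(path, arr.shape, arr.dtype)
    else:
        print(path, os.path.getsize(path), "bytes")
```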
- Install dependencies (a quick import check to verify the environment follows this list)
  - Python 2.7
  - PyTorch 1.1.0
  - nltk
  - easydict
  - tqdm
  - tensorboardX
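Once the dependencies are installed, an optional way to verify the environment is to import them and print a few versions. This check is not part of the repository's scripts; it only confirms that the packages listed above are importable.

```python
from __future__ import print_function  # keeps the snippet compatible with Python 2.7

import torch
import nltk
import easydict  # imported only to confirm it is installed
import tqdm
import tensorboardX

print("PyTorch:", torch.__version__)
print("nltk:", nltk.__version__)
print("tqdm:", tqdm.__version__)
print("tensorboardX:", tensorboardX.__version__)
```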
- Add the project root to `PYTHONPATH`

```bash
source setup.sh
```

Note that you need to do this each time you start a new session.
Below we give examples of how to perform training and inference with Adaptive-Transformer.
- Build Vocabulary

```bash
bash scripts/build_vocab.sh DATASET_NAME
```

`DATASET_NAME` can be `anet` for ActivityNet Captions or `yc2` for YouCookII.
- Adaptive-Transformer training

The general training command is:

```bash
bash scripts/train.sh DATASET_NAME
```

To train our Adaptive-Transformer model on ActivityNet Captions:

```bash
bash scripts/train.sh anet
```

The training log and model will be saved at `results/anet_re_*`.
Once you have a trained model, you can follow the instructions below to generate captions.
- Generate captions

```bash
bash scripts/translate_greedy.sh anet_re_* val
```

Replace `anet_re_*` with your own model directory name. The generated captions are saved at `results/anet_re_*/greedy_pred_val.json`; a short snippet for inspecting this file is shown below.
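To take a quick look at the generated paragraphs, you can load the prediction file in Python. The snippet below makes no assumption about the exact JSON schema; it only prints the top-level structure and one sample entry, and it uses `glob` to resolve the run-specific directory name.

```python
from __future__ import print_function

import glob
import json

# The results directory name contains a run-specific suffix, so resolve it with glob.
matches = glob.glob("results/anet_re_*/greedy_pred_val.json")
assert matches, "No prediction file found; check the results path."

with open(matches[0]) as f:
    preds = json.load(f)

print(type(preds))
if isinstance(preds, dict):
    keys = list(preds.keys())
    print("top-level keys (first 5):", keys[:5])
    print("sample entry:", preds[keys[0]])
elif isinstance(preds, list):
    print("number of entries:", len(preds))
    print("sample entry:", preds[0])
```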
- Evaluate generated captions

```bash
bash scripts/eval.sh anet val results/anet_re_*/greedy_pred_val.json
```

The results should be comparable to those we report in Table 2 of the paper, e.g., B@4 10.00; C 23.04; R@4 5.29.
If you find this code useful for your research, please consider citing one of our papers:
```bibtex
@inproceedings{cardoso2022exploring,
  title={Exploring adaptive attention in memory transformer applied to coherent video paragraph captioning},
  author={Cardoso, Leonardo Vilela and Guimaraes, Silvio Jamil F and Patrocinio, Zenilton KG},
  booktitle={2022 IEEE Eighth International Conference on Multimedia Big Data (BigMM)},
  pages={37--44},
  year={2022},
  organization={IEEE}
}

@inproceedings{cardoso2021enhanced,
  title={Enhanced-Memory Transformer for Coherent Paragraph Video Captioning},
  author={Cardoso, Leonardo Vilela and Guimaraes, Silvio Jamil F and Patroc{\'\i}nio, Zenilton KG},
  booktitle={2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI)},
  pages={836--840},
  year={2021},
  organization={IEEE}
}
```
This code builds on resources from the following projects: emt, mart, transformers, transformer-xl, densecap, and OpenNMT-py.
For questions or feedback, please contact Leonardo Vilela Cardoso at [email protected].