Repo for the ActivityNet Challenge 2019, Task 3 (Dense-Captioning Events in Videos). This repository provides a dense video captioning module for the ActivityNet Captions dataset.
TO-DO:
- complete script for downloading ActivityNet videos
- complete script for converting .mp4 videos to .jpg frames
- write dataset class for ActivityNet Captions dataset (a rough sketch follows this list)
- write baseline model for training
- add optional training
- add evaluation
- add spatiotemporal attention
- add proposal generation code
- add testing code
- add Transformer training
- add BERT training
- add character level training
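
One of the TO-DO items is a dataset class for ActivityNet Captions. The sketch below is only a hypothetical illustration of what such a class could look like; the class name `ActivityNetCaptionsSketch`, the assumption that frames live under `frame_dir/<video_id>/*.jpg`, and the use of the official captions json layout are assumptions, not this repository's actual interface.

```python
# Hypothetical sketch of a dataset class for ActivityNet Captions.
# Assumes the captions json follows the official layout
# ({"v_<id>": {"duration": ..., "timestamps": [[s, e], ...], "sentences": [...]}})
# and that frames were extracted to <frame_dir>/<video_id>/*.jpg.
import json
import os

import torch
from PIL import Image
from torch.utils.data import Dataset


class ActivityNetCaptionsSketch(Dataset):
    def __init__(self, caption_json, frame_dir, transform=None):
        with open(caption_json) as f:
            annotations = json.load(f)
        # One sample per (video, caption segment) pair.
        self.samples = [
            (vid, ts, sent)
            for vid, ann in annotations.items()
            for ts, sent in zip(ann["timestamps"], ann["sentences"])
        ]
        self.frame_dir = frame_dir
        self.transform = transform

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        vid, (start, end), sentence = self.samples[idx]
        video_dir = os.path.join(self.frame_dir, vid)
        frame_paths = sorted(
            os.path.join(video_dir, name) for name in os.listdir(video_dir)
        )
        # A real implementation would subsample frames and crop to [start, end]
        # using the video fps; here all frames are loaded for simplicity.
        frames = [Image.open(p).convert("RGB") for p in frame_paths]
        if self.transform is not None:
            frames = torch.stack([self.transform(f) for f in frames])
        return frames, (start, end), sentence
```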
Requirements:
- Python>=3.6
- numpy
- matplotlib
- Pillow
- accimage (optional, faster than Pillow)
- pytorch>=1.0
- torchvision>=0.2
- pytube
- torchtext (for spacy tokenizer and vocabulary)
- nlg-eval (for evaluation metrics)
- mkl-service (for theano, evaluation)
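
After installing the dependencies, a minimal sanity-check sketch such as the following can confirm the pinned versions (it only checks the packages with explicit version requirements above):

```python
# Quick sanity check for the pinned requirements above.
import sys

import numpy
import torch
import torchvision

assert sys.version_info >= (3, 6), "Python>=3.6 is required"
print("numpy      :", numpy.__version__)
print("pytorch    :", torch.__version__)        # should be >= 1.0
print("torchvision:", torchvision.__version__)  # should be >= 0.2
print("CUDA available:", torch.cuda.is_available())
```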
Preparation:
- Download the json file for the ActivityNet dataset from here.
- Modify `download.sh` and set the command-line argument for the root directory where the dataset will be saved. This path will be denoted `$root_path`.
- Make sure you have at least 300 GB of free storage.
- Run `bash download.sh` to download the .mp4 files (a pytube sketch of the per-video download step follows this list).
- Download the json files for the ActivityNet Captions dataset from here.
- Extract the downloaded files to `$root_path`.
- Run `python utils/add_fps_into_activitynet_json.py -v ${video_dir} -s ${root_path}/train.json -o ${save_path}`
- Run `python utils/add_fps_into_activitynet_json.py -v ${video_dir} -s ${root_path}/val_1.json -o ${save_path}`
- Run `python utils/add_fps_into_activitynet_json.py -v ${video_dir} -s ${root_path}/val_2.json -o ${save_path}`
- Make sure you have at least 1 TB of storage and enough inodes left.
- Run `python utils/mp42jpg.py ${video_dir} ${root_path}/frames activitynet --n_jobs=${number_of_workers}` to convert the .mp4 videos into .jpg frames (a sketch of how the fps-augmented json maps caption timestamps to frame indices follows this list).
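
For the `bash download.sh` step, the following is only a rough sketch of what downloading a single video with pytube can look like. The helper name, the output naming, and the assumption that an ActivityNet id `v_<id>` maps to the YouTube id `<id>` are assumptions, and pytube's API differs slightly across versions, so treat this as illustration rather than the repository's actual download logic.

```python
# Rough sketch: download one ActivityNet video with pytube.
# Assumption: ActivityNet ids of the form "v_<id>" correspond to
# YouTube ids "<id>"; adjust if your id list is formatted differently.
import os

from pytube import YouTube


def download_video(activitynet_id, out_dir):
    youtube_id = activitynet_id[2:] if activitynet_id.startswith("v_") else activitynet_id
    url = "https://www.youtube.com/watch?v=" + youtube_id
    os.makedirs(out_dir, exist_ok=True)
    # Pick the highest-resolution progressive mp4 stream.
    stream = (
        YouTube(url)
        .streams.filter(progressive=True, file_extension="mp4")
        .order_by("resolution")
        .desc()
        .first()
    )
    if stream is None:
        raise RuntimeError("no mp4 stream found for " + url)
    return stream.download(output_path=out_dir)


# Example (placeholder id and path):
# download_video("v_XXXXXXXXXXX", "/path/to/root_path/videos")
```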
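
The `add_fps_into_activitynet_json.py` step stores each video's frame rate in the caption json, which is what allows caption timestamps to be mapped onto the extracted .jpg frames. A minimal sketch of that mapping, assuming the frame rate is stored under an `fps` key (the key name is an assumption; adjust it to match the script's output):

```python
# Sketch: map caption timestamps to frame indices using the fps-augmented json.
# Assumption: the frame rate is stored under the key "fps" for each video.
import json


def caption_frame_ranges(fps_json_path):
    with open(fps_json_path) as f:
        annotations = json.load(f)
    ranges = {}
    for video_id, ann in annotations.items():
        fps = ann["fps"]
        # One (start_frame, end_frame, sentence) triple per caption segment.
        ranges[video_id] = [
            (int(start * fps), int(end * fps), sentence)
            for (start, end), sentence in zip(ann["timestamps"], ann["sentences"])
        ]
    return ranges


# Example (placeholder path): ranges = caption_frame_ranges("/path/to/save_path/train.json")
```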
Training:
- Run `train.py` with your configurations (the script is in `train/trainscript.sh`).
Evaluation:
- Proposal generation is not implemented yet, so prepare a json file with proposals yourself (one possible format is sketched after this list).
- Run `test.py` with your configurations (the script is in `eval/eval.sh`).
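
Because proposal generation is missing, a common stopgap is to reuse the ground-truth segments of the validation annotations as proposals. The schema expected by `test.py` is not documented here, so the `{video_id: {"timestamps": [...]}}` layout below is only an assumption to adapt:

```python
# Sketch: build a proposals json by reusing ground-truth validation segments.
# The output schema {video_id: {"timestamps": [[start, end], ...]}} is an
# assumption; adapt it to whatever format test.py actually expects.
import json


def make_proposals_from_ground_truth(val_json_path, out_path):
    with open(val_json_path) as f:
        annotations = json.load(f)
    proposals = {
        video_id: {"timestamps": ann["timestamps"]}
        for video_id, ann in annotations.items()
    }
    with open(out_path, "w") as f:
        json.dump(proposals, f)


# Example (placeholder paths):
# make_proposals_from_ground_truth("/path/to/root_path/val_1.json",
#                                  "/path/to/root_path/proposals.json")
```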