PyTorch implementation of our TMM 2023 paper:
Leveraging the Video-level Semantic Consistency of Event for Audio-visual Event Localization
We greatly appreciate @YapengTian/AVE-ECCV2018 and @Jinxing Zhou/PSP-CVPR2021 for their great work and for sharing their code.
The AVE dataset and the extracted audio and visual features can be downloaded from here.
Other preprocessed files used in this repository can be downloaded from here.
All the required data are listed below; these files should be placed in the `data` folder.

- Features: audio_feature.h5, visual_feature.h5, audio_feature_noisy.h5, visual_feature_noisy.h5
- Labels: right_label.h5, prob_label.h5, labels_noisy.h5, mil_labels.h5
- Data splits: train_order.h5, val_order.h5, test_order.h5
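
To sanity-check the downloads, the HDF5 files can be inspected with h5py. A minimal sketch, assuming the files sit under `data/`; the dataset keys and shapes are not documented here, so print the keys to confirm them for your copy:

```python
import h5py

# Minimal sketch: inspect one of the downloaded feature files.
# The dataset key inside each file is an assumption -- list the
# keys to confirm what your copy actually contains.
with h5py.File('data/visual_feature.h5', 'r') as f:
    print(list(f.keys()))        # discover the dataset name(s)
    key = list(f.keys())[0]
    feats = f[key][:]            # load the features into memory
    print(feats.shape, feats.dtype)
```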
Fully supervised setting:
- Train:
  CUDA_VISIBLE_DEVICES=0 python fully_supervised_main.py --train
- Test:
  CUDA_VISIBLE_DEVICES=0 python fully_supervised_main.py --trained_model_path ./model/VSCG_fully.pt
Weakly supervised setting:
- Train:
  CUDA_VISIBLE_DEVICES=0 python weakly_supervised_main.py --train
- Test:
  CUDA_VISIBLE_DEVICES=0 python weakly_supervised_main.py --trained_model_path ./model/VSCG_weakly.pt
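
Both test commands above report segment-level localization accuracy on the 10-second AVE clips (one prediction per 1-second segment). The test scripts compute this internally; as a rough reference, here is a minimal sketch of the metric with hypothetical tensor shapes:

```python
import torch

def segment_accuracy(logits: torch.Tensor, labels: torch.Tensor) -> float:
    """Segment-level localization accuracy.

    Hypothetical shapes: logits is (batch, 10, num_classes) -- one score
    vector per 1-second segment -- and labels is (batch, 10) holding the
    ground-truth category index (background included) of each segment.
    """
    preds = logits.argmax(dim=-1)                  # (batch, 10) predicted categories
    return (preds == labels).float().mean().item()

# Toy usage with random scores over 29 categories (28 AVE events + background).
logits = torch.randn(4, 10, 29)
labels = torch.randint(0, 29, (4, 10))
print(segment_accuracy(logits, labels))
```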
Note: The pre-trained models can be downloaded here; they should be placed in the `model` folder.
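
Before running the test commands, it may help to verify that a downloaded checkpoint deserializes cleanly. A minimal sketch, assuming the `.pt` file holds either a bare state_dict, a wrapped dict, or a full module (which of these applies is an assumption; adjust for your download):

```python
import torch

# Minimal sketch: confirm the checkpoint loads and peek at a few
# parameter names/shapes. The checkpoint layout is an assumption.
ckpt = torch.load('./model/VSCG_fully.pt', map_location='cpu')
state_dict = ckpt.get('state_dict', ckpt) if isinstance(ckpt, dict) else ckpt.state_dict()
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))
```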
If you find our paper useful for your research, please consider citing it:
@ARTICLE{vscg2023,
  author={Jiang, Yuanyuan and Yin, Jianqin and Dang, Yonghao},
  journal={IEEE Transactions on Multimedia},
  title={Leveraging the Video-level Semantic Consistency of Event for Audio-visual Event Localization},
  year={2023},
  volume={},
  number={},
  pages={1-11},
  doi={10.1109/TMM.2023.3324498}
}