# Leveraging the Video-level Semantic Consistency of Event for Audio-visual Event Localization

PyTorch implementation of our TMM 2023 paper:
*Leveraging the Video-level Semantic Consistency of Event for Audio-visual Event Localization*

## Data Preparation

We are grateful to @YapengTian/AVE-ECCV2018 and @Jinxing Zhou/PSP-CVPR2021 for their great work and for sharing their code and data.

The AVE dataset and the extracted audio and visual features can be downloaded from here.

Other preprocessed files used in this repository can be downloaded from here.

All the required files are listed below; place them in the `data` folder.


```
audio_feature.h5  visual_feature.h5  audio_feature_noisy.h5  visual_feature_noisy.h5
right_label.h5  prob_label.h5  labels_noisy.h5  mil_labels.h5
train_order.h5  val_order.h5  test_order.h5
```
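To sanity-check the downloaded files, you can inspect them with `h5py`. The sketch below is a minimal example; the dataset key names (`'avadataset'`, `'order'`) and the feature shapes follow the conventions of the AVE-ECCV2018 codebase and are assumptions here, so list `f.keys()` first to confirm.

```python
import h5py

# Inspect an HDF5 feature file and print its datasets and shapes.
# The key 'avadataset' follows the AVE-ECCV2018 convention and is an
# assumption -- check list(f.keys()) if it differs.
with h5py.File('data/visual_feature.h5', 'r') as f:
    print(list(f.keys()))
    feats = f['avadataset'][...]   # e.g. (num_videos, 10, 7, 7, 512) VGG-19 features
    print(feats.shape, feats.dtype)

# The order files hold the train/val/test split indices.
with h5py.File('data/train_order.h5', 'r') as f:
    order = f['order'][...]        # key name 'order' is likewise an assumption
    print(order.shape)
```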

## Fully supervised setting

- Train:

```bash
CUDA_VISIBLE_DEVICES=0 python fully_supervised_main.py --train
```

- Test:

```bash
CUDA_VISIBLE_DEVICES=0 python fully_supervised_main.py --trained_model_path ./model/VSCG_fully.pt
```

## Weakly supervised setting

- Train:

```bash
CUDA_VISIBLE_DEVICES=0 python weakly_supervised_main.py --train
```

- Test:

```bash
CUDA_VISIBLE_DEVICES=0 python weakly_supervised_main.py --trained_model_path ./model/VSCG_weakly.pt
```

Note: the pre-trained models can be downloaded here and should be placed in the `model` folder.
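If you want to inspect a downloaded checkpoint before running the test scripts, a minimal sketch follows. Whether `VSCG_fully.pt` stores a full pickled module or a plain state dict is an assumption, so both cases are handled; run it from the repository root so any model classes are importable.

```python
import torch

# Load the checkpoint on CPU just to inspect it. Whether it holds a
# full nn.Module or a state_dict is an assumption -- handle both.
ckpt = torch.load('model/VSCG_fully.pt', map_location='cpu')

if isinstance(ckpt, dict):
    # A state_dict, or a dict wrapping one: print a few parameter names.
    state = ckpt.get('state_dict', ckpt)
    for name, tensor in list(state.items())[:5]:
        print(name, tuple(tensor.shape))
else:
    # A pickled nn.Module: print its architecture.
    print(ckpt)
```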

## Citation

If our paper is useful for your research, please consider citing it:

```bibtex
@ARTICLE{vscg2023,
  author={Jiang, Yuanyuan and Yin, Jianqin and Dang, Yonghao},
  journal={IEEE Transactions on Multimedia},
  title={Leveraging the Video-level Semantic Consistency of Event for Audio-visual Event Localization},
  year={2023},
  volume={},
  number={},
  pages={1-11},
  doi={10.1109/TMM.2023.3324498}}
```