Authors: Junyan Wu, Wei Lu (Corresponding author), Xiangyang Luo, Rui Yang, Qian Wang, Xiaochun Cao.
The Coarse-to-Fine Proposal Refinement Framework (CFPRF) predicts audio temporal forgery proposals. In the first stage, a frame-level detection network (FDN) learns robust representations that indicate rough forgery regions; in the second stage, a proposal refinement network (PRN) produces fine-grained proposals. PaperLink.
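To make the two-stage idea concrete, here is a minimal sketch of the coarse stage: frame-level forgery scores are grouped into rough (start, end) proposals by thresholding. The function name and the 0.5 threshold are illustrative assumptions, not the repository's actual code:

```python
def frames_to_proposals(frame_scores, threshold=0.5):
    """Group consecutive frames whose forgery score exceeds `threshold`
    into coarse (start, end) proposals (inclusive frame indices)."""
    proposals, start = [], None
    for i, score in enumerate(frame_scores):
        if score >= threshold and start is None:
            start = i                          # a forged region begins
        elif score < threshold and start is not None:
            proposals.append((start, i - 1))   # the region ends
            start = None
    if start is not None:                      # region runs to the last frame
        proposals.append((start, len(frame_scores) - 1))
    return proposals

# Example: two rough forgery regions found in five frames.
print(frames_to_proposals([0.1, 0.9, 0.8, 0.2, 0.95]))  # [(1, 2), (4, 4)]
```

The second stage (PRN) would then refine the boundaries of such coarse proposals into fine-grained ones.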
We recommend Python 3.8 or higher. We followed the installation setup of the SSL_Anti-spoofing project, as follows:
conda create -n SSL python=3.8 numpy=1.23.5
conda activate SSL
pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
# -------------- install fairseq for XLSR --------------
git clone https://github.com/TakHemlata/SSL_Anti-spoofing.git
cd SSL_Anti-spoofing/fairseq-a54021305d6b3c4c5959ac9395135f63202db8f1
pip install --editable ./
We provide checkpoints and the corresponding output results, which can be downloaded from GoogleDrive. Put them in this folder:
./checkpoints
├── 1FDN_HAD.pth
├── 1FDN_LAVDF.pth
├── 1FDN_PS.pth
├── 2PRN_HAD.pth
├── 2PRN_LAVDF.pth
└── 2PRN_PS.pth
Evaluate the checkpoints on each dataset to reproduce the results:
- python evaluate_CFPRF.py --eval --dn PS --save_path ./results
- python evaluate_CFPRF.py --eval --dn HAD --save_path ./results
- python evaluate_CFPRF.py --eval --dn LAVDF --save_path ./results
If you want to produce results from a saved '.npy' file, remove --eval from the above commands.
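For reference, the EER values reported in the results table can in principle be recomputed from saved scores. Below is a minimal pure-Python sketch; the two score lists are hypothetical inputs, and the repository's actual '.npy' layout may differ:

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Equal error rate: sweep thresholds over the observed scores and
    return the operating point where false acceptance ~= false rejection."""
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(bonafide_scores) | set(spoof_scores)):
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

# One spoof score (0.6) overlaps the bona fide range, giving EER = 1/3.
print(compute_eer([0.9, 0.4, 0.7], [0.1, 0.6, 0.2]))
```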
Frame-level detection results (all values in %):

Dataset | EER | AUC | PRE | REC | F1 |
---|---|---|---|---|---|
HAD | 0.08 | 99.96 | 99.98 | 99.92 | 99.95 |
PS | 7.41 | 96.97 | 95.23 | 92.59 | 93.89 |
LAV-DF | 0.82 | 99.89 | 99.95 | 99.18 | 99.56 |
Temporal forgery localization results (all values in %):

Dataset | [email protected] | [email protected] | [email protected] | mAP | AR@20 |
---|---|---|---|---|---|
HAD | 99.77 | 99.60 | 96.03 | 99.23 | 99.38 |
PS | 66.34 | 55.47 | 40.96 | 55.22 | 66.53 |
LAV-DF | 94.52 | 93.47 | 88.64 | 93.01 | 93.51 |
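The AP@IoU metrics above count a proposal as a true positive when its temporal intersection-over-union with a ground-truth segment meets the threshold (0.5, 0.75, or 0.95). A small sketch of temporal IoU between (start, end) segments, with hypothetical example values:

```python
def temporal_iou(seg_a, seg_b):
    """IoU of two temporal segments given as (start, end) in seconds."""
    inter = max(0.0, min(seg_a[1], seg_b[1]) - max(seg_a[0], seg_b[0]))
    union = (seg_a[1] - seg_a[0]) + (seg_b[1] - seg_b[0]) - inter
    return inter / union if union > 0 else 0.0

# Proposal (1.0, 3.0) vs. ground truth (1.5, 3.5): IoU = 0.6, so it
# counts as a hit at [email protected] but misses at [email protected] and [email protected].
print(temporal_iou((1.0, 3.0), (1.5, 3.5)))  # 0.6
```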
The first stage is to train the Frame-level Detection Network (FDN):
python train_stage1.py --dn PS --v1 0.25 --v2 0.1 --num_epoch 18 --save
python train_stage1.py --dn LAVDF --v1 0.3 --v2 0.15 --num_epoch 18 --save
python train_stage1.py --dn HAD --v1 0.15 --v2 0.1 --num_epoch 10 --save
The second stage is to train the Proposal Refinement Network (PRN):
python train_stage2.py --dn PS --num_epoch 50 --save
This repository builds upon several open-source projects: PartialSpoof [1], TDL-ADD [2], LAV-DF [3], SSLAS [4].
@article{10003971,
title={The PartialSpoof Database and Countermeasures for the Detection of Short Fake Speech Segments Embedded in an Utterance},
author={Zhang, Lin and Wang, Xin and Cooper, Erica and Evans, Nicholas and Yamagishi, Junichi},
journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
year={2023},
volume={31},
pages={813-825},
doi={10.1109/TASLP.2022.3233236}}
@inproceedings{xie2024efficient,
title={An Efficient Temporary Deepfake Location Approach Based Embeddings for Partially Spoofed Audio Detection},
author={Xie, Yuankun and Cheng, Haonan and Wang, Yutian and Ye, Long},
booktitle={ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={966--970},
year={2024},
organization={IEEE}
}
@inproceedings{cai2022you,
title = {Do You Really Mean That? Content Driven Audio-Visual Deepfake Dataset and Multimodal Method for Temporal Forgery Localization},
author = {Cai, Zhixi and Stefanov, Kalin and Dhall, Abhinav and Hayat, Munawar},
booktitle = {2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA)},
year = {2022},
doi = {10.1109/DICTA56598.2022.10034605},
pages = {1--10},
address = {Sydney, Australia},
}
@article{cai2023glitch,
title = {Glitch in the Matrix: A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization},
author = {Cai, Zhixi and Ghosh, Shreya and Dhall, Abhinav and Gedeon, Tom and Stefanov, Kalin and Hayat, Munawar},
journal = {Computer Vision and Image Understanding},
year = {2023},
volume = {236},
pages = {103818},
issn = {1077-3142},
doi = {10.1016/j.cviu.2023.103818},
}
@inproceedings{tak2022automatic,
title={Automatic speaker verification spoofing and deepfake detection using wav2vec 2.0 and data augmentation},
author={Tak, Hemlata and Todisco, Massimiliano and Wang, Xin and Jung, Jee-weon and Yamagishi, Junichi and Evans, Nicholas},
booktitle={The Speaker and Language Recognition Workshop},
year={2022}
}
Kindly cite our work if you find it useful.
@inproceedings{10.1145/3664647.3680585,
author = {Wu, Junyan and Lu, Wei and Luo, Xiangyang and Yang, Rui and Wang, Qian and Cao, Xiaochun},
title = {Coarse-to-Fine Proposal Refinement Framework for Audio Temporal Forgery Detection and Localization},
booktitle = {Proceedings of the 32nd ACM International Conference on Multimedia},
pages = {7395--7403},
numpages = {9},
year = {2024},
doi = {10.1145/3664647.3680585},
}