AdvSV stands as the first dataset developed specifically for evaluating Speaker Verification (SV) systems against adversarial attacks. It aims to benchmark the robustness of ASV models in the face of such attacks and offers vital resources for researchers to explore the characteristics of adversarial and replay attacks in this domain.

AdvSV/AdvSV.github.io

AdvSV: An Over-the-Air Adversarial Attack Dataset for Speaker Verification

overtheair

The AdvSV dataset is generated in two steps: an adversarial attack followed by an over-the-air attack.

  • The adversarial attack is performed at the digital level: given a victim automatic speaker verification (ASV) model and an attacker (an adversarial attack algorithm), it generates the adversarial samples.
  • The over-the-air attack replays the adversarial samples and re-records them (replay-recording) to obtain the replay samples.
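
The digital step can be pictured as a repeated projected-gradient update. The sketch below is only an illustration under stated assumptions: it assumes an L-infinity-bounded PGD (the norm is not stated in this README), waveform samples in [-1, 1], and a gradient supplied by the caller. `pgd_step` is a hypothetical helper, not code from the AdvSV pipeline. The values eps = 0.008, alpha = 0.0004 and steps = 20 are the ones encoded in the dataset's folder names.

```python
from typing import List, Sequence

EPS = 0.008     # perturbation bound, taken from the folder names
ALPHA = 0.0004  # step size per iteration, taken from the folder names
STEPS = 20      # number of iterations, taken from the folder names


def sign(value: float) -> int:
    """Return -1, 0 or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)


def pgd_step(x_adv: Sequence[float], x_orig: Sequence[float],
             grad: Sequence[float], eps: float = EPS,
             alpha: float = ALPHA) -> List[float]:
    """One signed-gradient step, projected back into the eps-ball around x_orig.

    Assumption: the ball is an L-infinity ball and the waveform lives in [-1, 1].
    """
    updated = []
    for xa, xo, g in zip(x_adv, x_orig, grad):
        candidate = xa + alpha * sign(g)
        candidate = min(max(candidate, xo - eps), xo + eps)  # stay within eps of the original
        candidate = min(max(candidate, -1.0), 1.0)           # stay a valid waveform sample
        updated.append(candidate)
    return updated


# Toy run: a real attack would obtain the gradient by backpropagating
# through the victim ASV model; here it is a fixed placeholder.
x_orig = [0.0, 0.1, -0.2]
x_adv = list(x_orig)
for _ in range(STEPS):
    toy_grad = [1.0, -1.0, 0.0]
    x_adv = pgd_step(x_adv, x_orig, toy_grad)

max_perturbation = max(abs(a - o) for a, o in zip(x_adv, x_orig))
```

With these settings, 20 steps of size 0.0004 can move a sample by at most 0.008, so the step budget and the perturbation bound coincide.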

You can listen to some demos on the demo page and check out the paper.

Abstract

It is known that deep neural networks are vulnerable to adversarial attacks. Although Automatic Speaker Verification (ASV) built on top of deep neural networks exhibits robust performance in controlled scenarios, many studies confirm that ASV is vulnerable to adversarial attacks. The lack of a standard dataset is a bottleneck for further research, especially reproducible research. In this study, we developed an open-source adversarial attack dataset for speaker verification research. As an initial step, we focused on the over-the-air attack. An over-the-air adversarial attack involves a perturbation generation algorithm, a loudspeaker, a microphone, and an acoustic environment. The variations in the recording configurations make it very challenging to reproduce previous research. The AdvSV dataset is constructed using the Voxceleb1 Verification test set as its foundation. This dataset employs representative ASV models subjected to adversarial attacks and records adversarial samples to simulate over-the-air attack settings. The scope of the dataset can be easily extended to include more types of adversarial attacks. The dataset will be released to the public under the CC BY-SA 4.0 license. In addition, we also provide a detection baseline for reproducible research.

Key Info

| Utterances | Hours | Adversarial Victim Models | Adversarial Attack Methods | Replay Devices | Recording Devices |
|---|---|---|---|---|---|
| 387,160 | 894 | 4 | 2 | 3 | 3 |

Adversarial Victim Models

| Victim Model | Implementation Detail | Reference |
|---|---|---|
| ECAPA | ECAPATDNN | paper |
| RawNet | RawNet3 | paper |
| ResNet | ResNetSE34V2 | paper |
| XVec | XVector | paper |

Download

Please fill in the form. We'll promptly review and respond. Thank you for your support.

Tag file

Adversarial samples and over-the-air samples are indexed in AdvSV_tag.txt. Each record has five attributes:

File Path, Attack Method, Victim ASV Model, Replay Device, Recording Device

Examples are shown in the table below.

| File Path | Attack Method | Victim ASV Model | Replay Device | Recording Device |
|---|---|---|---|---|
| Adv/Ensemble_PGD/ResNet-ECAPA-RawNet_eps-0.008_alpha-0.0004_steps-20/id10270-5r0dWxy17C8-00001_id10270-8jEAjG6SegY-00012.wav | Ensemble_PGD | ResNet-ECAPA-RawNet | NA | NA |
| Adv/PGD/ECAPA_eps-0.008_alpha-0.0004_steps-20/id10309-e-IdJ8a4gy4-00005_id10292-aVmHBUeThTQ-00001.wav | PGD | ECAPA | NA | NA |
| OverTheAir/Low/AndroidHigh/Ensemble_PGD/XVec-ResNet-ECAPA_eps-0.008_alpha-0.0004_steps-20/id10292-gm6PJowclv0-00009_id10273-8cfyJEV7hP8-00019.wav | Ensemble_PGD | XVec-ResNet-ECAPA | Low | AndroidHigh |
| OverTheAir/Low/AndroidHigh/PGD/XVec_eps-0.008_alpha-0.0004_steps-20/id10307-120gjdqGWNQ-00004_id10292-3kzw8lTcUBU-00015.wav | PGD | XVec | Low | AndroidHigh |

Generation of file names

Each file name joins an enrollment sample and an evaluation sample with an underscore. For example, in id10307-120gjdqGWNQ-00004_id10292-3kzw8lTcUBU-00015.wav, id10307/120gjdqGWNQ/00004.wav (A) is the enrollment sample and id10292/3kzw8lTcUBU/00015.wav (B) is the evaluation sample; the two come from different speakers (id10307 vs. id10292). The adversarial attack perturbs B so that the ASV model judges A and B to be the same speaker.
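
The naming rule can be inverted to recover the two VoxCeleb1-style relative paths. The following is a minimal sketch, not part of the official tooling: `split_pair_name` is a hypothetical helper, and it assumes that every video ID is exactly 11 characters long (true of all examples on this page). A fixed-length match is used because video IDs may themselves contain `-` or `_`, so splitting on those characters is not safe.

```python
import re
from typing import Tuple

# One utterance: speaker ID, 11-character video ID (assumption), utterance number.
_UTTERANCE = r"(id\d+)-([\w-]{11})-(\d+)"
_PAIR_RE = re.compile(_UTTERANCE + "_" + _UTTERANCE)


def split_pair_name(file_path: str) -> Tuple[str, str]:
    """Return (enrollment_path, evaluation_path) for an AdvSV file path or name."""
    name = file_path.rsplit("/", 1)[-1]
    if name.endswith(".wav"):
        name = name[: -len(".wav")]
    match = _PAIR_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"unexpected file name: {file_path!r}")
    spk_a, vid_a, utt_a, spk_b, vid_b, utt_b = match.groups()
    return (f"{spk_a}/{vid_a}/{utt_a}.wav", f"{spk_b}/{vid_b}/{utt_b}.wav")


enrollment, evaluation = split_pair_name(
    "id10307-120gjdqGWNQ-00004_id10292-3kzw8lTcUBU-00015.wav"
)
```

Here `enrollment` is `id10307/120gjdqGWNQ/00004.wav` and `evaluation` is `id10292/3kzw8lTcUBU/00015.wav`.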

Folder hierarchy

The folder hierarchy is shown below.

  1. The top level separates digital adversarial samples and over-the-air samples into two folders: Adv and OverTheAir.
  2. Adv: divided into PGD and Ensemble_PGD; each subfolder name identifies the attacked speaker verification model(s) and the PGD parameters (eps, alpha, steps).
  3. OverTheAir: the first level identifies the replay device (High, Low, Medium) and the second level identifies the recording device (AndroidHigh, AndroidLow, iOS).
  4. We also provide replay samples that have not been subjected to adversarial attacks, stored in the Raw folder.

|-- Adv
|   |-- PGD
|   |    |-- ECAPA_eps-0.008_alpha-0.0004_steps-20
|   |    |-- RawNet_eps-0.008_alpha-0.0004_steps-20
|   |    |-- XVec_eps-0.008_alpha-0.0004_steps-20
|   |    |-- ResNet_eps-0.008_alpha-0.0004_steps-20
|   |-- Ensemble_PGD
|   |    |-- ResNet-ECAPA-RawNet_eps-0.008_alpha-0.0004_steps-20
|   |    |-- XVec-ECAPA-RawNet_eps-0.008_alpha-0.0004_steps-20
|   |    |-- XVec-ResNet-ECAPA_eps-0.008_alpha-0.0004_steps-20
|   |    |-- XVec-ResNet-RawNet_eps-0.008_alpha-0.0004_steps-20
|-- OverTheAir
|   |-- High
|   |   |-- AndroidHigh
|   |   |   |-- Raw
|   |   |   |    |-- id00012
|   |   |   |    |-- ...
|   |   |   |-- PGD
|   |   |   |    |-- ...
|   |   |   |-- Ensemble_PGD
|   |   |   |    |-- ...
|   |   |-- AndroidLow
|   |   |   |-- ...
|   |   |-- iOS
|   |   |   |-- ...
|   |-- Low
|   |   |-- ...
|   |-- Medium
|   |   |-- ...
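
Because the hierarchy encodes every attribute, a sample's metadata can be read from its path alone. The sketch below is an illustration under assumptions: `SampleInfo` and `parse_sample_path` are hypothetical names, paths are assumed to use `/` and to be relative to the dataset root, and the contents of the Raw folders are not interpreted because their layout is elided above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SampleInfo:
    attack_method: str           # "PGD", "Ensemble_PGD" or "Raw"
    victim_model: Optional[str]  # e.g. "ECAPA" or "XVec-ResNet-ECAPA"
    replay_device: str           # "NA" for digital-only samples
    recording_device: str        # "NA" for digital-only samples
    eps: Optional[float]
    alpha: Optional[float]
    steps: Optional[int]


def parse_sample_path(path: str) -> SampleInfo:
    """Read the attributes encoded in an AdvSV relative path."""
    parts = path.split("/")
    if parts[0] == "Adv":
        replay, recording, rest = "NA", "NA", parts[1:]
    elif parts[0] == "OverTheAir":
        replay, recording, rest = parts[1], parts[2], parts[3:]
    else:
        raise ValueError(f"unknown top-level folder in {path!r}")

    attack = rest[0]
    if attack == "Raw":  # replayed speech without an adversarial attack
        return SampleInfo("Raw", None, replay, recording, None, None, None)

    # Folder name pattern: <victim>_eps-<e>_alpha-<a>_steps-<n>
    victim, eps, alpha, steps = rest[1].split("_")
    return SampleInfo(
        attack, victim, replay, recording,
        float(eps.split("-", 1)[1]),
        float(alpha.split("-", 1)[1]),
        int(steps.split("-", 1)[1]),
    )


info = parse_sample_path(
    "OverTheAir/Low/AndroidHigh/Ensemble_PGD/"
    "XVec-ResNet-ECAPA_eps-0.008_alpha-0.0004_steps-20/"
    "id10292-gm6PJowclv0-00009_id10273-8cfyJEV7hP8-00019.wav"
)
```

For this path the parsed attack method, victim model, replay device and recording device are `Ensemble_PGD`, `XVec-ResNet-ECAPA`, `Low` and `AndroidHigh`, which agrees with the corresponding tag-file example.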

Official Data Split

If you want to follow this data split, please download the VoxCeleb1 dataset first.

The bonafide and spoof samples are listed in bonafide.txt and spoof.txt, respectively. Each entry is assigned to the training set, the development set, or the evaluation set.

Examples are shown in the table below.

bonafide.txt:
id10533/gWHHxedxtUA/00005.wav train
id11037/FKV4YA7_-YQ/00006.wav dev
id10030/DSrDNGJrN5U/00002.wav eval

spoof.txt:
OverTheAir/Low/iOS/PGD/ResNet_eps-0.008_alpha-0.0004_steps-20/id10283-h87Y8nir1o0-00007_id10300-ize_eiCFEg0-00005.wav train
Adv/Ensemble_PGD/XVec-ResNet-ECAPA_eps-0.008_alpha-0.0004_steps-20/id10298-hjvQiiG71rM-00026_id10285-uArtiTSTnSU-00015.wav train
OverTheAir/Low/iOS/Ensemble_PGD/XVec-ResNet-ECAPA_eps-0.008_alpha-0.0004_steps-20/id10292-3kzw8lTcUBU-00005_id10307-IASj5B-pAyM-00002.wav dev
OverTheAir/High/AndroidHigh/PGD/RawNet_eps-0.008_alpha-0.0004_steps-20/id10272-olePnztkm6U-00012_id10292-ENIHEvg_VLM-00015.wav eval
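
A split file of this shape can be grouped by split with a few lines of code. This is a minimal sketch assuming each line holds a relative path and a split name separated by whitespace, as in the examples above; `group_by_split` is a hypothetical helper, and the real files may differ in detail (for instance, a header line).

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_split(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each split name (train/dev/eval) to the list of paths assigned to it."""
    splits: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # The split name is the last whitespace-separated token.
        path, split = line.rsplit(None, 1)
        splits[split].append(path)
    return dict(splits)


example_lines = [
    "id10533/gWHHxedxtUA/00005.wav train",
    "id11037/FKV4YA7_-YQ/00006.wav dev",
    "id10030/DSrDNGJrN5U/00002.wav eval",
]
bonafide = group_by_split(example_lines)
```

To load a real file, the same function can be applied to an open file handle, e.g. `group_by_split(open("bonafide.txt"))`, assuming the file sits in the working directory.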

All data in bonafide is derived from VoxCeleb1.

To test performance on out-of-domain data, three conditions in the spoof data are withheld from training: samples related to the RawNet model (RawNet, ResNet-ECAPA-RawNet, XVec-ECAPA-RawNet, XVec-ResNet-RawNet), the Medium replay device, and the AndroidHigh recording device. None of them appears in the training set or the development set.
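
The out-of-domain rule can be expressed as a predicate over spoof paths, which is handy for sanity-checking a custom split. The sketch below is an assumption-laden illustration: `is_out_of_domain` is a hypothetical helper that relies only on the folder naming shown in the folder hierarchy.

```python
UNSEEN_VICTIM = "RawNet"          # also covers ensembles that include RawNet
UNSEEN_REPLAY = "Medium"
UNSEEN_RECORDING = "AndroidHigh"


def is_out_of_domain(spoof_path: str) -> bool:
    """True if the path involves a condition held out of the train and dev sets."""
    parts = spoof_path.split("/")
    if parts[0] == "OverTheAir":
        # OverTheAir/<replay>/<recording>/<attack>/<victim>_eps-.../<file>
        if parts[1] == UNSEEN_REPLAY or parts[2] == UNSEEN_RECORDING:
            return True
        config = parts[4] if len(parts) > 4 else ""
    else:
        # Adv/<attack>/<victim>_eps-.../<file>
        config = parts[2] if len(parts) > 2 else ""
    victims = config.split("_", 1)[0].split("-")
    return UNSEEN_VICTIM in victims


train_example = (
    "OverTheAir/Low/iOS/PGD/ResNet_eps-0.008_alpha-0.0004_steps-20/"
    "id10283-h87Y8nir1o0-00007_id10300-ize_eiCFEg0-00005.wav"
)
eval_example = (
    "OverTheAir/High/AndroidHigh/PGD/RawNet_eps-0.008_alpha-0.0004_steps-20/"
    "id10272-olePnztkm6U-00012_id10292-ENIHEvg_VLM-00015.wav"
)
flags = (is_out_of_domain(train_example), is_out_of_domain(eval_example))
```

Under the stated rule, every spoof entry marked train or dev should yield `False`, while eval entries may yield either value.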

The number of samples in each split is shown in the table below.

| | train | dev | eval | total |
|---|---|---|---|---|
| spoof | 84,976 | 10,622 | 291,562 | 387,160 |
| bonafide | 15,351 | 15,352 | 122,813 | 153,516 |
| total | 100,327 | 25,974 | 414,375 | 540,676 |

License

The AdvSV dataset is released under the Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0). This license allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, including for commercial use, so long as attribution is given to the creator and adaptations are shared under the same terms. Detailed terms can be found in LICENSE. If you have any questions about this, please contact us via e-mail: [email protected] cc [email protected].

Reference

@misc{wang2023advsv,
      title={AdvSV: An Over-the-Air Adversarial Attack Dataset for Speaker Verification}, 
      author={Li Wang and Jiaqi Li and Yuhao Luo and Jiahao Zheng and Lei Wang and Hao Li and Ke Xu and Chengfang Fang and Jie Shi and Zhizheng Wu},
      year={2023},
      eprint={2310.05369},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
}

Appendix

A. Sample from VoxCeleb1

The base data of AdvSV is sampled from the VoxCeleb1 verification test set. This test set has 37,720 trials, each consisting of an enrollment sample, a test sample, and a label (0 for different speakers, 1 for the same speaker).

Because replay recording is time-consuming, the test set (37,720 trials) is downsampled to reduce the recording burden. Two principles guide the downsampling.

  1. Preserve the original speaker distribution, so that a shifted distribution does not affect the attack results.
  2. Ensure consistent SV performance between the subset and the full dataset.

To implement these principles, the enrollment and test speaker IDs of each trial are concatenated into a pair ID, e.g. 0 id10270/XXXX/00001.wav id10284/XXXX/00029.wav becomes id10270-id10284.

Then the trials under each pair ID are randomly downsampled to 25%, preserving a quarter of the data. When downsampling would leave 0 samples for a pair ID, one sample is retained. In total, 9,083 samples are retained and recorded in the file veri_test_25.txt. Currently, to reduce the burden of replay recording, the over-the-air attack covers only the targeted attack, i.e., the data pairs labeled "different speakers", where the goal of the attack is to make the speaker verification model recognize them as the "same speaker".
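
The per-pair downsampling can be sketched as follows. This is an illustrative reconstruction, not the script used to build veri_test_25.txt: `pair_id` and `downsample_trials` are hypothetical names, the per-group count is assumed to be rounded down before the keep-at-least-one rule applies, and the random seed is arbitrary, so the output will not reproduce the official list.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Trial = Tuple[int, str, str]  # (label, enrollment path, test path)


def pair_id(enrollment: str, test: str) -> str:
    """Concatenate the two speaker IDs, e.g. 'id10270-id10284'."""
    return f"{enrollment.split('/')[0]}-{test.split('/')[0]}"


def downsample_trials(trials: Sequence[Trial], fraction: float = 0.25,
                      seed: int = 0) -> List[Trial]:
    """Keep roughly `fraction` of the trials within every speaker pair."""
    groups: Dict[str, List[Trial]] = defaultdict(list)
    for trial in trials:
        groups[pair_id(trial[1], trial[2])].append(trial)

    rng = random.Random(seed)
    kept: List[Trial] = []
    for key in sorted(groups):
        members = groups[key]
        # Assumption: round down, then retain at least one trial per pair.
        count = max(1, int(len(members) * fraction))
        kept.extend(rng.sample(members, count))
    return kept


toy_trials = (
    [(0, f"id10270/XXXX/{i:05d}.wav", f"id10284/XXXX/{i:05d}.wav") for i in range(8)]
    + [(1, "id10270/XXXX/00001.wav", "id10270/YYYY/00002.wav")]
    + [(0, f"id10284/XXXX/{i:05d}.wav", f"id10300/XXXX/{i:05d}.wav") for i in range(3)]
)
kept_trials = downsample_trials(toy_trials)
```

Sampling within each pair ID, rather than over the whole list, is what keeps the speaker-pair distribution of the subset close to that of the full test set.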

veri_test_25.txt is the list of downsampled samples. Same-speaker pairs are retained in it so that the EER of the speaker verification models can be measured; only different-speaker pairs are attacked during adversarial attacks.

B. attackResult.txt

Inside the Adv folder, information about each adversarial sample is recorded in the attackResult.txt file. Each record has six attributes:

Enrollment File, Adversarial File, Is Attack Success, Original Label, Cosine Similarity, Average Perturbation

Examples are shown in the table below.

| Enrollment File | Adversarial File | Is Attack Success | Original Label | Cosine Similarity | Average Perturbation |
|---|---|---|---|---|---|
| id10270-8jEAjG6SegY-00035 | id10270-8jEAjG6SegY-00035_id10270-5r0dWxy17C8-00021 | True | 1 | -0.496655136346817 | 0.0065907384268939495 |
| id10270-5r0dWxy17C8-00024 | id10270-5r0dWxy17C8-00024_id10270-OhfKF8FSq3Y-00005 | True | 1 | -0.5840052366256714 | 0.006569686811417341 |
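
A record with these six attributes can be parsed into a typed structure. The sketch below assumes that the fields in attackResult.txt are separated by whitespace and appear in the order listed above; the delimiter and the presence of a header line are not documented here, and `AttackResult` and `parse_attack_result` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttackResult:
    enrollment_file: str
    adversarial_file: str
    attack_success: bool
    original_label: int        # 0: different speakers, 1: same speaker
    cosine_similarity: float
    average_perturbation: float


def parse_attack_result(line: str) -> AttackResult:
    """Parse one whitespace-separated record (assumed format)."""
    enrollment, adversarial, success, label, cosine, perturbation = line.split()
    return AttackResult(
        enrollment_file=enrollment,
        adversarial_file=adversarial,
        attack_success=(success == "True"),
        original_label=int(label),
        cosine_similarity=float(cosine),
        average_perturbation=float(perturbation),
    )


record = parse_attack_result(
    "id10270-8jEAjG6SegY-00035 "
    "id10270-8jEAjG6SegY-00035_id10270-5r0dWxy17C8-00021 "
    "True 1 -0.496655136346817 0.0065907384268939495"
)
```

Parsed records can then be filtered, for example by `attack_success`, before computing statistics over `cosine_similarity`.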
