This repository hosts the PyTorch code for the paper "Unsupervised Noise Adaptive Speech Enhancement by Discriminator-Constrained Optimal Transport" (NeurIPS 2021) by Hsin-Yi Lin, Huan-Hsin Tseng, Xugang Lu, and Yu Tsao.
DOTN performs unsupervised domain adaptation for speech enhancement (SE), using optimal transport (OT) for domain alignment and a Wasserstein Generative Adversarial Network (WGAN) to govern the output speech quality.
- Voice Bank corpus (VCTK)
  In `Data_preprocessing/processing_VCTK_Demand`:
  - Download `clean_trainset_28spk_wav` and `clean_testset_wav` (two subsets of VCTK) and put them together in a larger folder, e.g., `VCTK_noisy`.
  - Use the preselected DEMAND noise files in `.../Data_preprocessing/processing_VCTK_Demand/DEMAND`.
    - More noises from DEMAND (16-channel environmental noise recordings) can also be used, though this requires modification.
  - Run `step1_process_noisy_VCTK_16k.py` to generate the training and testing datasets: add the paths of VCTK and DEMAND (noise) in `VCTK_path` & `noise_path`, and select noise types in `source_noise` & `target_noise` as desired, e.g. `source_noise = ["TBUS", "TCAR", "TMETRO"]`, `target_noise = ["SCAFE"]`.
  - Convert the generated .wav files to .pt files using `step2_convert_to_pt.py`: add the .wav folder path in `target_root`.
- TIMIT Acoustic-Phonetic Continuous Speech Corpus
  In `Data_preprocessing/preprocessing_TIMIT`:
  - Download the TIMIT corpus, and put the TIMIT path in `step1_generate_clean_files.py` to generate clean speech.
  - Add the path of the `noise_types` folder in `step2_add_noise.py` to mix clean speech with noise.
  - Convert the generated .wav files to .pt files using `step3_convert_to_pt.py`.
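The preprocessing steps above build noisy speech by mixing clean utterances with noise recordings. As a rough illustration of what such a mixing step involves, here is a minimal sketch that scales a noise segment to reach a chosen signal-to-noise ratio (SNR). It is not taken from the repository's scripts: the helper names `mix_at_snr` and `measure_snr_db`, the tiling of short noise clips, and the 5 dB value are all assumptions made for this example, and plain Python lists are used only to keep the sketch free of dependencies.

```python
import math
import random


def mix_at_snr(clean, noise, snr_db):
    """Return clean + scaled noise so that the mixture has the requested SNR (dB).

    Hypothetical helper for illustration; not the repository's implementation.
    """
    if not clean or not noise:
        raise ValueError("clean and noise must be non-empty")
    # Tile the noise if it is shorter than the utterance, then crop to length.
    repeats = math.ceil(len(clean) / len(noise))
    noise = (list(noise) * repeats)[: len(clean)]
    clean_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum(n * n for n in noise) / len(noise)
    if noise_power == 0.0:
        raise ValueError("noise is silent; cannot scale it to a target SNR")
    # Solve SNR_dB = 10 * log10(clean_power / (gain**2 * noise_power)) for gain.
    gain = math.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return [x + gain * n for x, n in zip(clean, noise)]


def measure_snr_db(clean, noisy):
    """Measure the SNR (dB) of `noisy` relative to the known `clean` signal."""
    signal = sum(x * x for x in clean)
    residual = sum((y - x) ** 2 for x, y in zip(clean, noisy))
    return 10 * math.log10(signal / residual)


if __name__ == "__main__":
    rng = random.Random(0)
    # One second of a 440 Hz tone at 16 kHz stands in for a clean utterance.
    clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
    # A shorter burst of Gaussian noise stands in for a DEMAND recording.
    noise = [rng.gauss(0.0, 1.0) for _ in range(4000)]
    noisy = mix_at_snr(clean, noise, snr_db=5.0)
    print(f"measured SNR: {measure_snr_db(clean, noisy):.2f} dB")
```

The gain is computed from the noise after tiling and cropping, so the measured SNR of the mixture matches the requested value up to floating-point error.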
For both cases (VCTK/TIMIT), provide the generated data paths `data_path` & `pt_data_path` in the corresponding `main.py`, and run `python main.py`.
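Before launching training, it can help to confirm that the .wav to .pt conversion covered every file. The sketch below is one way to do that. The helper `find_unconverted` is hypothetical, and it assumes that the .pt folder mirrors the .wav folder with only the file extension changed; the actual layout produced by the conversion scripts may differ, so adapt the path mapping accordingly.

```python
from pathlib import Path


def find_unconverted(data_path, pt_data_path):
    """List .wav files (as relative paths) that have no matching .pt file.

    Assumption for this sketch: the .pt tree mirrors the .wav tree and only
    the extension differs. This is not confirmed by the repository.
    """
    wav_root, pt_root = Path(data_path), Path(pt_data_path)
    missing = []
    for wav in sorted(wav_root.rglob("*.wav")):
        rel = wav.relative_to(wav_root)
        if not (pt_root / rel).with_suffix(".pt").is_file():
            missing.append(rel)
    return missing
```

An empty list means every .wav file has a counterpart under the assumed layout; any returned paths point at files that still need converting.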
- Python 3.8
- PyTorch 1.8
- POT 0.8.0
- librosa 0.8.1
- pypesq 1.2.4
- pystoi 0.3.3
- TensorBoard 2.7.0
- scikit-learn 1.0.1
- tqdm 4.62.3
- NVIDIA V100 (32 GB CUDA memory) and 4 CPUs.
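To compare an existing environment against the versions listed above, a small check along the following lines may be useful. It is only a sketch: the helper `check_versions` is hypothetical, the distribution names are assumed to match the PyPI package names, and a prefix match is used so that, for example, `1.8` accepts `1.8.1`.

```python
from importlib import metadata

# Versions from the list above; distribution names are assumed to be the PyPI names.
EXPECTED = {
    "torch": "1.8",
    "POT": "0.8.0",
    "librosa": "0.8.1",
    "pypesq": "1.2.4",
    "pystoi": "0.3.3",
    "tensorboard": "2.7.0",
    "scikit-learn": "1.0.1",
    "tqdm": "4.62.3",
}


def check_versions(expected=EXPECTED):
    """Return {name: (expected, installed or None, matches)} for each package."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Prefix match so that "1.8" accepts "1.8.1" or "1.8.0+cu111".
        matches = installed is not None and (
            installed == wanted or installed.startswith(wanted + ".")
        )
        report[name] = (wanted, installed, matches)
    return report


if __name__ == "__main__":
    for name, (wanted, installed, ok) in check_versions().items():
        status = "ok" if ok else "MISMATCH"
        shown = installed or "not installed"
        print(f"{name:14s} expected {wanted:8s} found {shown:16s} {status}")
```

A mismatch does not necessarily mean the code will fail; the listed versions are simply the ones stated in this README.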