2021-11-06: I have just updated the code structure to make it easier to understand. It may contain bugs for now. I will run some test training later.

2021-11-01: I will update the code and make it easier to use later.

VoiceFixer

VoiceFixer is a framework for general speech restoration. It aims to restore severely degraded speech and historical speech.

Materials

Usage

Environment (do this first)

# Download dataset and prepare running environment
git clone https://github.com/haoheliu/voicefixer_main.git
cd voicefixer_main
source init.sh 

VoiceFixer for general speech restoration

Here we take VF_UNet (VoiceFixer with a UNet as the analysis module) as an example.

  • Training
# pass in a configuration file to the training script
python3 train_gsr_voicefixer.py -c config/vctk_base_voicefixer_unet.json # you can modify the configuration file to personalize your training

Check the logs directory for checkpoints, training logs, and validation results.
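The comment in the training command suggests editing the configuration file. One way to keep the shipped config untouched is to write a modified copy and pass that copy to -c. This is a minimal sketch that assumes only that the config is a JSON object; the dotted keys in the usage line (such as `train.batch_size`) are hypothetical, so check config/vctk_base_voicefixer_unet.json for the real key names. `override_config` and `write_variant` are illustrative helpers, not part of the repository.

```python
import copy
import json
from pathlib import Path
from typing import Any, Dict


def override_config(config: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a deep copy of `config` with dotted-path keys replaced.

    For example, {"train.batch_size": 8} sets new["train"]["batch_size"] = 8.
    Missing intermediate sections are created as empty dicts.
    """
    new_config = copy.deepcopy(config)  # never mutate the caller's dict
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = new_config
        for key in parents:
            node = node.setdefault(key, {})  # walk down, creating sections as needed
        node[leaf] = value
    return new_config


def write_variant(base_path: str, out_path: str, overrides: Dict[str, Any]) -> None:
    """Load a JSON config, apply the overrides and save it under a new name."""
    base = json.loads(Path(base_path).read_text())
    variant = override_config(base, overrides)
    Path(out_path).write_text(json.dumps(variant, indent=2))
```

For example, `write_variant("config/vctk_base_voicefixer_unet.json", "config/my_experiment.json", {"train.batch_size": 8})` writes a hypothetical config/my_experiment.json, which you would then train with `python3 train_gsr_voicefixer.py -c config/my_experiment.json`.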

  • Evaluation

Evaluation runs automatically and generates a .csv file of results on all testsets.

For example, to evaluate on all testsets (the default):

python3 eval_gsr_voicefixer.py  \
                    --config  <path-to-the-config-file> \
                    --ckpt  <path-to-the-checkpoint> 

To evaluate only on the GSR testset:

python3 eval_gsr_voicefixer.py  \
                    --config  <path-to-the-config-file> \
                    --ckpt  <path-to-the-checkpoint> \
                    --testset  general_speech_restoration \
                    --description  general_speech_restoration_eval

The following testsets can be passed to --testset:

  • base: all testsets (the default)
  • clip: clipped speech, with clipping thresholds of 0.1, 0.25, and 0.5
  • reverb: reverberant speech
  • general_speech_restoration: speech containing all kinds of random distortions
  • enhancement: noisy speech
  • speech_super_resolution: low-resolution speech with sampling rates of 2 kHz, 4 kHz, 8 kHz, 16 kHz, and 24 kHz
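To make the clip and speech_super_resolution descriptions more concrete, the sketch below shows how such degradations are commonly simulated. It is not the code used to build these testsets: it assumes a float waveform in [-1, 1], treats the clipping threshold as an absolute amplitude, and interprets "sampling rate of 8 kHz" as band-limiting the signal by resampling down to that rate and back up. The helpers `clip_speech` and `lower_resolution` are hypothetical and rely on NumPy and SciPy's `resample_poly`.

```python
from fractions import Fraction

import numpy as np
from scipy.signal import resample_poly


def clip_speech(wave: np.ndarray, threshold: float) -> np.ndarray:
    """Hard-clip a waveform (assumed to lie in [-1, 1]) at +/- threshold."""
    return np.clip(wave, -threshold, threshold)


def lower_resolution(wave: np.ndarray, orig_sr: int, low_sr: int) -> np.ndarray:
    """Resample down to low_sr and back up to orig_sr.

    The result keeps the original sample rate and length, but content above
    low_sr / 2 is removed by the anti-aliasing filters (up to filter leakage).
    """
    ratio = Fraction(low_sr, orig_sr)  # e.g. 8000/44100 reduces to 80/441
    low = resample_poly(wave, ratio.numerator, ratio.denominator)
    restored = resample_poly(low, ratio.denominator, ratio.numerator)
    return restored[: len(wave)]  # upsampled length is >= the original length
```

For example, `lower_resolution(wave, 44100, 8000)` returns a 44.1 kHz signal with essentially nothing above 4 kHz; the 44.1 kHz original rate is an assumption here.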

To evaluate on only a small portion of the data, e.g. 10 utterances, pass that number to the --limit_numbers argument:

python3 eval_gsr_voicefixer.py  \
                    --config  <path-to-the-config-file> \
                    --ckpt  <path-to-the-checkpoint> \
                    --limit_numbers 10 
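When scripting several evaluation runs, it can help to assemble the command programmatically and reject a mistyped testset name before a long job starts. This hypothetical helper uses only the flags shown in this README (--config, --ckpt, --testset, --description, --limit_numbers) and the testset names listed above; it does not inspect the evaluation script itself, so if the script accepts other names the set would need updating.

```python
from typing import List, Optional

# Testset names as listed in this README.
TESTSETS = {
    "base",
    "clip",
    "reverb",
    "general_speech_restoration",
    "enhancement",
    "speech_super_resolution",
}


def build_eval_command(config: str, ckpt: str,
                       testset: Optional[str] = None,
                       description: Optional[str] = None,
                       limit_numbers: Optional[int] = None,
                       script: str = "eval_gsr_voicefixer.py") -> List[str]:
    """Assemble the argument list for one evaluation run."""
    cmd = ["python3", script, "--config", config, "--ckpt", ckpt]
    if testset is not None:
        if testset not in TESTSETS:
            raise ValueError(f"unknown testset {testset!r}; expected one of {sorted(TESTSETS)}")
        cmd += ["--testset", testset]
    if description is not None:
        cmd += ["--description", description]
    if limit_numbers is not None:
        if limit_numbers <= 0:
            raise ValueError("limit_numbers must be a positive integer")
        cmd += ["--limit_numbers", str(limit_numbers)]
    return cmd
```

For example, `build_eval_command(cfg, ckpt, testset="clip", limit_numbers=10)` yields the arguments for a quick 10-utterance check on the clip testset. You could print it with `shlex.join(cmd)` to inspect it, then run it from the repository root with `subprocess.run(cmd, check=True)`.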

Evaluation results are saved in the exp_results folder.
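The README does not document the layout of the .csv files written to exp_results. Assuming each file has a header row and some numeric metric columns, a sketch like the following prints the mean of every numeric column per file and skips non-numeric columns (for example, utterance names). `summarize_csv` and `summarize_results` are illustrative names, not repository functions.

```python
import csv
import statistics
from pathlib import Path
from typing import Dict


def summarize_csv(path: Path) -> Dict[str, float]:
    """Return the mean of every column whose values all parse as floats."""
    with path.open(newline="") as f:
        rows = list(csv.DictReader(f))
    means: Dict[str, float] = {}
    if not rows:
        return means
    for column in rows[0].keys():
        try:
            values = [float(row[column]) for row in rows]
        except (TypeError, ValueError):
            continue  # non-numeric or ragged column: skip it
        means[column] = statistics.fmean(values)
    return means


def summarize_results(root: str = "exp_results") -> None:
    """Print per-column means for every .csv file found under `root`."""
    for csv_path in sorted(Path(root).rglob("*.csv")):
        print(csv_path)
        for column, mean in summarize_csv(csv_path).items():
            print(f"  {column}: {mean:.4f}")
```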

ResUNet for general speech restoration

  • Training
# pass in a configuration file to the training script
python3 train_gsr_voicefixer.py -c config/vctk_base_voicefixer_unet.json

Check the logs directory for checkpoints, training logs, and validation results.

  • Evaluation (similar to VoiceFixer evaluation)
    python3 eval_ssr_unet.py  \
                        --config  <path-to-the-config-file> \
                        --ckpt  <path-to-the-checkpoint> \
                        --limit_numbers <int-test-only-on-a-few-utterances> \
                        --testset  <the-testset-you-want-to-use> \
                        --description  <describe-this-test>

ResUNet for single task speech restoration

  • Training

    • Denoising
    # pass in a configuration file to the training script
    python3 train_ssr_unet.py -c config/vctk_base_ssr_unet_denoising.json
    • Dereverberation
    # pass in a configuration file to the training script
    python3 train_ssr_unet.py -c config/vctk_base_ssr_unet_dereverberation.json
    • Super Resolution
    # pass in a configuration file to the training script
    python3 train_ssr_unet.py -c config/vctk_base_ssr_unet_super_resolution.json
    • Declipping
    # pass in a configuration file to the training script
    python3 train_ssr_unet.py -c config/vctk_base_ssr_unet_declipping.json

Check the logs directory for checkpoints, training logs, and validation results.
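The four single-task configurations can also be trained one after another from a script. This sketch only wraps the training commands listed above; it runs them sequentially on the assumption that each training job needs the full GPU, and by default it performs a dry run that just prints the commands. `training_commands` and `train_all` are hypothetical helpers.

```python
import subprocess
from typing import Dict, List

# Config files listed in this README for single-task ResUNet training.
SSR_CONFIGS: Dict[str, str] = {
    "denoising": "config/vctk_base_ssr_unet_denoising.json",
    "dereverberation": "config/vctk_base_ssr_unet_dereverberation.json",
    "super_resolution": "config/vctk_base_ssr_unet_super_resolution.json",
    "declipping": "config/vctk_base_ssr_unet_declipping.json",
}


def training_commands(tasks: List[str]) -> List[List[str]]:
    """Build one train_ssr_unet.py command per requested task."""
    unknown = [t for t in tasks if t not in SSR_CONFIGS]
    if unknown:
        raise ValueError(f"unknown task(s): {unknown}")
    return [["python3", "train_ssr_unet.py", "-c", SSR_CONFIGS[t]] for t in tasks]


def train_all(tasks: List[str], dry_run: bool = True) -> None:
    """Run the trainings one after another; check=True stops at the first failure."""
    for cmd in training_commands(tasks):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

For example, `train_all(["denoising", "declipping"], dry_run=False)` would launch the denoising run and then the declipping run from the repository root, raising `CalledProcessError` if either one fails.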

  • Evaluation (similar to VoiceFixer evaluation)
    python3 eval_ssr_unet.py  \
                        --config  <path-to-the-config-file> \
                        --ckpt  <path-to-the-checkpoint> \
                        --limit_numbers <int-test-only-on-a-few-utterances> \
                        --testset  <the-testset-you-want-to-use> \
                        --description  <describe-this-test>

Citation

 @misc{liu2021voicefixer,   
     title={VoiceFixer: Toward General Speech Restoration With Neural Vocoder},   
     author={Haohe Liu and Qiuqiang Kong and Qiao Tian and Yan Zhao and DeLiang Wang and Chuanzeng Huang and Yuxuan Wang},  
     year={2021},  
     eprint={2109.13731},  
     archivePrefix={arXiv},  
     primaryClass={cs.SD}  
 }
