ADAPTERMIX: Exploring the Efficacy of Mixture of Adapters for Low-Resource TTS Adaptation (Interspeech 2023)
Paper: https://arxiv.org/pdf/2305.18028.pdf
Speaker adaptation in text-to-speech is challenging for languages that are not widely spoken and for speakers whose accents or dialects are under-represented in the training data. To address this issue, we propose a "mixture of adapters" (MoA) method. This approach adds multiple adapters within a backbone-model layer to learn the unique characteristics of different speakers. Our approach outperforms the baseline, with a 5% improvement in speaker preference tests when using only one minute of data for each new speaker. Moreover, following the adapter paradigm, we fine-tune only the adapter parameters (11% of the total model parameters), making this one of the first parameter-efficient approaches to speaker adaptation of its kind. Overall, our proposed approach offers a promising solution for speech synthesis, particularly for adapting to speakers from diverse backgrounds.
The MoA module comprises N residual adapters. Each adapter selects the k tokens closest to it and processes them; the same token can be processed by multiple adapters, and the adapter outputs are combined. The architecture of the standard residual adapter is illustrated on the right of the same diagram.
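As a concrete illustration, here is a minimal PyTorch sketch of such a layer. The router, bottleneck size, gating weights, and top-k routing details are illustrative assumptions, not the exact configuration used in the paper:

```python
# Minimal mixture-of-adapters sketch (illustrative, not the paper's exact setup).
import torch
import torch.nn as nn


class ResidualAdapter(nn.Module):
    """Standard residual adapter: down-project, non-linearity, up-project."""

    def __init__(self, d_model: int, bottleneck: int = 32):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, x):
        # The residual connection is added by the caller.
        return self.up(torch.relu(self.down(x)))


class MixtureOfAdapters(nn.Module):
    """N adapters; each processes the k tokens scored closest to it by a
    learned router, and per-token adapter outputs are accumulated."""

    def __init__(self, d_model: int, n_adapters: int = 4, k: int = 16):
        super().__init__()
        self.adapters = nn.ModuleList(
            [ResidualAdapter(d_model) for _ in range(n_adapters)])
        self.router = nn.Linear(d_model, n_adapters)  # token-to-adapter scores
        self.k = k

    def forward(self, x):                    # x: (T, d_model), one utterance
        scores = self.router(x)              # (T, n_adapters)
        out = torch.zeros_like(x)
        k = min(self.k, x.size(0))
        for i, adapter in enumerate(self.adapters):
            # Each adapter picks its k highest-scoring tokens; a token may be
            # chosen by several adapters, and their outputs accumulate. The
            # softmax gating over the selected tokens is our assumption.
            weights, idx = scores[:, i].topk(k)
            out[idx] += weights.softmax(dim=0).unsqueeze(-1) * adapter(x[idx])
        return x + out                       # residual connection
```

For a sequence of T tokens, `MixtureOfAdapters(d_model=256)(torch.randn(50, 256))` returns a tensor of the same shape, so the module can be dropped into a backbone layer without changing its interface.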
You can install the Python dependencies with
```
pip3 install -r requirements.txt
```
The supported datasets are:
- LTS100
- VCTK: The CSTR VCTK Corpus includes speech data uttered by 110 English speakers (multi-speaker TTS) with various accents. Each speaker reads out about 400 sentences, which were selected from a newspaper, the rainbow passage and an elicitation paragraph used for the speech accent archive.
In the following commands, DATASET refers to the name of a dataset, such as LTS100 or VCTK.
- Run
```
python3 prepare_align.py --dataset DATASET
python3 preprocess.py --dataset DATASET
```
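For example, to preprocess VCTK:
```
python3 prepare_align.py --dataset VCTK
python3 preprocess.py --dataset VCTK
```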
Train your model with
```
python3 train.py --dataset DATASET
```
- Vocoder: HiFi-GAN.
For multi-speaker TTS, run
```
python3 synthesize.py --text "YOUR_DESIRED_TEXT" --speaker_id SPEAKER_ID --restore_step RESTORE_STEP --mode single --dataset DATASET
```
The dictionary of learned speakers can be found at `preprocessed_data/DATASET/speakers.json`, and the generated utterances will be put in `output/result/`.
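If you are unsure which SPEAKER_ID values are available, a quick way to inspect them is to read `speakers.json`. This snippet assumes, as in the Comprehensive-Transformer-TTS-style preprocessing this repo builds on, that the file maps speaker names to integer IDs (VCTK is used as an example):

```python
import json

# Load the learned-speaker dictionary produced by preprocessing.
# Assumed format: {"speaker_name": integer_id, ...}.
with open("preprocessed_data/VCTK/speakers.json") as f:
    speakers = json.load(f)

# Print each speaker name with the ID to pass as --speaker_id.
for name, idx in sorted(speakers.items(), key=lambda kv: kv[1]):
    print(idx, name)
```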
Batch inference is also supported; try
```
python3 synthesize.py --source preprocessed_data/DATASET/val.txt --restore_step RESTORE_STEP --mode batch --dataset DATASET
```
to synthesize all utterances in `preprocessed_data/DATASET/val.txt`.
We borrow code from the https://github.com/keonlee9420/Comprehensive-Transformer-TTS repository. We thank the author for open-sourcing their code.