
Request for Token.yaml file to perform inference with pre-trained model #8

Closed

actuy opened this issue May 6, 2023 · 2 comments

@actuy commented May 6, 2023

Thank you for sharing this excellent repository with the community.

I have been trying to use the pre-trained model (the SE & LUT one trained on LJ and VCTK, https://drive.google.com/file/d/114z-cSEJHs8DdnIKnEE8pthIME6FprSM/view?usp=sharing) and the inference code provided in the repository to synthesize speech from a given text with one of the VCTK speakers.

However, the inference process requires a Token.yaml file to obtain the vocabulary. According to your instructions, I can presumably generate it by running Pattern_Generate.py, but before that I would have to download the two datasets and then generate the patterns by giving the paths of LJ and VCTK, right?
Besides, would it be possible for you to share the Token.yaml file? That would greatly help me and other users run the pre-trained model for inference.
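For reference, if Token.yaml is just a token-to-index vocabulary map, something like the following could rebuild one from transcripts. This is only a rough sketch: the padding symbol `<X>` and the exact layout are guesses on my part, and Pattern_Generate.py remains the authoritative source.

```python
import yaml

def build_token_yaml(transcripts, path='Token.yaml'):
    # Collect every unique character appearing in the transcripts.
    tokens = sorted({char for text in transcripts for char in text})
    # Reserve index 0 for a padding/blank symbol, as many TTS token maps do
    # (the actual symbol used by this repo is an assumption here).
    token_map = {'<X>': 0}
    token_map.update({token: index + 1 for index, token in enumerate(tokens)})
    with open(path, 'w') as f:
        yaml.dump(token_map, f, allow_unicode=True)
    return token_map

build_token_yaml(['Hello world.', 'The quick brown fox.'])
```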

Also, could you please share the mapping between the speaker IDs in the LUT and the speaker names (e.g., p225) in VCTK?

Thank you in advance for your assistance, and I appreciate any help you can provide.

@CODEJIN (Owner) commented May 7, 2023

Dear actuy,

  1. It is correct that you need to specify the paths of LJ and VCTK and generate the patterns. For simple inference, the checkpoint should have been usable without this step, but I forgot to upload the yaml file. I am so sorry.

  2. Since I implemented the model a long time ago, the preprocessed dataset containing the related information seems to have been lost. So I will re-generate the Token.yaml file through the code and upload it: Token.zip

  3. As for the yaml file for the speaker IDs, my guess is that each speaker, including LJ, was assigned an ID in ascending order of speaker name; a sketch of that ordering follows this list. Even if I regenerated the file, it is not clear the numbers would match the checkpoint, so sharing it again would not be very helpful. Sorry.
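If that ascending-order guess is right, the LUT could be reconstructed like this. A minimal sketch only: the inclusion of 'LJ' as a label and the full VCTK speaker list are assumptions to verify against the checkpoint.

```python
# Minimal sketch of an ascending-order speaker LUT, assuming speaker IDs were
# assigned by sorting the speaker labels (including LJ) alphabetically.
vctk_speakers = ['p225', 'p226', 'p227']  # ...extend with the full VCTK list
speaker_lut = {name: index for index, name in enumerate(sorted(['LJ'] + vctk_speakers))}
print(speaker_lut)  # {'LJ': 0, 'p225': 1, 'p226': 2, 'p227': 3}
```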

If you have any further requests or questions, please feel free to let me know. Thank you.

Best regards,

Heejo

@actuy (Author) commented May 10, 2023

Thank you for your detailed reply! That really helps me a lot!

I've gotten past this issue, but issue #9 is still unsolved.
I tried to train the PWGAN with the sound hyperparameters from the Hyper_Parameters.yaml found in the pre-trained checkpoint, as follows:

Sound:
    Spectrogram_Dim: 1025
    Mel_Dim: 80
    Frame_Length: 1024
    Frame_Shift: 256
    Sample_Rate: 24000
    Mel_F_Min: 125
    Mel_F_Max: 7600
    Max_Abs_Mel: 4
    Confidence_Threshold: 0.6
    Gaussian_Smoothing_Sigma: 0.0
    Pitch_Min: 100.0
    Pitch_Max: 500.0

The PWGAN itself works well, but it still does not work with the output mel-spectrograms of Glow_TTS, so I'm really confused about what is happening. I'm not sure whether the sound hyperparameters are still mismatched.
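One way to narrow this down would be to re-extract mels with exactly the values above and compare them (shape and value range) against what Glow_TTS outputs. A minimal sketch with librosa; the repo's own feature extraction may differ, e.g. in log scaling or the Max_Abs_Mel normalization, so treat this only as a consistency check:

```python
import librosa
import numpy as np

# Recompute a mel-spectrogram with the Sound hyperparameters listed above so it
# can be compared against the mels that Glow_TTS feeds to the vocoder.
audio, sr = librosa.load('sample.wav', sr=24000)  # Sample_Rate
mel = librosa.feature.melspectrogram(
    y=audio, sr=sr,
    n_fft=1024,      # Frame_Length
    hop_length=256,  # Frame_Shift
    win_length=1024,
    n_mels=80,       # Mel_Dim
    fmin=125.0,      # Mel_F_Min
    fmax=7600.0,     # Mel_F_Max
    )
log_mel = np.log(np.clip(mel, 1e-5, None))  # log scaling is an assumption here
print(log_mel.shape, log_mel.min(), log_mel.max())
```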

I will close this issue, but I would really appreciate any hints you can give me about this vocoder.

actuy closed this as completed May 10, 2023