
Request for Token.yaml file to perform inference with pre-trained model #8

Closed

actuy opened this issue May 6, 2023 · 2 comments

@actuy commented May 6, 2023

Thank you for sharing this excellent repository with the community.

I have been trying to use the pre-trained model (the SE & LUT one trained on LJ and VCTK, https://drive.google.com/file/d/114z-cSEJHs8DdnIKnEE8pthIME6FprSM/view?usp=sharing) and the inference code provided in the repository to synthesize speech from a given text with one of the VCTK speakers.

However, the inference process requires a Token.yaml file to obtain the vocabulary. According to your instructions, I can presumably generate it by running Pattern_Generate.py, but before that I would have to download the two datasets and then generate the patterns by giving the paths of LJ and VCTK, right?
Besides, would it be possible for you to share the Token.yaml file? That would greatly help me and other users run the pre-trained model for inference.
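For reference, if Token.yaml is just a token-to-index vocabulary map, something like the following could rebuild one from transcripts. This is only a rough sketch: the padding symbol `<X>` and the exact layout are guesses on my part, and Pattern_Generate.py remains the authoritative source.

```python
import yaml

def build_token_yaml(transcripts, path='Token.yaml'):
    # Collect every unique character appearing in the transcripts.
    tokens = sorted({char for text in transcripts for char in text})
    # Reserve index 0 for a padding/blank symbol, as many TTS token maps do
    # (the actual symbol used by this repo is an assumption here).
    token_map = {'<X>': 0}
    token_map.update({token: index + 1 for index, token in enumerate(tokens)})
    with open(path, 'w') as f:
        yaml.dump(token_map, f, allow_unicode=True)
    return token_map

build_token_yaml(['Hello world.', 'The quick brown fox.'])
```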

Also, could you please share the mapping between the speaker IDs in the LUT and the speaker names (e.g., p225) in VCTK?

Thank you in advance for your assistance, and I appreciate any help you can provide.

@CODEJIN (Owner) commented May 7, 2023

Dear actuy,

  1. It is correct that you need to specify the paths of LJ and VCTK and generate the patterns. For simple inference, the checkpoint should have been usable without this step, but I forgot to upload the yaml file. I am so sorry.

  2. Since I implemented the model a long time ago, the preprocessed dataset containing the related information seems to have been lost. So I will re-generate the Token.yaml file through the code and upload it: Token.zip

  3. As for the yaml file for the speaker IDs, my guess is that each speaker, including LJ, was assigned an ID in ascending order of speaker name; a sketch of that ordering follows this list. Even if I regenerated the file, it is not clear the numbers would match the checkpoint, so sharing it again would not be very helpful. Sorry.
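If that ascending-order guess is right, the LUT could be reconstructed like this. A minimal sketch only: the inclusion of 'LJ' as a label and the full VCTK speaker list are assumptions to verify against the checkpoint.

```python
# Minimal sketch of an ascending-order speaker LUT, assuming speaker IDs were
# assigned by sorting the speaker labels (including LJ) alphabetically.
vctk_speakers = ['p225', 'p226', 'p227']  # ...extend with the full VCTK list
speaker_lut = {name: index for index, name in enumerate(sorted(['LJ'] + vctk_speakers))}
print(speaker_lut)  # {'LJ': 0, 'p225': 1, 'p226': 2, 'p227': 3}
```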

If you have any further requests or questions, please feel free to let me know. Thank you.

Best regards,

Heejo

@actuy (Author) commented May 10, 2023

Thank you for your detailed reply! That really helps me a lot!

I've gotten past this issue, but issue #9 is still unsolved.
I tried to train the PWGAN with the sound hyperparameters from the Hyper_Parameters.yaml found in the pre-trained checkpoint, as follows:

Sound:
    Spectrogram_Dim: 1025
    Mel_Dim: 80
    Frame_Length: 1024
    Frame_Shift: 256
    Sample_Rate: 24000
    Mel_F_Min: 125
    Mel_F_Max: 7600
    Max_Abs_Mel: 4
    Confidence_Threshold: 0.6
    Gaussian_Smoothing_Sigma: 0.0
    Pitch_Min: 100.0
    Pitch_Max: 500.0

The PWGAN itself works well, but it still does not work with the output mel-spectrograms of Glow_TTS, so I'm really confused about what is happening. I'm not sure whether the sound hyperparameters are still mismatched.
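One way to narrow this down would be to re-extract mels with exactly the values above and compare them (shape and value range) against what Glow_TTS outputs. A minimal sketch with librosa; the repo's own feature extraction may differ, e.g. in log scaling or the Max_Abs_Mel normalization, so treat this only as a consistency check:

```python
import librosa
import numpy as np

# Recompute a mel-spectrogram with the Sound hyperparameters listed above so it
# can be compared against the mels that Glow_TTS feeds to the vocoder.
audio, sr = librosa.load('sample.wav', sr=24000)  # Sample_Rate
mel = librosa.feature.melspectrogram(
    y=audio, sr=sr,
    n_fft=1024,      # Frame_Length
    hop_length=256,  # Frame_Shift
    win_length=1024,
    n_mels=80,       # Mel_Dim
    fmin=125.0,      # Mel_F_Min
    fmax=7600.0,     # Mel_F_Max
    )
log_mel = np.log(np.clip(mel, 1e-5, None))  # log scaling is an assumption here
print(log_mel.shape, log_mel.min(), log_mel.max())
```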

I will close this issue, but I would really appreciate any hints you can give me about this vocoder.

actuy closed this as completed May 10, 2023