When using the API to read Chinese, the return is nonetype #64

Open
luobendewugong opened this issue Sep 18, 2024 · 13 comments
Labels
question Further information is requested

Comments

@luobendewugong

Hello, I really appreciate your work. I deployed it on Ubuntu and tried to have it read Chinese text.

I downloaded zh_CN-huayan-medium.onnx and zh_CN-huayan-x_low.onnx from https://hf-mirror.com/rhasspy/piper-voices/tree/main/zh/zh_CN/huayan/medium and https://hf-mirror.com/rhasspy/piper-voices/tree/main/zh/zh_CN/huayan/x_low, and placed them in the voices folder.

I downloaded config.json and model.pth from https://hf-mirror.com/coqui/XTTS-v2/tree/main and placed them in the .local\share\tts\tts_models--multilingual--multi-dataset--xtts folder.

After running python speech.py, the following error occurred. I suspect it is because the text to be read was not passed in.

Could you kindly help me? Thank you!

bug
@matatonic
Owner

The error seems to indicate that the tokenizer is missing, so perhaps you missed a file?

Secondly, why are you manually downloading the models? They can auto-download as needed. Maybe you have a good reason; I know downloads from Hugging Face can be blocked in some regions. Without a good reason, though, you're just making things harder.

@matatonic
Owner

for xtts, the folder path should be like this:

openedai-speech/voices/tts$ ls tts_models--multilingual--multi-dataset--xtts/
config.json  vocab.json  hash.md5  model.pth  speakers_xtts.pth
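
A quick way to check a manually populated folder against that listing is sketched below. The file names are copied from the listing; the helper name missing_xtts_files is hypothetical, and whether every one of the five files is strictly required is an assumption not confirmed in this thread.

```python
from pathlib import Path

# File names copied from the directory listing above.
EXPECTED_FILES = (
    "config.json",
    "vocab.json",
    "hash.md5",
    "model.pth",
    "speakers_xtts.pth",
)


def missing_xtts_files(model_dir):
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

Calling missing_xtts_files("voices/tts/tts_models--multilingual--multi-dataset--xtts") from the openedai-speech directory returns an empty list when all five files are present.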

@matatonic added the question (Further information is requested) label on Sep 20, 2024
@luobendewugong
Author

The error seems to indicate that the tokenizer is missing, so perhaps you missed a file?

Secondly, why are you manually downloading the models? They can auto-download as needed. Maybe you have a good reason; I know downloads from Hugging Face can be blocked in some regions. Without a good reason, though, you're just making things harder.

Thank you very much for your reply. Because I cannot access https://huggingface.co/, I tried adding these lines at the beginning of speech.py:

import os
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"

But it didn't work, even though the same approach works in my other programs. In the end, I had no choice but to download the models manually. I also tried Docker but ran into the same model download problem.
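
One thing worth ruling out: huggingface_hub reads HF_ENDPOINT once, when it is first imported, so the variable only takes effect if it is set before that import happens anywhere in the process. The sketch below sets the variable and reports whether it was set in time. The helper name set_hf_endpoint is hypothetical, and whether the model download in speech.py goes through huggingface_hub at all is not established in this thread, so treat this as a diagnostic rather than a fix.

```python
import os
import sys


def set_hf_endpoint(url="https://hf-mirror.com"):
    """Set HF_ENDPOINT and report whether it was set early enough.

    Returns True if huggingface_hub had not been imported yet, i.e. the
    new value can still be picked up when the library is first imported.
    """
    set_in_time = "huggingface_hub" not in sys.modules
    os.environ["HF_ENDPOINT"] = url
    return set_in_time
```

If the function returns False, some earlier import has already loaded huggingface_hub, and exporting the variable in the shell before starting Python is the safer route.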

@luobendewugong
Author

for xtts, the folder path should be like this:

openedai-speech/voices/tts$ ls tts_models--multilingual--multi-dataset--xtts/
config.json  vocab.json  hash.md5  model.pth  speakers_xtts.pth

Indeed, I had not downloaded all the files. I only downloaded config.json and model.pth. Thank you very much for your detailed explanation. I'll try again.

I also have two more questions:

  1. No matter what I set, even if I set the model to xtts_v2.0.2, after one use the model reverts to xtts when performing TTS. Is my xtts version setting wrong? Should it be xtts_v2.0.2 or xtts_v2? Where do I make these settings?
  2. What should be the folder path for xtts_v2? Which files should it include?

Thank you very much for your reply!

@luobendewugong
Author

for xtts, the folder path should be like this:

openedai-speech/voices/tts$ ls tts_models--multilingual--multi-dataset--xtts/
config.json  vocab.json  hash.md5  model.pth  speakers_xtts.pth

On https://hf-mirror.com/coqui/XTTS-v1/tree/main, there seem to be no hash.md5 and speakers_xtts.pth files. These two files should not be necessary, right? After the problem came up, I downloaded the other three files and placed them in the directory.

Screenshot 2024-09-21 082423

@matatonic
Owner

You want coqui XTTS-v2.

Screenshot_20240920-210952.png
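
For anyone fetching the files by hand, the sketch below builds one download URL per file in the folder listing from earlier in the thread. It assumes the mirror serves files under the same /resolve/main/ path layout as Hugging Face and that the coqui/XTTS-v2 repository carries all five files; neither is confirmed here, and the helper name mirror_urls is hypothetical.

```python
from urllib.parse import quote

MIRROR = "https://hf-mirror.com"  # mirror host used in this thread
REPO = "coqui/XTTS-v2"            # repository named in this thread
FILES = (
    "config.json",
    "vocab.json",
    "hash.md5",
    "model.pth",
    "speakers_xtts.pth",
)


def mirror_urls(repo=REPO, files=FILES, base=MIRROR, revision="main"):
    """Map each file name to its assumed download URL on the mirror."""
    return {
        name: f"{base}/{repo}/resolve/{revision}/{quote(name)}"
        for name in files
    }
```

Each URL can then be passed to a downloader of your choice, saving the result into tts_models--multilingual--multi-dataset--xtts.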

@matatonic
Owner

I'll look into why

HF_ENDPOINT=https://hf-mirror.com

doesn't work and see if I can fix it.

@matatonic
Owner

I also have two more questions:

  1. No matter what I set, even if I set the model to xtts_v2.0.2, after one use the model reverts to xtts when performing TTS. Is my xtts version setting wrong? Should it be xtts_v2.0.2 or xtts_v2? Where do I make these settings?

Without a version set, 'xtts' will use the latest version, which is xtts_v2.0.2.

  2. What should be the folder path for xtts_v2? Which files should it include?

The folder I mentioned at the beginning. Sorry, I'm on mobile; I can give more detail if needed.

@luobendewugong
Author

Your explanation has helped me a lot, thank you very much! After I put all the files into tts_models--multilingual--multi-dataset--xtts, the previous issues were resolved, but the following problems have arisen:

INFO:     127.0.0.1:46278 - "POST /v1/audio/speech HTTP/1.1" 200 OK
2024/09/21 13:12:36.681500 cmd_run.go:1138: WARNING: cannot start document portal: dial unix /run/user/1000/bus: connect: no such file or directory

Additionally, for some reason, Loading model xtts to cuda is very slow, taking about 5 minutes.

Thank you very much for your reply!
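
The log line above shows a POST to /v1/audio/speech returning 200 OK. A minimal sketch of building such a request with the standard library follows. The base URL and port, the model name "tts-1-hd", and the voice "alloy" are placeholder assumptions borrowed from the OpenAI-style request shape, not values confirmed in this thread, and build_speech_request is a hypothetical helper.

```python
import json
import urllib.request


def build_speech_request(text, base_url="http://localhost:8000",
                         model="tts-1-hd", voice="alloy"):
    """Build (but do not send) a POST request for /v1/audio/speech."""
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        # json.dumps escapes non-ASCII by default, so Chinese text is safe here.
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it with urllib.request.urlopen(build_speech_request("你好")) and writing the response bytes to a file would reproduce the request from the log, provided the server is running at the assumed address.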

@luobendewugong
Author

I reinstalled ffmpeg, and now it runs smoothly! Thank you very much! But it seems it cannot read text that mixes Chinese and English.

For some reason, Loading model xtts to cuda is very slow, taking about 5 minutes.

@matatonic
Owner

Try the dev branch, which supports multilingual at the request level. Would multilingual support at the sentence level be a desirable feature?

@matatonic
Owner

Re: the 5-minute wait, that is odd. Which GPU? Models are loaded on demand by default.

@zhai-hello

Hello, I really appreciate your work. I deployed it on Ubuntu and tried to have it read Chinese text.

I downloaded zh_CN-huayan-medium.onnx and zh_CN-huayan-x_low.onnx from https://hf-mirror.com/rhasspy/piper-voices/tree/main/zh/zh_CN/huayan/medium and https://hf-mirror.com/rhasspy/piper-voices/tree/main/zh/zh_CN/huayan/x_low, and placed them in the voices folder.

I downloaded config.json and model.pth from https://hf-mirror.com/coqui/XTTS-v2/tree/main and placed them in the .local\share\tts\tts_models--multilingual--multi-dataset--xtts folder.

After running python speech.py, the following error occurred. I suspect it is because the text to be read was not passed in.

Could you kindly help me? Thank you!

bug

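Hi, fellow countryman, could I trouble you to walk me through the exact steps? I am also using Open WebUI together with this project for speech conversion, but following the official deployment instructions I have not been able to get it working. Could you take a look at my issue? Thanks for your trouble. https://github.com/matatonic/openedai-speech/issues/66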


3 participants