About voice cloning #118

zhzLuke96 · 2024-07-29T12:13:55Z

zhzLuke96
Jul 29, 2024
Maintainer

UPDATE 241111:
All models now support voice cloning.

The original post follows.

Inference with reference audio is now mostly implemented: ChatTTS and CosyVoice can take a reference audio clip as the inference prompt.

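Quick test results:

ChatTTS: for steady (low-frequency) voices it produces a recognizable clone, but squeaky, high-pitched (high-frequency) voices are hard to clone and the results are poor.
CosyVoice: (TODO: not tested yet, but it should work as well)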

Sample output from the test:
Reference audio:

mona_in.mp4

Synthesized result:

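天气预报显示，今天会有小雨，请大家出门时记得带伞。降温的天气也提醒我们要适时添衣保暖 [lbreak]
(English: "The weather forecast says there will be light rain today, so remember to take an umbrella when you go out. The cooler weather is also a reminder to dress warmly.")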

mona_out1.mp4

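Because spk files are awkward to edit by hand, I wrote a dedicated page in the webui for building speakers that carry a sample audio / reference audio.

Because of gradio limitations, it only supports uploading and editing a single audio clip.
The spkv1 file format itself supports multiple references and switching between multiple emotions (still being refined, of course).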

ref issues #113 #111

zhzLuke96 · 2024-07-29T12:19:22Z

zhzLuke96
Jul 29, 2024
Maintainer Author

Two spk files used for testing

The spk used in the example above:
mona.spkv1.json
Bad case: klee, a squeaky high-pitched voice; the ChatTTS output for it is poor
klee.spkv1.json

0 replies

Charles0225 · 2024-07-31T07:43:30Z

Charles0225
Jul 31, 2024

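Is the JSON file saved after build supposed to be uploaded to the speaker (upload) field on the TTS page? After uploading it I get "load failed", with the following error: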
`2024-07-31 07:20:30,297 - modules.webui.webui_utils - ERROR - load spk info failed: '_io.BufferedRandom' object has no attribute 'endswith'
Traceback (most recent call last):
File "/content/ChatTTS-Forge/modules/webui/webui_utils.py", line 250, in tts_generate
spk: TTSSpeaker = TTSSpeaker.from_file(spk_file)
File "/content/ChatTTS-Forge/modules/core/spk/TTSSpeaker.py", line 118, in from_file
if path.endswith(".spkv1.json"):
File "/usr/lib/python3.10/tempfile.py", line 633, in __getattr__
a = getattr(file, name)
AttributeError: '_io.BufferedRandom' object has no attribute 'endswith'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/gradio/queueing.py", line 388, in call_prediction
output = await route_utils.call_process_api(
File "/usr/local/lib/python3.10/dist-packages/gradio/route_utils.py", line 219, in call_process_api
output = await app.get_blocks().process_api(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1437, in process_api
result = await self.call_function(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1109, in call_function
prediction = await anyio.to_thread.run_sync(
File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 641, in wrapper
response = f(*args, **kwargs)
File "/content/ChatTTS-Forge/modules/webui/tts_tab.py", line 21, in tts_generate_with_history
audio = tts_generate(*args, **kwargs)
File "/content/ChatTTS-Forge/modules/webui/webui_utils.py", line 252, in tts_generate
raise gr.Error("Failed to load speaker file")
gradio.exceptions.Error: 'Failed to load speaker file'`

2 replies

Charles0225 Jul 31, 2024

tmpujyoasm7.spkv1.json
The generated JSON file is above.

zhzLuke96 Jul 31, 2024
Maintainer Author

> tmpujyoasm7.spkv1.json The generated JSON file is above.

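From a quick look, this is probably caused by a gradio version that is too old: the generated file has the wrong sample rate, and the gradio components do not behave correctly either.
I suggest upgrading gradio to 4.31 or later and then retrying.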
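
The traceback above shows `path.endswith(...)` being called on a temporary file object (`_io.BufferedRandom`) rather than on a path string. Apart from upgrading gradio, a defensive normalization step can guard against this. The following is a minimal sketch, not code from ChatTTS-Forge: `resolve_upload_path` is a hypothetical helper, and it assumes the upload value is either a path or a file-like object with a string `.name` attribute (as objects created by `tempfile.NamedTemporaryFile` have).

```python
import os
import tempfile


def resolve_upload_path(upload) -> str:
    """Return a filesystem path for an uploaded-file value.

    Hypothetical helper: accepts a path (str or os.PathLike) or a
    file-like object that exposes its path as a string `.name`.
    """
    if isinstance(upload, (str, os.PathLike)):
        return os.fspath(upload)
    name = getattr(upload, "name", None)
    if isinstance(name, str):
        return name
    raise TypeError(f"cannot derive a path from {type(upload).__name__}")


if __name__ == "__main__":
    # A temp-file wrapper stands in for what an older upload widget might return.
    with tempfile.NamedTemporaryFile(suffix=".spkv1.json") as tmp:
        path = resolve_upload_path(tmp)
        # The suffix check from the traceback now operates on a string.
        print(path.endswith(".spkv1.json"))
```

With a helper like this, the suffix check always sees a string, whichever form the upload component hands over.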

wangfeng35 · 2024-08-01T10:30:51Z

wangfeng35
Aug 1, 2024

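The webui currently starts with the ChatTTS model by default. Is there a setting to start the webui with the CosyVoice or fishspeech models, or is that still under construction?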

1 reply

zhzLuke96 Aug 2, 2024
Maintainer Author

Still under construction. For now these two models can only be used through the API.

tuxiaoseng · 2024-08-10T09:02:34Z

tuxiaoseng
Aug 10, 2024

@zhzLuke96 When I use mona.spkv1.json with fishspeech through the API, the voice switches between male and female and the timbre is wrong. Is reference audio not supported yet?

1 reply

zhzLuke96 Aug 10, 2024
Maintainer Author

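Right, fishspeech is unfinished and untested. #90 (comment)

The main reason is that when I tested fishspeech earlier, the results were very poor: it kept generating empty audio and was slow as well. The official issues also contain many reports of this problem, so I did not plan to continue with it.

There is a new version now, but I am not sure whether the new sft fixes the empty-audio problem. It will probably be integrated before the next forge release (0.8).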

a213402010 · 2024-08-22T04:15:10Z

a213402010
Aug 22, 2024

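Can it be used directly once the audio and the reference text are uploaded? With the JSON provided above I can generate audio normally, but I cannot extract the voice properly through the web page (a JSON is returned, but audio generated with it is only one second of noise).
tmpdb6hdkgr.spkv1.json

Does extracting the voice require downloading an additional model? There is no extra output or hint either.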

4 replies

zhzLuke96 Aug 22, 2024
Maintainer Author

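There may have been a bug when loading stereo audio. I attempted a fix in 1047efa; you can pull the latest code and retry.

If the problem persists, try the following:

- Edit the audio with ffmpeg or other software to convert it to mono before uploading; you can also try transcoding it to wav.
- The builder does not depend on any model, so you can try creating the spk file in a colab environment to rule out problems caused by the runtime environment.
- Try the reference audio feature on the tts page and check whether inference works normally.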
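
For the mono conversion suggested above, a minimal sketch using only the Python standard library is shown below. It is an illustration, not part of ChatTTS-Forge: `downmix_wav_to_mono` is a hypothetical helper, it only handles uncompressed 16-bit PCM WAV input, and it downmixes by averaging the channels of each frame. Other formats (mp3, mp4) would still need ffmpeg or a similar tool first.

```python
import array
import sys
import wave


def downmix_wav_to_mono(src_path: str, dst_path: str) -> None:
    """Average all channels of a 16-bit PCM WAV file into one channel."""
    with wave.open(src_path, "rb") as src:
        n_channels = src.getnchannels()
        sample_width = src.getsampwidth()
        frame_rate = src.getframerate()
        raw = src.readframes(src.getnframes())

    if sample_width != 2:
        raise ValueError("this sketch only handles 16-bit PCM WAV files")

    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian

    # Samples are interleaved per frame: ch0, ch1, ..., ch0, ch1, ...
    mono = array.array(
        "h",
        (
            sum(samples[i:i + n_channels]) // n_channels
            for i in range(0, len(samples), n_channels)
        ),
    )
    if sys.byteorder == "big":
        mono.byteswap()

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(frame_rate)  # keep the original sample rate
        dst.writeframes(mono.tobytes())
```

Usage would be along the lines of `downmix_wav_to_mono("mona_stereo.wav", "mona_mono.wav")`, where both file names are placeholders; the resulting mono file is what gets uploaded to the builder page.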

a213402010 Aug 22, 2024

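Thank you very much! After pulling the latest code, the problem is resolved and the seed file is generated correctly, and the results are very good! Thank you for replying so quickly; the results are impressive. A great project!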

a213402010 Aug 26, 2024

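I still run into this problem when using the API to generate speech with the voice:
1. Since I did not know how to import a speaker through the API, I put the speaker seed file in the ./data/speakers directory.
2. The API can find the speaker, but when I call the endpoint as follows:
curl -X 'GET' \
'http://10.92.143.224:8085/v1/tts?text=%E4%BD%A0%E5%A5%BD%EF%BC%8C%E6%88%91%E6%98%AF%E5%B0%8F%E5%BA%A6&spk=xidada&style=chat&temperature=0.3&top_p=0.5&top_k=20&seed=42&format=mp3&bs=8&thr=100&eos=%5Buv_break%5D&enhance=false&denoise=false&speed=1&pitch=0&volume_gain=0&stream=false&no_cache=false&model=chat-tts' \
-H 'accept: */*'
I get the same problem as with the webui: only a very short burst of noise is generated.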

zhzLuke96 Aug 27, 2024
Maintainer Author

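> I still run into this problem when using the API to generate speech with the voice: 1. Since I did not know how to import a speaker through the API, I put the speaker seed file in the ./data/speakers directory. 2. The API can find the speaker, but when I call the endpoint as follows: curl -X 'GET' 'http://10.92.143.224:8085/v1/tts?text=%E4%BD%A0%E5%A5%BD%EF%BC%8C%E6%88%91%E6%98%AF%E5%B0%8F%E5%BA%A6&spk=xidada&style=chat&temperature=0.3&top_p=0.5&top_k=20&seed=42&format=mp3&bs=8&thr=100&eos=%5Buv_break%5D&enhance=false&denoise=false&speed=1&pitch=0&volume_gain=0&stream=false&no_cache=false&model=chat-tts' -H 'accept: */*' I get the same problem as with the webui: only a very short burst of noise is generated.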

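This is probably related to the behavior described in this issue: 2noise/ChatTTS#648

First, make sure the reference audio and the reference text configured in the spk match each other and contain no special symbols.
Then try adjusting the seed or the temperature.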
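
One way to try several seed and temperature combinations is to generate one request URL per combination and compare the outputs by ear. The sketch below only builds the URLs; it reuses the `/v1/tts` path and a subset of the query parameters from the curl call quoted above. The host `localhost:8085`, the speaker name `mona`, and the helper names `build_tts_url` and `sweep_urls` are assumptions for illustration, not part of the project.

```python
from itertools import product
from urllib.parse import urlencode

# Assumption: placeholder address; replace with your own server.
BASE_URL = "http://localhost:8085"


def build_tts_url(text, spk, seed, temperature, base_url=BASE_URL):
    """Build a GET /v1/tts URL using parameters seen in the quoted curl call."""
    params = {
        "text": text,
        "spk": spk,
        "style": "chat",
        "temperature": temperature,
        "top_p": 0.5,
        "top_k": 20,
        "seed": seed,
        "format": "mp3",
        "model": "chat-tts",
    }
    # urlencode percent-encodes the UTF-8 text, as in the original URL.
    return f"{base_url}/v1/tts?{urlencode(params)}"


def sweep_urls(text, spk, seeds, temperatures):
    """Return one URL per (seed, temperature) combination."""
    return [
        build_tts_url(text, spk, seed, temperature)
        for seed, temperature in product(seeds, temperatures)
    ]


if __name__ == "__main__":
    for url in sweep_urls("你好", "mona", seeds=[1, 42, 1234], temperatures=[0.3, 0.5]):
        print(url)
```

Each printed URL can then be fetched with curl or any HTTP client and the resulting files compared.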

If there are still problems, feel free to open an issue.
