Converted timbre follows the source instead of the target #97

Open
Blakey-Gavin opened this issue Oct 21, 2024 · 10 comments

Comments

@Blakey-Gavin

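Hello, I replaced the SSL model with the Chinese versions of wav2vec2 and HuBERT, then tried both retraining and fine-tuning. Either way, the converted output has a timbre similar to the source rather than the target.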

What could be causing this, and how should I fix it?

@zxj329

zxj329 commented Oct 21, 2024

Same here, and I used a lot of data too.

@Blakey-Gavin
Author

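My dataset is also 70 to 80 hours. Have you found the cause? I've been looking into it for a long time and can't figure out where the problem is.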

@zxj329

zxj329 commented Oct 22, 2024

My dataset is several thousand hours and it still doesn't work. I'm still looking.

@Blakey-Gavin
Author

OK. If you find the cause, could you let me know? Thanks a lot!

@zxj329

zxj329 commented Oct 22, 2024

Check what your mel-loss is and whether it is decreasing.

@Blakey-Gavin
Author

Overall, it is decreasing.
[image]

@zxj329

zxj329 commented Oct 22, 2024

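I'm running experiments now. Does each speaker in your data have roughly the same number of recordings, or do some speakers have much more data than others?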

@Blakey-Gavin
Author

I hadn't counted that before. Here are the statistics:
Fewer than 100: none
100-200: 10 speakers
200-300: 30 speakers
300-400: 34 speakers
400-500: 119 speakers
500-600: 16 speakers
More than 600: none

Utterance count range: 139-506
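
A breakdown like the one above can be produced with a few lines of Python. This is only a minimal sketch: it assumes the dataset is available as a list of `(speaker_id, utterance_path)` pairs, which is a hypothetical layout and not the project's actual data format. The function name `bin_speakers` and the bin width of 100 are illustrative too.

```python
from collections import Counter


def bin_speakers(utterances, bin_width=100):
    """Count utterances per speaker, then count speakers per bin.

    `utterances` is an iterable of (speaker_id, utterance_path) pairs
    (an assumed layout, not the project's own format).
    Returns a dict mapping each bin's lower bound to the number of speakers in it.
    """
    # Number of utterances recorded for each speaker.
    per_speaker = Counter(speaker for speaker, _ in utterances)
    # Map each speaker's count to the lower bound of its bin, e.g. 139 -> 100.
    bins = Counter((n // bin_width) * bin_width for n in per_speaker.values())
    return dict(sorted(bins.items()))


if __name__ == "__main__":
    # Two made-up speakers with 139 and 506 utterances.
    demo = [("spk_a", f"a_{i}.wav") for i in range(139)]
    demo += [("spk_b", f"b_{i}.wav") for i in range(506)]
    for start, count in bin_speakers(demo).items():
        print(f"{start}-{start + 100}: {count} speakers")
```

A strongly skewed result, such as most speakers falling into a single bin, would be one sign that the per-speaker balance is worth looking at.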

@zxj329

zxj329 commented Oct 23, 2024

Try making the number of utterances per speaker roughly equal.
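
One way to try this is to downsample every speaker to the same number of utterances before training. The sketch below again assumes a hypothetical list of `(speaker_id, utterance_path)` pairs, and `balance_by_speaker` is an illustrative helper, not part of the project. Whether balancing actually fixes the timbre issue is not confirmed in this thread.

```python
import random
from collections import defaultdict


def balance_by_speaker(utterances, per_speaker=None, seed=0):
    """Downsample so that no speaker keeps more than `per_speaker` utterances.

    `utterances` is an iterable of (speaker_id, utterance_path) pairs
    (an assumed layout). If `per_speaker` is None, the size of the
    smallest speaker is used, so all speakers end up equal.
    """
    by_speaker = defaultdict(list)
    for speaker, path in utterances:
        by_speaker[speaker].append(path)
    if not by_speaker:
        return []

    if per_speaker is None:
        cap = min(len(paths) for paths in by_speaker.values())
    else:
        cap = per_speaker

    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    balanced = []
    for speaker in sorted(by_speaker):
        paths = by_speaker[speaker]
        # Randomly keep at most `cap` utterances for this speaker.
        keep = rng.sample(paths, min(cap, len(paths)))
        balanced.extend((speaker, path) for path in keep)
    return balanced
```

With the statistics reported above (139 to 506 utterances per speaker), the default would cap every speaker at 139 utterances. Passing a larger `per_speaker` keeps more data at the cost of some remaining imbalance.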

@Blakey-Gavin
Author

Sure, I'll try it when I have time. Right now I'm busy with other things.


2 participants