This template is only for questions, not feature requests or bug reports.
I have thoroughly reviewed the project documentation and read the related paper(s).
I have searched for existing issues, including closed ones, no similar questions.
I confirm that I am using English to submit this report in order to facilitate communication.
Question details
Hello, I am not from the audio field. I have a reference audio clip from which I removed the BGM and reverberation to some extent, but the voice cloning results are still poor. Is there a better way to detect whether the reference audio contains noise, distortion, or multiple speakers?
Removing BGM and reverb from audio also removes many frequency ranges, which makes the result difficult for the model to analyse. So it's better to use a different dataset that contains only voice. Whisper can still transcribe the processed audio, but for the audio itself it's recommended to use a raw, voice-only dataset.
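For a quick sanity check before feeding a clip to the cloning model, something like the following might help. This is only a rough sketch, not part of this project: `clipping_ratio` and `rough_snr_db` are hypothetical helper names, the thresholds and the 10% frame split are arbitrary assumptions, and the input is assumed to be mono float audio scaled to [-1, 1]. It flags clipping distortion and gives a crude noise-floor estimate. It does not detect multiple speakers, which would need a separate speaker diarization tool.

```python
import numpy as np

def clipping_ratio(samples, threshold=0.999):
    """Fraction of samples at or above `threshold` of full scale.

    Assumes float audio scaled to [-1, 1]; a high ratio suggests clipping.
    """
    samples = np.asarray(samples, dtype=np.float64)
    if samples.size == 0:
        return 0.0
    return float(np.mean(np.abs(samples) >= threshold))

def rough_snr_db(samples, sample_rate, frame_ms=25.0):
    """Crude SNR estimate in dB.

    Compares the mean energy of the loudest 10% of frames (assumed speech)
    with the quietest 10% (assumed background). This is a heuristic only.
    """
    samples = np.asarray(samples, dtype=np.float64)
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    n_frames = samples.size // frame_len
    if n_frames < 10:
        raise ValueError("audio too short for a frame-based estimate")
    # Split into non-overlapping frames and compute per-frame energy.
    frames = samples[: n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.sort(np.mean(frames ** 2, axis=1))
    k = max(1, n_frames // 10)
    noise = np.mean(energy[:k]) + 1e-12    # quietest frames
    speech = np.mean(energy[-k:]) + 1e-12  # loudest frames
    return float(10.0 * np.log10(speech / noise))
```

A clip with a noticeable clipping ratio or a low estimated SNR is probably a poor reference, but a good score here does not guarantee a clean recording, since the estimate relies on the clip containing some pauses.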