While I was playing with #252, I found that guided synthesis with a clip from a song also produces a decent result:
neutrino.mp4
The ground truth is from NEUTRINO, which also exports the f0 data to a binary file. Given that we can also get phoneme alignment from its MusicXML, we may be able to build a kind of singing-synthesis feature on the more precise data produced by NEUTRINO and get rid of the buggy Julius.
Before I create a WIP PR referencing this issue, anyone kind enough to take this on instead is welcome to.
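For reference, a minimal sketch of loading the f0 dump. I'm assuming the file is a flat array of little-endian float64 values at a fixed 5 ms frame shift; both assumptions would need to be verified against NEUTRINO's actual output format.

```python
# Minimal sketch for loading NEUTRINO's exported f0 track.
# ASSUMPTIONS (unverified): the f0 file is a flat array of little-endian
# float64 values, one value per frame at a fixed 5 ms frame shift.
import numpy as np

def read_neutrino_f0(path: str, frame_shift_s: float = 0.005):
    """Return the f0 contour and the timestamp of each frame."""
    f0 = np.fromfile(path, dtype="<f8")          # assumed dtype/endianness
    times = np.arange(len(f0)) * frame_shift_s   # assumed frame shift
    return f0, times
```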
Pros
I don't know... NEUTRINO's accessibility is already terrible enough that I doubt anyone will make use of this feature, while 3 of the 5 libraries already have a UTAU. Maybe it's just to see how far this idea can go.
Cons
long notes tend to produce more artifacts (e.g. the last phoneme in the video above)
decoder_forwarder may not be able to handle synthesis for audio the length of a song (3-5 minutes); we may need to split it into batches (see the sketch after this list)
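A naive batching strategy could look like the sketch below. `decode_chunk` is a hypothetical wrapper around decoder_forwarder (the real signature is not shown here), and `max_frames` is an arbitrary placeholder.

```python
# Sketch of chunked decoding for song-length inputs. `decode_chunk` is a
# hypothetical wrapper around decoder_forwarder; the real API may differ.
import numpy as np

def decode_in_batches(f0, phoneme, decode_chunk, max_frames: int = 1000):
    """Decode frame-level features chunk by chunk and concatenate the audio.

    Hard cuts at chunk boundaries may click; overlapping the chunks and
    cross-fading would likely be needed in practice.
    """
    waves = []
    for start in range(0, len(f0), max_frames):
        end = min(start + max_frames, len(f0))
        waves.append(decode_chunk(f0[start:end], phoneme[start:end]))
    return np.concatenate(waves)
```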
Implementation
read the f0 and MusicXML files from NEUTRINO, then resample the features and send them to decoder_forwarder, as sketched below
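A rough sketch of the resampling step, under the same 5 ms source frame-shift assumption as above; the decoder's frame shift is left as a parameter since the value expected by decoder_forwarder would need to be confirmed. Unvoiced frames (f0 == 0) are masked out before interpolation so they don't drag voiced values toward zero.

```python
# Sketch of resampling NEUTRINO's f0 onto the decoder's frame grid.
# Both frame shifts are assumptions and must be checked against the
# actual NEUTRINO output and the decoder_forwarder input spec.
import numpy as np

def resample_f0(f0, src_shift_s: float = 0.005, dst_shift_s: float = 0.010):
    """Linearly interpolate voiced f0 frames onto the target frame grid."""
    src_t = np.arange(len(f0)) * src_shift_s
    dst_t = np.arange(int(src_t[-1] / dst_shift_s) + 1) * dst_shift_s
    voiced = f0 > 0
    out = np.interp(dst_t, src_t[voiced], f0[voiced])  # voiced frames only
    # Re-zero frames whose nearest neighbours were mostly unvoiced.
    out[np.interp(dst_t, src_t, voiced.astype(float)) < 0.5] = 0.0
    return out
```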
> I don't know... NEUTRINO's accessibility is already terrible enough that I doubt anyone will make use of this feature, while 3 of the 5 libraries already have a UTAU. Maybe it's just to see how far this idea can go.
The only thing I can say is that I guess some users who are fans of 春日部つむぎ may be happy if this feature is implemented.
See this video
Currently, she cannot sing well.
However, I don't know if this feature should be implemented in text-to-speech software.
> However, I don't know if this feature should be implemented in text-to-speech software.
Right, maybe it should be made into an assistant tool like Kotonosync for VOICEROID. I'd better create another repository and name it something like Zundamonsync.