Guided Synthesis #252

Patchethium · 2021-12-28T13:30:34Z

I use scipy to resample audios in order to match Julius' 16khz requirement, but sometimes it fails in testing while the one Audacity resampled from the same file works, this is pretty confusing.
The julius4segment keeps throwing out exceptions even with a right sample rate
Most of the results are pretty weird and low-quality like this failed example (audio included):

guided_bad.mp4

Don't know what's going on here...
4. I'm using a simple min-max to normalize the f0 extracted, I guess there should be some better methods...

Currently I only got the second method I mentioned in #231 implemented, hopefully the first one may perform better. Before I get that done, I'll keep this PR a WIP.

Issue

#231

Hiroshiba · 2021-12-28T16:00:09Z

すごい成果だと思います！！！！！！
あなたの熱意に答えて、思いつく限りの助言をしてみます。

まず、juliusがエラーを出すことに関して。
もしかしたら、juliusがデフォルトで無音を無視しているせいかもしれません。
-nostripを指定して、無音も無視しないように設定すれば、エラーが出なくなるかもしれません。
たぶんこちらのjulius4segを使われていると思うのですが、いろいろ改造を加えた僕のjulius4segを利用すると、いろんな問題が解決するかもです。
https://github.com/Hiroshiba/julius4seg

品質が低いのは、おそらく音高の抽出方法がVOICEVOXコアの想定と違うためです。
VOICEVOXではminmax&normalizeではなく、対数を取得しています。
https://github.com/Hiroshiba/acoustic_feature_extractor/blob/478f730c1cd5b24015c73872c2186d123be1b3bc/acoustic_feature_extractor/data/f0.py#L60-L62

Have a great year!

Patchethium · 2021-12-30T11:24:37Z

あけましておめでとうございます！

After changing the normalization algorithm, it turns out to be better than I thought, check out this example:

example1.mp4

And I also got the accent phrases part implemented, which also provides a pretty decent result:

usage.mp4

example2.mp4

As for the Forced Alignment part, unfortunately, switching it to your fork doesn't seem to help much 😢. Julius still throws exceptions kind of frequently, but I'm starting to consider it as acceptable since it's just how unreliable its ASR is. I added a simple error handling to tell the user to change their audio file when Julius crashes, guess it's enough in practice until someone kind enough to improve this part comes 🙏

As a result, I'm marking this PR as ready to be reviewed, feel free to bring up any questions.

Hiroshiba · 2021-12-30T14:57:26Z

ふむ、さすがに変更箇所が大きくて大変ですね。

juilus4segを別ライブラリとして切り出すことはできそうでしょうか。
コードはHiroshiba/julius4segからほとんど変えていませんか？
であれば、VOICEVOX/julius4segリポジトリにforkしてからvoicevox_engineで利用するというのはどうでしょう。

Hmm, the changes are indeed very large and hard.

Is it possible to extract juilus4seg as an independent library?
Is the code almost unchanged from Hiroshiba/julius4seg?
If so, how about forking it into the VOICEVOX/julius4seg repository and then using it in voicevox_engine?

Patchethium · 2021-12-31T01:28:53Z

Oh again? Buhhhh...

No, I don't think I'm capable to do that, neither to make it a GitHub submodule nor creating a python module that can be downloaded and installed by pip. Excluding fastapi's Form() from flake8 took me almost one hour before I gave up, I just don't wanna jump into another rabbit hole to find myself end up spending a bunch of hours in figuring out that tons of configurations.

The julius4seg folder has literally NOTHING changed since I copied it from the original repository, you can simply ignore it in reviewing, making the workload a half. If you want me to divide the two APIs (guided synthesis and guided accent phrases) into separate pull requests, can do in five minutes.

voicevox_engine/guided.py

Patchethium · 2022-01-08T12:30:46Z

It's been a week, how's it going now?

takana-v · 2022-01-08T16:01:25Z

run.py

+        except ParseKanaError:
+            print(traceback.format_exc())
+            raise HTTPException(
+                status_code=500,


I think using 422 instead of 500 for the status code is better.
ref #91

takana-v · 2022-01-08T16:19:18Z

voicevox_engine/synthesis_engine/synthesis_engine.py

+        self,
+        query: AudioQuery,
+        speaker_id: int,
+        audio_file: Optional[IO],


[QUESTION] Why is the audio_file argument set to Optional?
I think an error will occur if audio_file is None.
https://docs.python.org/3/library/typing.html#typing.Optional

Hiroshiba

No, I don't think I'm capable to do that, neither to make it a GitHub submodule nor creating a python module that can be downloaded and installed by pip.

なるほどです、承知しました。
では、guided synthesis機能は一旦experimental機能ということにしましょう！

voicevox_engine/experimentalのようなディレクトリを作り、guided_extractor.pyとjulius4segをこのディレクトリの中に移動してください。
マージされてから、よりクールになるように修正していきましょう！

Okay, I get it.
So, let's define the guided synthesis feature as an experimental feature for now!

Please create a directory like voicevox_engine/experimental and move guided_extractor.py and julius4seg into this directory.
Once they're merged, we can modify them to be cooler!

Hiroshiba · 2022-01-09T19:14:06Z

voicevox_engine/guided_extractor.py

+def get_normalize_scale(engine, kana: str, f0: np.ndarray, speaker_id: int):
+    f0_avg = _no_nan(np.average(f0[f0 != 0]))
+    predicted_phrases, _ = parse_kana(kana, False)
+    engine.replace_mora_data(predicted_phrases, speaker_id=speaker_id)
+    pitch_list = []
+    for phrase in predicted_phrases:
+        for mora in phrase.moras:
+            pitch_list.append(mora.pitch)
+    pitch_list = np.array(pitch_list, dtype=np.float64)
+    predicted_avg = _no_nan(np.average(pitch_list[pitch_list != 0]))
+    return predicted_avg / f0_avg


ターゲット話者のピッチ平均値を実際に一度作成する、というのは面白いアイデアですね！

ここでは、ターゲット話者のピッチ平均値を計算し、インプット話者のピッチをその平均値に合わせるということをしたいのでしょうか。

ピッチを合わせる正確な手法は、平均値をスケールするのではなく、平均値の差の加算です。
なのでここはpredicted_avg - f0_avgを返すようにし、利用側でpitch += diffとするのが正しい計算式になります。
関数名もget_pitch_diffとかにするとよりクールだと思います。

Hiroshiba · 2022-01-09T19:18:48Z

voicevox_engine/synthesis_engine/synthesis_engine.py

+
+    def guided_accent_phrases(
+        self,
+        query: AudioQuery,


この関数はqueryは必要とせず、list[AccentPhrase]で良いと思います。こうすることで呼び出し側はqueryを作成する手間を省略できます。

kanaはlist[AccentPhrase]から作成できます。

voicevox_engine/voicevox_engine/kana_parser.py

Line 137 in bdf712f

def create_kana(accent_phrases: List[AccentPhrase]) -> str:

Hiroshiba · 2022-01-09T19:21:57Z

run.py

@@ -206,6 +207,63 @@ def accent_phrases(
                enable_interrogative=enable_interrogative,
            )

+    @app.post(
+        "/guided_accent_phrase",
+        response_model=AudioQuery,


ここはList[AccentPhrase]が正しそうです。

voicevox_engine/run.py

Line 161 in bdf712f

response_model=List[AccentPhrase],

Hiroshiba · 2022-01-09T19:23:58Z

run.py

+    def guided_accent_phrase(
+        kana: str = Form(...),  # noqa: B008
+        speaker_id: int = Form(...),  # noqa: B008
+        normalize: int = Form(...),  # noqa: B008
+        audio_file: UploadFile = File(...),  # noqa: B008
+    ):


kanaは「ひらがな」の意味ではなく、「AquesTalk記法のテキスト」という意味で用いています。

他のAPIと形式を合わせておくと、ユーザーにとって使い勝手が良さそうです。
こちらとAPI形式を合わせて、このようにしてください。

def guided_accent_phrase( text: str, speaker: int, is_kana: bool = False, enable_interrogative: bool = enable_interrogative_query_param(), # noqa B008, audio_file: UploadFile = File(...), # noqa: B008 ):

coveralls · 2022-01-09T19:41:12Z

Pull Request Test Coverage Report for Build 1641048449

82 of 622 (13.18%) changed or added relevant lines in 6 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage decreased (-32.4%) to 54.282%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
voicevox_engine/dev/synthesis_engine/mock.py	4	6	66.67%
voicevox_engine/synthesis_engine/synthesis_engine_base.py	6	8	75.0%
voicevox_engine/synthesis_engine/synthesis_engine.py	6	69	8.7%
voicevox_engine/guided_extractor.py	36	125	28.8%
voicevox_engine/julius4seg/sp_inserter.py	27	116	23.28%
voicevox_engine/julius4seg/converter.py	3	298	1.01%

Totals
Change from base Build 1640294394:	-32.4%
Covered Lines:	767
Relevant Lines:	1413

💛 - Coveralls

github-actions · 2022-01-09T19:43:00Z

Coverage Result

Resultを開く

Name	Stmts	Miss
voicevox_engine/init.py	1	0
voicevox_engine/acoustic_feature_extractor.py	75	0
voicevox_engine/dev/synthesis_engine/init.py	2	0
voicevox_engine/dev/synthesis_engine/mock.py	41	4
voicevox_engine/experimental/init.py	0	0
voicevox_engine/experimental/guided_extractor.py	124	89
voicevox_engine/experimental/julius4seg/init.py	0	0
voicevox_engine/experimental/julius4seg/converter.py	298	295
voicevox_engine/experimental/julius4seg/sp_inserter.py	116	89
voicevox_engine/full_context_label.py	162	3
voicevox_engine/kana_parser.py	86	1
voicevox_engine/model.py	136	7
voicevox_engine/mora_list.py	4	0
voicevox_engine/part_of_speech_data.py	5	0
voicevox_engine/preset/Preset.py	12	0
voicevox_engine/preset/PresetLoader.py	34	1
voicevox_engine/preset/init.py	3	0
voicevox_engine/synthesis_engine/init.py	5	0
voicevox_engine/synthesis_engine/core_wrapper.py	156	126
voicevox_engine/synthesis_engine/make_synthesis_engines.py	52	43
voicevox_engine/synthesis_engine/synthesis_engine.py	179	66
voicevox_engine/synthesis_engine/synthesis_engine_base.py	69	9
voicevox_engine/user_dict.py	88	10
voicevox_engine/utility/init.py	3	0
voicevox_engine/utility/connect_base64_waves.py	35	3
voicevox_engine/utility/engine_root.py	9	2
TOTAL	1695	748

# Conflicts: # .gitignore # voicevox_engine/dev/synthesis_engine/mock.py

# Conflicts: # run.py # voicevox_engine/dev/synthesis_engine/mock.py # voicevox_engine/synthesis_engine/synthesis_engine.py # voicevox_engine/synthesis_engine/synthesis_engine_base.py

Patchethium · 2022-02-21T09:42:13Z

Should be okay now.

Hiroshiba

すみません、ちょっと今週立て込んでいるので来週まで待って頂ければ･･･！
（新しいキャラクターが４人増えます）

Sorry, I'm a little busy this week, so please wait until next week...
Four new characters will be added. :->

Patchethium · 2022-02-22T04:23:17Z

Okay, I'll be dealing with the GUI these days.

Four new characters will be added. :->

That's good, but I'm a bit concerned with the speed characters are joining in, TTS with characters itself is a niche market and too much products flooding in may destroy the balance in which customers take the time to accept a new character... Just a thought.

Hiroshiba · 2022-03-03T15:54:53Z

@Patchethium san
Sorry for the delay.
I'm hoping to take a closer look this weekend, just a little longer ...!
おまたせしてしまってすみません。
この週末にじっくり見てみたいと思っています、もう少しだけお待ちください･･･！

TTS with characters itself is a niche market and too much products flooding in may destroy the balance in which customers take the time to accept a new character... Just a thought.

I see...
If I begin to feel saturated with the number of character types, I will review the policy. :->
なるほど･･･。
キャラクター種類数の飽和を感じ始めたら、方針を見直そうと思います！

voicevox_engine/experimental/guided_extractor.py

voicevox_engine/synthesis_engine/synthesis_engine.py

run.py

Hiroshiba · 2022-03-06T14:50:05Z

お待たせしました、レビューしてみました！
Here you go, I reviewed it!

Patchethium · 2022-03-09T03:36:35Z

Should be okay now

Hiroshiba

LGTM！！！
READMEもありがとうございます！！
GUIの実装、とても楽しみにしています！

Patchethium added 3 commits December 28, 2021 14:04

forced alignment, f0 extraction and entry point

a73892b

Merge branch 'master' into guided_synthesis

28cf7c2

kind of finished

a060398

Patchethium marked this pull request as draft December 28, 2021 13:31

Patchethium added 8 commits December 29, 2021 13:04

change julius4seg, doesn't seem to help

f7a3713

run pysen format

6b0651f

add speaker id to api

f1a663a

run pysen format

668df80

add accent_phrase api, finish

ad4bdbd

add request parameter

ea95405

improve error handling

6dff2ec

run pysen format

34eec39

Patchethium marked this pull request as ready for review December 30, 2021 11:24

Patchethium changed the title ~~[WIP] Guided synthesis~~ Guided Synthesis Dec 30, 2021

Patchethium added 3 commits December 30, 2021 22:18

add parameters

a0cba4d

run pysen format

90e41e2

a little boundary check

e889207

Yosshi999 reviewed Dec 31, 2021

View reviewed changes

voicevox_engine/guided.py Outdated Show resolved Hide resolved

Patchethium added 8 commits December 31, 2021 20:44

add normalization for different WAV format

c98c8be

run format

1c6d96e

run format

2d74993

Merge branch 'master' into guided_synthesis

ca356df

move synthesis and accent phrase to synthesis engine

f088176

add test for mock

cf18c3c

change url for apis

98d387c

simplify

48b629f

Patchethium mentioned this pull request Jan 1, 2022

Singing Synthesis with NEUTRINO #260

Closed

takana-v reviewed Jan 8, 2022

View reviewed changes

Hiroshiba requested changes Jan 9, 2022

View reviewed changes

Patchethium added 7 commits January 11, 2022 09:27

error type

061483c

Merge branch 'master' into guided_synthesis

fc45886

# Conflicts: # .gitignore # voicevox_engine/dev/synthesis_engine/mock.py

do something

0e26bbb

do something

365ed92

run format

29427d9

Merge branch 'master' into guided_synthesis

ddc6537

# Conflicts: # run.py # voicevox_engine/dev/synthesis_engine/mock.py # voicevox_engine/synthesis_engine/synthesis_engine.py # voicevox_engine/synthesis_engine/synthesis_engine_base.py

resolve conflict

ca6df3b

Yosshi999 requested a review from Hiroshiba February 21, 2022 14:12

Hiroshiba reviewed Feb 21, 2022

View reviewed changes

Patchethium added 2 commits February 22, 2022 14:33

add usage to README

730917f

Merge branch 'master' into guided_synthesis

3522370

takana-v requested a review from Hiroshiba March 3, 2022 07:38

Hiroshiba requested changes Mar 6, 2022

View reviewed changes

voicevox_engine/experimental/guided_extractor.py Outdated Show resolved Hide resolved

voicevox_engine/synthesis_engine/synthesis_engine.py Show resolved Hide resolved

run.py Show resolved Hide resolved

add comments and experimental flag for guided api

9b75c6c

Hiroshiba approved these changes Mar 10, 2022

View reviewed changes

Hiroshiba merged commit 65e657e into VOICEVOX:master Mar 10, 2022

This was referenced Mar 21, 2022

Guided synthesis - API Improvement #376

Merged

Guided Synthesis VOICEVOX/voicevox#769

Closed

Patchethium mentioned this pull request Aug 13, 2023

Guided #723

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Guided Synthesis #252

Guided Synthesis #252

Patchethium commented Dec 28, 2021 •

edited

Loading

Hiroshiba commented Dec 28, 2021

Patchethium commented Dec 30, 2021

Hiroshiba commented Dec 30, 2021

Patchethium commented Dec 31, 2021

Patchethium commented Jan 8, 2022 •

edited

Loading

takana-v Jan 8, 2022 •

edited

Loading

takana-v Jan 8, 2022

Hiroshiba left a comment

Hiroshiba Jan 9, 2022 •

edited

Loading

Hiroshiba Jan 9, 2022 •

edited

Loading

Hiroshiba Jan 9, 2022

Hiroshiba Jan 9, 2022

coveralls commented Jan 9, 2022

github-actions bot commented Jan 9, 2022 •

edited

Loading

Patchethium commented Feb 21, 2022

Hiroshiba left a comment

Patchethium commented Feb 22, 2022

Hiroshiba commented Mar 3, 2022

Hiroshiba commented Mar 6, 2022

Patchethium commented Mar 9, 2022

Hiroshiba left a comment •

edited

Loading

Guided Synthesis #252

Guided Synthesis #252

Conversation

Patchethium commented Dec 28, 2021 • edited Loading

Contents

Issue

Hiroshiba commented Dec 28, 2021

Patchethium commented Dec 30, 2021

Hiroshiba commented Dec 30, 2021

Patchethium commented Dec 31, 2021

Patchethium commented Jan 8, 2022 • edited Loading

takana-v Jan 8, 2022 • edited Loading

Choose a reason for hiding this comment

takana-v Jan 8, 2022

Choose a reason for hiding this comment

Hiroshiba left a comment

Choose a reason for hiding this comment

Hiroshiba Jan 9, 2022 • edited Loading

Choose a reason for hiding this comment

Hiroshiba Jan 9, 2022 • edited Loading

Choose a reason for hiding this comment

Hiroshiba Jan 9, 2022

Choose a reason for hiding this comment

Hiroshiba Jan 9, 2022

Choose a reason for hiding this comment

coveralls commented Jan 9, 2022

Pull Request Test Coverage Report for Build 1641048449

💛 - Coveralls

github-actions bot commented Jan 9, 2022 • edited Loading

Coverage Result

Patchethium commented Feb 21, 2022

Hiroshiba left a comment

Choose a reason for hiding this comment

Patchethium commented Feb 22, 2022

Hiroshiba commented Mar 3, 2022

Hiroshiba commented Mar 6, 2022

Patchethium commented Mar 9, 2022

Hiroshiba left a comment • edited Loading

Choose a reason for hiding this comment

Patchethium commented Dec 28, 2021 •

edited

Loading

Patchethium commented Jan 8, 2022 •

edited

Loading

takana-v Jan 8, 2022 •

edited

Loading

Hiroshiba Jan 9, 2022 •

edited

Loading

Hiroshiba Jan 9, 2022 •

edited

Loading

github-actions bot commented Jan 9, 2022 •

edited

Loading

Hiroshiba left a comment •

edited

Loading