Guided #723
Conversation
I understand this is still a draft, but I've taken a look at the pull request!
I don't have any comments regarding the direction at the moment.
Sorry, I forgot to set up the build pipeline.
I think it's ready for review.
I don't know much about the speech-synthesis side, so I reviewed everything else.
I've added comments on the parts that seem unnecessary.
Also, I think the following files need a download step for cv_jp.bin added:
- Dockerfile
- .github/workflows/build.yml
run.py
@@ -1287,6 +1333,7 @@ def custom_openapi():
runtime_dirs=args.runtime_dir,
cpu_num_threads=cpu_num_threads,
enable_mock=args.enable_mock,
enable_guided=args.enable_guided,
This argument isn't used, so I think it's unnecessary.
enable_guided=args.enable_guided,
It's actually used in
https://github.com/Patchethium/voicevox_engine/blob/5b78b4d1c61ed9bfb671cb7154a69815e7c1bf00/run.py#L342-L346
Since snfa is still unstable, users may want to disable it.
Sorry for the confusion.
By "argument" I didn't mean args.enable_guided, but the enable_guided parameter of the make_synthesis_engines function in voicevox_engine/synthesis_engine/make_synthesis_engines.py.
There is no reference to enable_guided inside that function, so it should be fine to remove it.
I see, fixed, thanks!
@@ -15,6 +15,7 @@ def make_synthesis_engines(
runtime_dirs: Optional[List[Path]] = None,
cpu_num_threads: Optional[int] = None,
enable_mock: bool = True,
enable_guided: bool = False,
enable_guided: bool = False,
enable_guided: bool, optional, default=False
    入力音声を解析してAudio Queryで返す機能が有効かどうか
Co-authored-by: Nanashi. <[email protected]>
Thank you for the PR!!
I think it might be best to decide after listening to the actual synthesis results!
(@y-chan, what do you think? 👀 )
I can run it here, but due to time constraints, it might take a while.
I've prepared sample input voices. The filenames are the content of the lines. If you could create a sample with this, it would be very helpful 🙇
hiho_input_data.zip
We can decide later where to place the file and how to set the parameters!
@@ -12,6 +12,8 @@ datas = [
('presets.yaml', '.'),
('default_setting.yml', '.'),
('ui_template', 'ui_template'),
('model', 'model'),
This seems unnecessary?
I think this code is unnecessary.
parser.add_argument(
    "--guide_model",
    type=Path,
    default="cv_jp.bin",
We can decide later where to place the file and how to set the parameters!
@Hiroshiba I made a quick script to generate audio samples from the reference audio you provided:

import requests
import json
import os
guide_url = "http://localhost:50021/guide"
query_url = "http://localhost:50021/audio_query"
synthesis_url = "http://localhost:50021/synthesis"
cur_dir = os.getcwd()
result_dir = os.path.join(cur_dir, "result")
if not os.path.exists(result_dir):
    os.mkdir(result_dir)
speaker = 1
filenames = os.listdir(".")
for filename in filenames:
    if filename[-4:] == ".wav":
        print(f"Processing: {filename}")
        full_guide_url = f"{guide_url}?speaker={speaker}&normalize=true&ref_path={os.path.join(cur_dir, filename)}"
        full_query_url = f"{query_url}?speaker={speaker}&text={filename[:-4]}"
        query = requests.request("POST", full_query_url).text
        headers = {"Content-Type": "application/json", "charset": "utf-8"}
        guided_query = requests.request("POST", full_guide_url, headers=headers, data=query.encode("utf-8")).text
        full_synthesis_url = f"{synthesis_url}?speaker={speaker}"
        audio = requests.request("POST", full_synthesis_url, headers=headers, data=guided_query.encode("utf-8")).content
        with open(os.path.join(result_dir, filename), "wb") as f:
            f.write(audio)

The result is attached as hiho_result.zip. I'd say the quality is not very satisfying, as the alignment search collapses several times, especially with the singing voice. However, it's
Thank you for sharing the sample. I was able to intuitively grasp its accuracy!
I agree that there seems to be an issue with the singing.
Separately, I noticed that the control of intonation seems slightly off.
I've commented on possible causes for the observed trend where the pitch tends to go low.
Additionally, there also seems to be a pattern where the voice becomes noticeably quieter, as in the audio clip of "のるしか無いっすよね このビッグウェーブにね," but I couldn't think of a reason for this.
@takana-v I modified the Dockerfile and it should be fine; however, I'm not familiar with the GitHub workflow, i.e. the build.yml file.

@Hiroshiba I don't know the suitable way to test the guide API. I mean, while testing, the synthesis engine only produces some blank audio, which is not suitable for forced alignment. Should I add an additional audio source, like a clip from the Common Voice dataset (public domain), into the repository and align on that for testing?
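One possible shape for such a test, sketched here under some assumptions: the engine is running locally, a small public-domain clip is checked into the repository (the path and transcript below are placeholders, not files in this PR), and the /guide endpoint behaves as in the script above.

import os

import requests

BASE = "http://localhost:50021"
# Placeholder transcript and reference clip; in practice this would be a short
# public-domain recording whose transcript is known.
TEXT = "テストです"
REF_WAV = os.path.abspath("test/data/reference.wav")


def test_guide_returns_audio_query():
    # Build a plain query first, then let /guide align it against the reference audio.
    query = requests.post(
        f"{BASE}/audio_query", params={"speaker": 1, "text": TEXT}
    ).text
    guided = requests.post(
        f"{BASE}/guide",
        params={"speaker": 1, "normalize": "true", "ref_path": REF_WAV},
        headers={"Content-Type": "application/json"},
        data=query.encode("utf-8"),
    )
    assert guided.status_code == 200
    # The guided result should still be an AudioQuery with accent phrases.
    assert "accent_phrases" in guided.json()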
I think it's fine if you add the download step before the "Build run.py with PyInstaller" step.
It would be even better to cache the downloaded file, but for now it's fine without caching.
- name: Download model for guided synthesis
  run: curl -L https://github.com/Patchethium/snfa/releases/download/v0.0.1/cv_jp.bin -o ./cv_jp.bin
enable_guided: bool, optional, default=False
    入力音声を解析して音声合成クエリで返す機能が有効かどうか
This might be something you forgot to delete.
@Patchethium Implementing a test sounds great!!
use -N to avoid overwrite
That should be all. I'll finish the GUI in no time based on the old code.
I think it's shaping up nicely!
I have a suggestion to improve the maintainability of VOICEVOX.
# convert to int, to avoid overflow in np.int32 type
wav = resample(wav, int(aligner.sr) * int(wav.shape[0]) // src_sr)
This might be a better approach?
# convert to int, to avoid overflow in np.int32 type
wav = resample(wav, int(aligner.sr * wav.shape[0]) // src_sr)
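For reference, a tiny standalone snippet (not from the PR; the values are made up) showing why the explicit int() casts matter when the operands are NumPy 32-bit integers:

import numpy as np

# Made-up values: a target sample rate and a long clip's sample count.
sr = np.int32(24_000)
n_samples = np.int32(90_000_000)

# NumPy keeps the int32 dtype here, so the product wraps around on overflow
# (and may emit a RuntimeWarning) instead of giving the true value.
wrapped = sr * n_samples

# Casting to Python ints first gives arbitrary-precision arithmetic.
exact = int(sr) * int(n_samples)

print(wrapped, exact)  # the two differ once the product exceeds 2**31 - 1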
# Download snfa forced aligner model
- name: Download model for guided synthesis
  run: curl -N -L https://github.com/Patchethium/snfa/releases/download/v0.0.1/cv_jp.bin -o ./cv_jp.bin
|
I have a suggestion that could improve the usability of the library! What do you think about bundling this binary file within the snfa library or adding a feature to automatically download the model file if it's missing?
Bundling is fairly common; for example, the soundfile library includes a DLL. Auto-downloading is also a common feature; for example, pyopenjtalk automatically downloads a missing dictionary
https://github.com/r9y9/pyopenjtalk/blob/22852ba6e36faaf2589b458e731c701e24f9dc9d/pyopenjtalk/__init__.py#L77-L79.
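As a rough illustration of the auto-download idea (this is not snfa's actual API; the function name and the cache location are assumptions for illustration), something like:

# Hypothetical sketch of lazy model download inside snfa, in the spirit of
# pyopenjtalk's dictionary handling. ensure_model() and DEFAULT_MODEL_PATH
# are illustrative names, not part of the real library.
import os
import urllib.request

MODEL_URL = "https://github.com/Patchethium/snfa/releases/download/v0.0.1/cv_jp.bin"
DEFAULT_MODEL_PATH = os.path.join(os.path.dirname(os.path.abspath(__file__)), "cv_jp.bin")


def ensure_model(path: str = DEFAULT_MODEL_PATH) -> str:
    """Download the aligner model on first use and return its local path."""
    if not os.path.exists(path):
        print(f"Downloading aligner model from {MODEL_URL} ...")
        urllib.request.urlretrieve(MODEL_URL, path)
    return path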
@@ -12,6 +12,8 @@ datas = [
('presets.yaml', '.'),
('default_setting.yml', '.'),
('ui_template', 'ui_template'),
('model', 'model'),
I think this code is unnecessary.
extract(
    wav, sr, query=self.audio_query, model_path="cv_jp.bin"
)  # as long as it doesn't throw exceptions it's okay
It might be a good idea to write code to verify whether the output of extract matches the expected results. What do you think? 👀
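For instance, a check along these lines might work (a sketch only, continuing the quoted test code above; it assumes extract() returns an AudioQuery whose moras carry the aligned timings, which is an assumption rather than a documented contract):

# Continuation of the test above; wav, sr and self.audio_query come from the
# surrounding test code. The assertions assume extract() fills mora timings
# into an AudioQuery (an assumption for illustration).
query = extract(wav, sr, query=self.audio_query, model_path="cv_jp.bin")

moras = [m for phrase in query.accent_phrases for m in phrase.moras]
# Every aligned vowel should get a positive duration...
assert all(m.vowel_length > 0 for m in moras)
# ...and the summed durations should not exceed the reference audio's length.
total = sum((m.consonant_length or 0.0) + m.vowel_length for m in moras)
assert total <= len(wav) / sr + 1e-2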
@Hiroshiba
The current state is that I'm not satisfied with the alignment and the packaging strategy of the snfa aligner. So if you think it doesn't look good to be hanging on the PR tab, it's okay to close it, and I may reopen it when I eventually release a better version of snfa.
@Patchethium
I understand the situation 👍️
No problem, you can keep this PR on the PR tab 👍️
|
Well, it was ready for review since I was expecting others to improve it after it was released as an experimental feature, but things didn't go as expected.
The continuation of #252: use a custom forced aligner snfa instead of julius for the sake of my nerves. snfa is very young and unstable, so I'll keep this API in the experiment sub dir and lock it behind --enable-guided as in #252.