Guided #723

Closed
wants to merge 32 commits into from
b874227
install snfa
Patchethium Aug 13, 2023
385c9c1
basic guided feature
Patchethium Aug 14, 2023
5bf9bf2
remove kana to phoneme, fix lint
Patchethium Aug 15, 2023
fc8a7c6
scale duration instead of pitch
Patchethium Aug 15, 2023
6a1eee8
move forced aligner model path to args
Patchethium Aug 15, 2023
4fc1484
guard long vowel
Patchethium Aug 15, 2023
5d31351
run lint
Patchethium Aug 15, 2023
ab3614f
update poetry deps and requirements
Patchethium Aug 15, 2023
021a679
add cv_jp.bin into bundle
Patchethium Aug 15, 2023
cb49bf4
fix poetry deps
Patchethium Aug 15, 2023
4432980
the last one doesn't work, fix poetry deps again
Patchethium Aug 15, 2023
29ac4a9
fix mock for guide method
Patchethium Aug 15, 2023
5b78b4d
run format
Patchethium Aug 15, 2023
c86d0d7
use with in reading reference audio
Patchethium Aug 20, 2023
b5c2d86
improve help text for arg `enable_guided`
Patchethium Aug 20, 2023
3d3a7c0
improve comment for function `guide`
Patchethium Aug 20, 2023
5f37b16
Merge branch 'master' into guided
Patchethium Aug 20, 2023
4ec5ef5
remove arg enable_guided in make_synthesis_engines
Patchethium Aug 21, 2023
432ac57
improve pitch normalization
Patchethium Aug 26, 2023
eb9c638
Merge branch 'master' into guided
Patchethium Aug 26, 2023
8d452c4
use to_flatten_moras in normalizing guided pitch
Patchethium Aug 26, 2023
3bdd8d3
use `speaker_id` instead of `speaker` in function guide
Patchethium Aug 26, 2023
a69a06a
run format
Patchethium Aug 26, 2023
b2b6a07
use curl instead of wget
Patchethium Aug 26, 2023
30db29c
download snfa's model in Dockerfile
Patchethium Aug 26, 2023
2e5be32
move stereo2mono to extractor
Patchethium Sep 2, 2023
78f7162
remove redundant comment
Patchethium Sep 2, 2023
ac0a314
download snfa model in ci build workflow
Patchethium Sep 2, 2023
7f15c54
add a simple test for guided
Patchethium Sep 2, 2023
d034b44
run format and typos
Patchethium Sep 2, 2023
6c4fe0a
add downloading into test and build-docker;
Patchethium Sep 2, 2023
b19783d
prevent overflow in numpy.int32 type
Patchethium Sep 2, 2023
4 changes: 4 additions & 0 deletions .github/workflows/build-docker.yml
@@ -96,6 +96,10 @@ jobs:
username: ${{ vars.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}

# Download snfa forced aligner model
- name: Download model for guided synthesis
run: curl -N -L https://github.com/Patchethium/snfa/releases/download/v0.0.1/cv_jp.bin -o ./cv_jp.bin

Comment on lines +99 to +102
Member

I have a suggestion that could improve the library's usability. What do you think about bundling this binary file inside the snfa library, or adding a feature that automatically downloads the model file when it is missing?
Bundling is fairly common; for example, I believe the soundfile library ships with a DLL. Auto-downloading is also common; for example, pyopenjtalk automatically downloads its dictionary when it is missing:
https://github.com/r9y9/pyopenjtalk/blob/22852ba6e36faaf2589b458e731c701e24f9dc9d/pyopenjtalk/__init__.py#L77-L79
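
The auto-download idea in the comment above could look roughly like the following. This is only a sketch: `ensure_model` is a hypothetical helper, not part of snfa or this PR, and the URL is simply the one used in the workflow step. Note that in curl, `-N` means `--no-buffer` and does not skip a file that already exists, so the existence check below is what actually avoids a repeated download.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path

# URL copied from the workflow step above; the helper itself is hypothetical.
MODEL_URL = "https://github.com/Patchethium/snfa/releases/download/v0.0.1/cv_jp.bin"


def ensure_model(path: Path, url: str = MODEL_URL) -> bool:
    """Download the model to `path` if it is missing.

    Returns True when a download happened, False when the file already existed.
    """
    if path.is_file():
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    # Download into a temporary file in the same directory first, so an
    # interrupted download never leaves a truncated model at `path`.
    with urllib.request.urlopen(url) as response:
        with tempfile.NamedTemporaryFile(dir=path.parent, delete=False) as tmp:
            shutil.copyfileobj(response, tmp)
            tmp_path = Path(tmp.name)
    tmp_path.replace(path)
    return True
```

Calling `ensure_model(Path("cv_jp.bin"))` at startup would then make the separate download step optional rather than required.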

# Download VOICEVOX RESOURCE
- name: Prepare VOICEVOX RESOURCE cache
uses: actions/cache@v3
4 changes: 4 additions & 0 deletions .github/workflows/build.yml
@@ -383,6 +383,10 @@ jobs:
key: ${{ steps.onnxruntime-cache-restore.outputs.cache-primary-key }}
path: download/onnxruntime

# Download snfa forced aligner model
- name: Download model for guided synthesis
run: curl -N -L https://github.com/Patchethium/snfa/releases/download/v0.0.1/cv_jp.bin -o ./cv_jp.bin

# Download VOICEVOX RESOURCE
- name: Prepare VOICEVOX RESOURCE cache
uses: actions/cache@v3
4 changes: 4 additions & 0 deletions .github/workflows/test.yml
@@ -22,6 +22,10 @@ jobs:
steps:
- uses: actions/checkout@v3

# Download snfa forced aligner model
- name: Download model for guided synthesis
run: curl -N -L https://github.com/Patchethium/snfa/releases/download/v0.0.1/cv_jp.bin -o ./cv_jp.bin

- name: Set up Python ${{ matrix.python }}
uses: actions/setup-python@v4
with:
4 changes: 4 additions & 0 deletions .gitignore
@@ -155,3 +155,7 @@ cython_debug/
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
.idea/

# snfa forced aligner model file
# for `/guide` API
cv_jp.bin
5 changes: 5 additions & 0 deletions Dockerfile
@@ -272,6 +272,11 @@ RUN <<EOF
fi
EOF

# Download snfa's forced aligner model
RUN <<EOF
curl -L https://github.com/Patchethium/snfa/releases/download/v0.0.1/cv_jp.bin -o ./cv_jp.bin
EOF

# Download Resource
ARG VOICEVOX_RESOURCE_VERSION=0.14.3
RUN <<EOF
6 changes: 6 additions & 0 deletions README.md
@@ -359,6 +359,8 @@ Issue 側で取り組み始めたことを伝えるか、最初に Draft プル
```bash
# 開発に必要なライブラリのインストール
python -m pip install -r requirements-dev.txt -r requirements-test.txt
# `guide`API用のモデルをダウンロードする
curl -N -L https://github.com/Patchethium/snfa/releases/download/v0.0.1/cv_jp.bin -o ./cv_jp.bin

# とりあえず実行したいだけなら代わりにこちら
python -m pip install -r requirements.txt
@@ -492,6 +494,10 @@ python -m pip install -r requirements-dev.txt
OUTPUT_LICENSE_JSON_PATH=licenses.json \
bash build_util/create_venv_and_generate_licenses.bash

# `guide`を有効化するモデル、重複ダウンロードしないように`-N`をつけます
curl -N -L https://github.com/Patchethium/snfa/releases/download/v0.0.1/cv_jp.bin -o ./cv_jp.bin

# ビルド自体はLIBCORE_PATH及びLIBONNXRUNTIME_PATHの指定がなくても可能です
# モックでビルドする場合
pyinstaller --noconfirm run.spec

23 changes: 19 additions & 4 deletions poetry.lock

Some generated files are not rendered by default.

3 changes: 3 additions & 0 deletions pyproject.toml
@@ -54,6 +54,7 @@ requests = "^2.28.1"
jinja2 = "^3.1.2"
pyopenjtalk = {git = "https://github.com/VOICEVOX/pyopenjtalk", rev = "acd4f02d2af3129382c151590238b9370465e360"}
semver = "^3.0.0"
snfa = "^0.0.1"
platformdirs = "^3.10.0"

[tool.poetry.group.dev.dependencies]
@@ -63,6 +64,7 @@ pre-commit = "^2.16.0"
atomicwrites = "^1.4.0"
colorama = "^0.4.4"
poetry = "^1.3.1"
snfa = "^0.0.1"

[tool.poetry.group.test.dependencies]
pysen = "~0.10.5"
@@ -74,6 +76,7 @@ mypy = "~0.991"
pytest = "^6.2.5"
coveralls = "^3.2.0"
poetry = "^1.3.1"
snfa = "^0.0.1"

[tool.poetry.group.license.dependencies]
pip-licenses = "^4.2.0"
1 change: 1 addition & 0 deletions requirements-dev.txt
@@ -66,6 +66,7 @@ semver==3.0.1 ; python_version >= "3.11" and python_version < "3.12"
setuptools==68.1.2 ; python_version >= "3.11" and python_version < "3.12"
shellingham==1.5.3 ; python_version >= "3.11" and python_version < "3.12"
six==1.16.0 ; python_version >= "3.11" and python_version < "3.12"
snfa==0.0.1 ; python_version >= "3.11" and python_version < "3.12"
sniffio==1.3.0 ; python_version >= "3.11" and python_version < "3.12"
soundfile==0.10.3.post1 ; python_version >= "3.11" and python_version < "3.12"
starlette==0.16.0 ; python_version >= "3.11" and python_version < "3.12"
1 change: 1 addition & 0 deletions requirements-license.txt
@@ -26,6 +26,7 @@ requests==2.31.0 ; python_version >= "3.11" and python_version < "3.12"
scipy==1.11.2 ; python_version >= "3.11" and python_version < "3.12"
semver==3.0.1 ; python_version >= "3.11" and python_version < "3.12"
six==1.16.0 ; python_version >= "3.11" and python_version < "3.12"
snfa==0.0.1 ; python_version >= "3.11" and python_version < "3.12"
sniffio==1.3.0 ; python_version >= "3.11" and python_version < "3.12"
soundfile==0.10.3.post1 ; python_version >= "3.11" and python_version < "3.12"
starlette==0.16.0 ; python_version >= "3.11" and python_version < "3.12"
1 change: 1 addition & 0 deletions requirements-test.txt
@@ -79,6 +79,7 @@ semver==3.0.1 ; python_version >= "3.11" and python_version < "3.12"
shellingham==1.5.3 ; python_version >= "3.11" and python_version < "3.12"
six==1.16.0 ; python_version >= "3.11" and python_version < "3.12"
smmap==5.0.0 ; python_version >= "3.11" and python_version < "3.12"
snfa==0.0.1 ; python_version >= "3.11" and python_version < "3.12"
sniffio==1.3.0 ; python_version >= "3.11" and python_version < "3.12"
soundfile==0.10.3.post1 ; python_version >= "3.11" and python_version < "3.12"
starlette==0.16.0 ; python_version >= "3.11" and python_version < "3.12"
1 change: 1 addition & 0 deletions requirements.txt
@@ -24,6 +24,7 @@ requests==2.31.0 ; python_version >= "3.11" and python_version < "3.12"
scipy==1.11.2 ; python_version >= "3.11" and python_version < "3.12"
semver==3.0.1 ; python_version >= "3.11" and python_version < "3.12"
six==1.16.0 ; python_version >= "3.11" and python_version < "3.12"
snfa==0.0.1 ; python_version >= "3.11" and python_version < "3.12"
sniffio==1.3.0 ; python_version >= "3.11" and python_version < "3.12"
soundfile==0.10.3.post1 ; python_version >= "3.11" and python_version < "3.12"
starlette==0.16.0 ; python_version >= "3.11" and python_version < "3.12"
51 changes: 47 additions & 4 deletions run.py
@@ -326,6 +326,44 @@ def accent_phrases(
else:
return engine.create_accent_phrases(text, speaker_id=speaker)

@app.post(
"/guide",
response_model=AudioQuery,
tags=["クエリ編集"],
summary="Create Accent Phrase from External Audio",
)
def guide(
query: AudioQuery,
speaker: int,
ref_path: str,
normalize: bool,
core_version: Optional[str] = None,
):
if not args.enable_guided:
raise HTTPException(
status_code=404,
detail="実験的機能はデフォルトで無効になっています。使用するには引数を指定してください。",
)
try:
with open(ref_path, "rb") as file:
# use dtype=float32 also normalizes the wav into [-1.0,1.0]
wav, sr = soundfile.read(file, dtype="float32")
except Exception:
raise HTTPException(
status_code=422,
detail="Invalid wav file",
)

engine = get_engine(core_version)
return engine.guide(
query=query,
speaker_id=speaker,
ref_wav=wav,
sr=sr,
normalize=normalize,
model_path=args.guide_model,
)

@app.post(
"/mora_data",
response_model=List[AccentPhrase],
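
For readers trying the new endpoint, a client call might be built as in the sketch below. It assumes standard FastAPI parameter handling, where the scalar arguments (`speaker`, `ref_path`, `normalize`) travel as query parameters and the `AudioQuery` is the JSON body. `build_guide_request` is a hypothetical helper, and the base URL is an assumption about where the engine is listening. Because the server opens `ref_path` itself, the path must be valid on the machine running the engine, not on the client.

```python
import json
import urllib.parse
import urllib.request


def build_guide_request(
    base_url: str,
    audio_query: dict,
    speaker: int,
    ref_path: str,
    normalize: bool,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the `/guide` endpoint."""
    # Scalar arguments go into the query string; booleans are sent as "true"/"false".
    params = urllib.parse.urlencode(
        {
            "speaker": speaker,
            "ref_path": ref_path,
            "normalize": str(normalize).lower(),
        }
    )
    # The AudioQuery (for example, the result of an earlier query request) is the JSON body.
    return urllib.request.Request(
        url=f"{base_url}/guide?{params}",
        data=json.dumps(audio_query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The request can then be sent with `urllib.request.urlopen(request)`; the engine must have been started with `--enable_guided`, otherwise the endpoint answers with status 404.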
@@ -475,18 +513,14 @@ def multi_synthesis(
sampling_rate = queries[0].outputSamplingRate

with NamedTemporaryFile(delete=False) as f:

with zipfile.ZipFile(f, mode="a") as zip_file:

for i in range(len(queries)):

if queries[i].outputSamplingRate != sampling_rate:
raise HTTPException(
status_code=422, detail="サンプリングレートが異なるクエリがあります"
)

with TemporaryFile() as wav_file:

wave = engine.synthesis(query=queries[i], speaker_id=speaker)
soundfile.write(
file=wav_file,
@@ -1221,6 +1255,15 @@ def custom_openapi():
action="store_true",
help="指定すると音声合成を途中でキャンセルできるようになります。",
)
parser.add_argument(
"--enable_guided", action="store_true", help="入力音声を解析して音声合成クエリで返す機能を有効化します。"
)
parser.add_argument(
"--guide_model",
type=Path,
default="cv_jp.bin",
Member

@Hiroshiba Aug 22, 2023

We can decide later on the file name and location, and on how to set the arguments!

help="guided機能に入力音声の発音の長さを解析するため必要なモデルファイルです。",
)
parser.add_argument(
"--init_processes",
type=int,
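
The two new flags can be exercised in isolation with the sketch below. The flag names, the `Path` type, and the default come from the diff; `build_parser` and `check_guided_args` are hypothetical names, and the early file check is an assumption about a possible design, not something this PR does. One detail worth knowing: argparse applies `type` to string defaults, so `default="cv_jp.bin"` arrives as a `Path`.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Same flags as in the diff; help texts are omitted here for brevity.
    parser.add_argument("--enable_guided", action="store_true")
    # argparse runs `type` on string defaults, so this becomes Path("cv_jp.bin").
    parser.add_argument("--guide_model", type=Path, default="cv_jp.bin")
    return parser


def check_guided_args(args: argparse.Namespace) -> None:
    """Hypothetical startup check: fail early if guided is on but the model is missing."""
    if args.enable_guided and not args.guide_model.is_file():
        raise FileNotFoundError(f"guide model not found: {args.guide_model}")
```

Checking at startup would surface a missing model file immediately instead of on the first `/guide` request.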
2 changes: 2 additions & 0 deletions run.spec
@@ -12,6 +12,8 @@ datas = [
('presets.yaml', '.'),
('default_setting.yml', '.'),
('ui_template', 'ui_template'),
('model', 'model'),
Member

I think this code is unnecessary.

('cv_jp.bin', '.')
]
datas += collect_data_files('pyopenjtalk')

Binary file added test/ref_audio.wav
Binary file not shown.