Commit: tweaks

bejager committed May 2, 2024
1 parent f6de5ef commit ed8b9f7
Showing 4 changed files with 18 additions and 13 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/python-demo.yml
@@ -32,7 +32,7 @@ jobs:
install_dep: sudo apt install libportaudio2
- os: windows-latest
- os: macos-latest
- install_dep: brew install portaudio
+ install_dep: brew update && brew install portaudio --HEAD

steps:
- uses: actions/checkout@v3
25 changes: 15 additions & 10 deletions binding/python/README.md
@@ -35,7 +35,7 @@ Signup or Login to [Picovoice Console](https://console.picovoice.ai/) to get you

Orca supports two modes of operation: streaming and single synthesis.
In the streaming synthesis mode, Orca processes an incoming text stream in real-time and generates audio in parallel.
- In the single synthesis mode, the complete text needs to be known in advance and is synthesized in a single call to the Orca engine.
+ In the single synthesis mode, a complete text is synthesized in a single call to the Orca engine.

Create an instance of the Orca engine:

@@ -55,24 +55,27 @@ stream = orca.open_stream()
for text_chunk in text_generator():
    pcm = stream.synthesize(text_chunk)
    if pcm is not None:
        # handle pcm

pcm = stream.flush()
if pcm is not None:
    # handle pcm
```

The `text_generator()` function can be any stream generating text, for example an LLM response.
- Orca produces audio chunks in parallel to the LLM, and returns the raw PCM whenever enough context has been added via `stream.synthesize()`.
- The `stream.synthesize()` function returns an audio chunk that only includes the audio for a portion of the text that has been added.
+ Orca produces audio chunks in parallel to the incoming text stream, and returns the raw PCM whenever enough context has
+ been added via `stream.synthesize()`.
+ To ensure smooth transitions between chunks, the `stream.synthesize()` function returns an audio chunk that only
+ includes the audio for a portion of the text that has been added.
To generate the audio for the remaining text, `stream.flush()` needs to be invoked.
When done with streaming text synthesis, the `Orca.Stream` object needs to be closed:

```python
stream.close()
```
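For orientation, here is one way the streaming pieces above could fit together end to end. This is an illustrative sketch rather than part of the changed README: `text_generator()` is a hypothetical stand-in that yields a fixed sentence word by word instead of an LLM response, the engine is assumed to be created with `pvorca.create(access_key=...)` as elsewhere in the README, the returned PCM chunks are assumed to be sequences of 16-bit samples (as the demo's `play_audio_callback(pcm: Sequence[int])` suggests), and the output rate is assumed to be available as `orca.sample_rate`.

```python
import struct
import wave

import pvorca  # assumed: the PyPI package this README documents


def text_generator():
    # Hypothetical stand-in for a real LLM stream: yields one word at a time.
    for word in "Streaming synthesis produces audio while the text is still arriving.".split():
        yield word + " "


orca = pvorca.create(access_key='${ACCESS_KEY}')  # assumed placeholder, as used in the README
stream = orca.open_stream()

pcm_chunks = []
for text_chunk in text_generator():
    pcm = stream.synthesize(text_chunk)
    if pcm is not None:
        pcm_chunks.append(pcm)  # in a real application, queue this for playback

pcm = stream.flush()
if pcm is not None:
    pcm_chunks.append(pcm)

stream.close()

# Write the collected chunks to a single-channel 16-bit WAV file.
samples = [s for chunk in pcm_chunks for s in chunk]
with wave.open('streamed.wav', 'wb') as wav_file:
    wav_file.setnchannels(1)
    wav_file.setsampwidth(2)
    wav_file.setframerate(orca.sample_rate)  # assumed attribute for the output sampling rate
    wav_file.writeframes(struct.pack(f"{len(samples)}h", *samples))

orca.delete()
```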

- If the complete text is known before synthesis, single synthesis mode can be used to generate speech in a single call to Orca:
+ If the complete text is known before synthesis, single synthesis mode can be used to generate speech in a single call to
+ Orca:

```python
# Return raw PCM
@@ -84,8 +84,9 @@ alignments = orca.synthesize_to_file(text='${TEXT}', path='${OUTPUT_PATH}')

Replace `${TEXT}` with the text to be synthesized and `${OUTPUT_PATH}` with the path to save the generated audio as a
single-channel 16-bit PCM WAV file.
- In single synthesis mode, Orca returns metadata of the synthesized audio in the form of a list of `Orca.WordAlignment` objects.
- To print the metadata run:
+ In single synthesis mode, Orca returns metadata of the synthesized audio in the form of a list of `Orca.WordAlignment`
+ objects.
+ You can print the metadata with:

```python
for word in alignments:
    for phoneme in word.phonemes:  # assumes each `Orca.WordAlignment` exposes a `phonemes` list
        print(f"\tphoneme=\"{phoneme.phoneme}\", start_sec={phoneme.start_sec:.2f}, end_sec={phoneme.end_sec:.2f}")
```
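As an illustrative follow-on (not part of this commit), the same metadata can be reduced to plain word-level timestamps, for example to drive captions during playback. This sketch assumes each `Orca.WordAlignment` exposes `word`, `start_sec`, and `end_sec`, mirroring the phoneme fields printed above:

```python
def word_timestamps(alignments):
    # Collapse Orca's alignment metadata into (word, start_sec, end_sec) tuples.
    return [(w.word, w.start_sec, w.end_sec) for w in alignments]


timestamps = word_timestamps(alignments)
if timestamps:
    total_sec = timestamps[-1][2]  # end time of the last word
    print(f"synthesized {len(timestamps)} words, about {total_sec:.2f} seconds of audio")
```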

- When done make sure to explicitly release the resources with:
+ When done make sure to explicitly release the resources using:

```python
orca.delete()
@@ -131,7 +135,8 @@ and replace `${MODEL_PATH}` with the path to the model file with the desired voice

### Speech control

- Orca allows for keyword arguments to be provided to the `open_stream` method or the single `synthesize` methods to control the synthesized speech:
+ Orca allows for keyword arguments to be provided to the `open_stream` method or the single `synthesize` methods to
+ control the synthesized speech:

- `speech_rate`: Controls the speed of the generated speech. Valid values are within [0.7, 1.3]. A higher (lower) value
produces speech that is faster (slower). The default is `1.0`.
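For instance, a minimal sketch of passing `speech_rate`, assuming the `orca` instance and `${TEXT}` placeholder from the examples above and that `synthesize` returns the raw PCM together with the word alignments:

```python
# Slightly faster speech in single synthesis mode
pcm, alignments = orca.synthesize(text='${TEXT}', speech_rate=1.2)

# The same keyword argument when opening a stream
stream = orca.open_stream(speech_rate=0.9)
```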
2 changes: 1 addition & 1 deletion binding/python/setup.py
@@ -66,6 +66,6 @@
"Programming Language :: Python :: 3",
"Topic :: Multimedia :: Sound/Audio :: Speech",
],
- python_requires='>=3.7',
+ python_requires='>=3.8',
keywords="Text-to-Speech, TTS, Speech Synthesis, Voice Generation, Speech Engine",
)
2 changes: 1 addition & 1 deletion demo/python/orca_demo_streaming.py
@@ -145,7 +145,7 @@ def play_audio_callback(pcm: Sequence[int]):
"--tokens-per-second",
type=int,
default=15,
help="Number of tokens to be streamed per second to Orca, simulating an LLM response.")
help="Number of tokens per second to be streamed to Orca, simulating an LLM response.")
parser.add_argument(
"--audio-wait-chunks",
type=int,
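For context on the reworded `--tokens-per-second` help text above, here is a hypothetical sketch (not the demo's actual implementation) of how a simulated LLM stream might be paced at a fixed token rate:

```python
import time


def simulate_llm_stream(text, tokens_per_second=15):
    # Yield one whitespace-delimited token at a time, sleeping between tokens
    # so the stream arrives at roughly `tokens_per_second`.
    for token in text.split():
        yield token + " "
        time.sleep(1.0 / tokens_per_second)
```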
