review
bejager committed May 3, 2024
1 parent d1e71c6 commit 2724681
Showing 5 changed files with 54 additions and 58 deletions.
61 changes: 34 additions & 27 deletions README.md
@@ -28,29 +28,29 @@ Orca may undergo changes as we continually enhance and refine the engine to prov
## Table of Contents

- [Orca](#orca)
- [Table of Contents](#table-of-contents)
- [Overview](#overview)
- [Orca streaming text synthesis](#orca-streaming-text-synthesis)
- [Text input](#text-input)
- [Custom pronunciations](#custom-pronunciations)
- [Voices](#voices)
- [Speech control](#speech-control)
- [Audio output](#audio-output)
- [AccessKey](#accesskey)
- [Demos](#demos)
- [Python Demos](#python-demos)
- [iOS Demo](#ios-demo)
- [C Demos](#c-demos)
- [Web Demos](#web-demos)
- [Android Demo](#android-demo)
- [SDKs](#sdks)
- [Python](#python)
- [iOS](#ios)
- [C](#c)
- [Web](#web)
- [Android](#android)
- [Releases](#releases)
- [FAQ](#faq)
- [Table of Contents](#table-of-contents)
- [Overview](#overview)
- [Orca streaming text synthesis](#orca-streaming-text-synthesis)
- [Text input](#text-input)
- [Custom pronunciations](#custom-pronunciations)
- [Voices](#voices)
- [Speech control](#speech-control)
- [Audio output](#audio-output)
- [AccessKey](#accesskey)
- [Demos](#demos)
- [Python Demos](#python-demos)
- [iOS Demo](#ios-demo)
- [C Demos](#c-demos)
- [Web Demos](#web-demos)
- [Android Demo](#android-demo)
- [SDKs](#sdks)
- [Python](#python)
- [iOS](#ios)
- [C](#c)
- [Web](#web)
- [Android](#android)
- [Releases](#releases)
- [FAQ](#faq)

## Language Support

@@ -60,13 +60,20 @@ Orca may undergo changes as we continually enhance and refine the engine to prov

## Overview

### Orca streaming text synthesis
### Orca input and output streaming synthesis

Orca is a text-to-speech engine designed specifically for LLMs. It can process
incoming text streams in real-time, generating audio continuously, i.e., as the LLM produces tokens,
Orca generates speech in parallel.
This enables seamless conversations with voice assistants, eliminating any audio delays.

![](https://github.com/Picovoice/orca/blob/orca-prepare-v0.2/resources/assets/orca_streaming_animation.gif)

As demonstrated above, Orca starts converting text to audio right away, while other TTS systems need to wait for
the entire LLM output to be available, introducing a delay in the voice assistant's response.

Orca also supports single synthesis mode, where a complete text is synthesized in a single call to the Orca engine.

### Text input

Orca accepts the 26 lowercase (a-z) and 26 uppercase (A-Z) letters of the English alphabet, numbers,
@@ -315,7 +322,7 @@ status = pv_orca_synthesize_params_init(&synthesize_params);

#### Streaming synthesis

To synthesize a text stream, create an `orca_stream` object using the `synthesize_params`:
To synthesize a text stream, create an `orca_stream` object using `synthesize_params`:

```c
pv_orca_stream_t *orca_stream = NULL;
@@ -345,7 +352,7 @@ if (num_samples_chunk > 0) {
}
```
Once the text stream is complete, call the flush method to synthesize the remaining text:
Once the text stream is complete, call the flush method to synthesize the remaining text:
```c
status = pv_orca_stream_flush(orca_stream, &num_samples_chunk, &pcm_chunk);
@@ -364,7 +371,7 @@ pv_orca_pcm_delete(pcm_chunk);
```
Finally, when done make sure to close the stream:
```c
pv_orca_stream_close(orca_stream);
```
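The C streaming steps documented in this file (open a stream, feed text, flush, close) follow a life cycle that can be sketched in Python, the language of the other binding touched by this commit. `FakeStream` and `stream_tokens` below are hypothetical: the stub only imitates the contract stated in the Python binding's docstrings, namely that a synthesize or flush call returns `None` when no audio chunk has been produced. Its word-boundary buffering rule and its placeholder "samples" are invented for illustration and are not Orca's actual behavior.

```python
from typing import Iterable, List, Optional, Sequence


class FakeStream:
    """Hypothetical stub mimicking the stream contract: calls may return None."""

    def __init__(self) -> None:
        self._pending = ""
        self.closed = False

    def synthesize(self, text: str) -> Optional[Sequence[int]]:
        self._pending += text
        # Stub rule (not Orca's): emit audio only once a space marks a complete word
        if " " not in self._pending:
            return None
        ready, self._pending = self._pending.rsplit(" ", 1)
        return [len(ready)] if ready else None  # placeholder "samples"

    def flush(self) -> Optional[Sequence[int]]:
        # Return whatever text is still buffered as one final chunk
        ready, self._pending = self._pending, ""
        return [len(ready)] if ready else None

    def close(self) -> None:
        self.closed = True


def stream_tokens(stream: FakeStream, tokens: Iterable[str]) -> List[int]:
    """Feed tokens to a stream, collect every produced chunk, then flush and close."""
    pcm: List[int] = []
    try:
        for token in tokens:
            chunk = stream.synthesize(token)
            if chunk is not None:  # None means no audio chunk was produced yet
                pcm.extend(chunk)
        chunk = stream.flush()  # synthesize the remaining buffered text
        if chunk is not None:
            pcm.extend(chunk)
    finally:
        stream.close()  # always release the stream, even if synthesis raises
    return pcm


if __name__ == "__main__":
    print(stream_tokens(FakeStream(), ["Hello", " wor", "ld", "!"]))
```

Closing the stream in a `finally` block mirrors the README's instruction to close the stream when done, while also covering the case where a synthesis call raises.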
10 changes: 5 additions & 5 deletions binding/python/README.md
@@ -92,9 +92,9 @@ objects.
You can print the metadata with:

```python
for word in alignments:
    print(f"word=\"{word.word}\", start_sec={word.start_sec:.2f}, end_sec={word.end_sec:.2f}")
    for phoneme in word.phonemes:
for token in alignments:
    print(f"word=\"{token.word}\", start_sec={token.start_sec:.2f}, end_sec={token.end_sec:.2f}")
    for phoneme in token.phonemes:
        print(f"\tphoneme=\"{phoneme.phoneme}\", start_sec={phoneme.start_sec:.2f}, end_sec={phoneme.end_sec:.2f}")
```
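Renaming the loop variable from `word` to `token` removes the confusing `word.word` access in the old snippet. To check the printed format without an Orca `AccessKey`, the sketch below runs the same loop over hypothetical stand-in dataclasses. `Phoneme`, `WordAlignment`, and `format_alignments` are illustrative names, not part of the Orca SDK; only the attribute names `word`, `start_sec`, `end_sec`, `phonemes`, and `phoneme` are taken from the snippet above.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Phoneme:
    # Hypothetical stand-in for a phoneme-level alignment entry
    phoneme: str
    start_sec: float
    end_sec: float


@dataclass
class WordAlignment:
    # Hypothetical stand-in for a word-level alignment entry
    word: str
    start_sec: float
    end_sec: float
    phonemes: List[Phoneme] = field(default_factory=list)


def format_alignments(alignments: Sequence[WordAlignment]) -> List[str]:
    """Build the lines the README loop prints, one string per line."""
    lines: List[str] = []
    for token in alignments:
        lines.append(f"word=\"{token.word}\", start_sec={token.start_sec:.2f}, end_sec={token.end_sec:.2f}")
        for phoneme in token.phonemes:
            lines.append(
                f"\tphoneme=\"{phoneme.phoneme}\", start_sec={phoneme.start_sec:.2f}, end_sec={phoneme.end_sec:.2f}")
    return lines


if __name__ == "__main__":
    demo = [WordAlignment("hi", 0.0, 0.25, [Phoneme("HH", 0.0, 0.1), Phoneme("AY", 0.1, 0.25)])]
    print("\n".join(format_alignments(demo)))
```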

@@ -135,8 +135,8 @@ and replace `${MODEL_PATH}` with the path to the model file with the desired voi

### Speech control

Orca allows for keyword arguments to be provided to the `open_stream` method or the single `synthesize` methods to
control the synthesized speech:
Orca allows for keyword arguments to control the synthesized speech. They can be provided to the `open_stream`
method or the single synthesis methods `synthesize` and `synthesize_to_file`:

- `speech_rate`: Controls the speed of the generated speech. Valid values are within [0.7, 1.3]. A higher (lower) value
produces speech that is faster (slower). The default is `1.0`.
22 changes: 11 additions & 11 deletions binding/python/_orca.py
@@ -162,7 +162,7 @@ def synthesize(self, text: str) -> Optional[Sequence[int]]:
Custom pronunciations can be embedded in the text via the syntax `{word|pronunciation}`.
They need to be added in a single call to this function.
The pronunciation is expressed in ARPAbet format, e.g.: `I {liv|L IH V} in {Sevilla|S EH V IY Y AH}`.
:return: The generated audio as a sequence of 16-bit linearly-encoded integers, `NULL` if no
:return: The generated audio as a sequence of 16-bit linearly-encoded integers, `None` if no
audio chunk has been produced.
"""

@@ -194,7 +194,7 @@ def flush(self) -> Optional[Sequence[int]]:
via `pv_orca_stream_synthesize()`.
The caller is responsible for deleting the generated audio with `pv_orca_pcm_delete()`.
:return: The generated audio as a sequence of 16-bit linearly-encoded integers, `NULL` if no
:return: The generated audio as a sequence of 16-bit linearly-encoded integers, `None` if no
audio chunk has been produced.
"""

@@ -292,6 +292,14 @@ def __init__(self, access_key: str, model_path: str, library_path: str) -> None:
        self._max_character_limit_func.argtypes = [POINTER(self.COrca), POINTER(c_int32)]
        self._max_character_limit_func.restype = PicovoiceStatuses

        c_max_character_limit = c_int32()
        status = self._max_character_limit_func(self._handle, byref(c_max_character_limit))
        if status is not PicovoiceStatuses.SUCCESS:
            raise _PICOVOICE_STATUS_TO_EXCEPTION[status](
                message="Unable to get Orca maximum character limit",
                message_stack=self._get_error_stack())
        self._max_character_limit = c_max_character_limit.value

        self._synthesize_params_init_func = library.pv_orca_synthesize_params_init
        self._synthesize_params_init_func.argtypes = [POINTER(POINTER(self.COrcaSynthesizeParams))]
        self._synthesize_params_init_func.restype = PicovoiceStatuses
@@ -420,15 +428,7 @@ def sample_rate(self) -> int:
    def max_character_limit(self) -> int:
        """Maximum number of characters allowed in a single synthesis request."""

        c_max_character_limit = c_int32()

        status = self._max_character_limit_func(self._handle, byref(c_max_character_limit))
        if status is not PicovoiceStatuses.SUCCESS:
            raise _PICOVOICE_STATUS_TO_EXCEPTION[status](
                message="Unable to get Orca maximum character limit",
                message_stack=self._get_error_stack())

        return c_max_character_limit.value
        return self._max_character_limit

    def synthesize(
            self,
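The `_orca.py` change moves the maximum-character-limit query from the `max_character_limit` property into `__init__`, so the native function runs once and the property returns the cached value. A minimal sketch of that pattern follows. `NativeStub` and `EngineWrapper` are hypothetical names standing in for the ctypes-bound library and the Python `Orca` class, the `(status, value)` return shape is an assumption that simplifies the `byref` out-parameter, and a plain `RuntimeError` replaces the Picovoice status-to-exception mapping.

```python
from typing import Tuple


class NativeStub:
    """Hypothetical stand-in for the ctypes-bound native library."""

    def __init__(self, limit: int) -> None:
        self._limit = limit
        self.calls = 0  # counts how often the native query runs

    def max_character_limit(self) -> Tuple[int, int]:
        # Mimics a C call returning (status, value); status 0 means success here
        self.calls += 1
        return 0, self._limit


class EngineWrapper:
    """Hypothetical wrapper mirroring the structure of the updated class."""

    def __init__(self, native: NativeStub) -> None:
        status, value = native.max_character_limit()
        if status != 0:
            raise RuntimeError("Unable to get maximum character limit")
        # Query once at construction and cache the result
        self._max_character_limit = value

    @property
    def max_character_limit(self) -> int:
        """Maximum number of characters allowed in a single synthesis request."""
        return self._max_character_limit


if __name__ == "__main__":
    native = NativeStub(limit=2000)  # arbitrary illustrative value
    engine = EngineWrapper(native)
    print(engine.max_character_limit, engine.max_character_limit, native.calls)
```

Caching is only safe if the limit does not change over the engine's lifetime, which the commit appears to assume. A side effect is that a failure to query the limit now surfaces when the object is constructed rather than on first property access.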
17 changes: 3 additions & 14 deletions demo/voice_assistant/README.md
@@ -5,17 +5,6 @@ Made in Vancouver, Canada by [Picovoice](https://picovoice.ai)
This demo showcases how [Orca Streaming Text-to-Speech](https://picovoice.ai/platform/orca/) can be seamlessly integrated into LLM-applications to drastically reduce the audio latency
of voice assistants.

## Towards Zero-Latency Voice Assistants

Orca can handle streaming text input, i.e., it can start
synthesizing audio while an LLM is still producing the response.

![](https://github.com/Picovoice/orca/blob/main/resources/assets/orca_streaming_animation.gif)

As demonstrated above, Orca starts converting text to audio right away, while
[OpenAI TTS](https://platform.openai.com/docs/guides/text-to-speech) needs to wait for the entire
LLM output to be available, introducing a delay in the voice assistant's response.

## Technologies

In this demo, the user can interact with a voice assistant in real-time by leveraging GenAI technologies.
@@ -26,10 +15,10 @@ The following technologies are used:

- Speech to Text: Picovoice's [Cheetah Streaming Speech-to-Text](https://picovoice.ai/platform/cheetah/)
- LLM: \"ChatGPT\" using `gpt-3.5-turbo`
with [OpenAI Chat Completion API](https://platform.openai.com/docs/guides/text-generation)
with OpenAI Chat Completion API.
- TTS:
- Picovoice's [Orca Streaming Text-to-Speech](https://picovoice.ai/platform/orca/)
- [OpenAI TTS](https://platform.openai.com/docs/guides/text-to-speech)
- OpenAI TTS

## Compatibility

@@ -41,7 +30,7 @@ To run all features of this demo, access keys are required for:

- Picovoice Console: Get your `AccessKey` for free by signing up or logging in
to [Picovoice Console](https://console.picovoice.ai/).
- OpenAI API: Get your `AccessKey` by signing up or logging in to [OpenAI](https://platform.openai.com/).
- OpenAI API: Get your `AccessKey` from OpenAI.

## Usage

2 changes: 1 addition & 1 deletion include/pv_orca.h
@@ -101,7 +101,7 @@ PV_API pv_status_t pv_orca_max_character_limit(const pv_orca_t *object, int32_t
* Forward declaration for pv_orca_synthesize_params object. This object can be parsed to Orca synthesize functions to
* control the synthesized audio. An instance can be created with `pv_orca_synthesize_params_init()` and deleted with
* `pv_orca_synthesize_params_delete()`. The object's properties can be set with `pv_orca_synthesize_params_set_*()`
* and returned with `pv_orca_synthesize_params_get_()*`.
* and returned with `pv_orca_synthesize_params_get_*()`.
*/
typedef struct pv_orca_synthesize_params pv_orca_synthesize_params_t;
