diff --git a/README.md b/README.md index 4107b5b2..77fa076d 100644 --- a/README.md +++ b/README.md @@ -28,29 +28,29 @@ Orca may undergo changes as we continually enhance and refine the engine to prov ## Table of Contents - [Orca](#orca) - - [Table of Contents](#table-of-contents) - - [Overview](#overview) - - [Orca streaming text synthesis](#orca-streaming-text-synthesis) - - [Text input](#text-input) - - [Custom pronunciations](#custom-pronunciations) - - [Voices](#voices) - - [Speech control](#speech-control) - - [Audio output](#audio-output) - - [AccessKey](#accesskey) - - [Demos](#demos) - - [Python Demos](#python-demos) - - [iOS Demo](#ios-demo) - - [C Demos](#c-demos) - - [Web Demos](#web-demos) - - [Android Demo](#android-demo) - - [SDKs](#sdks) - - [Python](#python) - - [iOS](#ios) - - [C](#c) - - [Web](#web) - - [Android](#android) - - [Releases](#releases) - - [FAQ](#faq) + - [Table of Contents](#table-of-contents) + - [Overview](#overview) + - [Orca streaming text synthesis](#orca-streaming-text-synthesis) + - [Text input](#text-input) + - [Custom pronunciations](#custom-pronunciations) + - [Voices](#voices) + - [Speech control](#speech-control) + - [Audio output](#audio-output) + - [AccessKey](#accesskey) + - [Demos](#demos) + - [Python Demos](#python-demos) + - [iOS Demo](#ios-demo) + - [C Demos](#c-demos) + - [Web Demos](#web-demos) + - [Android Demo](#android-demo) + - [SDKs](#sdks) + - [Python](#python) + - [iOS](#ios) + - [C](#c) + - [Web](#web) + - [Android](#android) + - [Releases](#releases) + - [FAQ](#faq) ## Language Support @@ -60,13 +60,20 @@ Orca may undergo changes as we continually enhance and refine the engine to prov ## Overview -### Orca streaming text synthesis +### Orca input and output streaming synthesis Orca is a text-to-speech engine designed specifically for LLMs. It can process incoming text streams in real-time, generating audio continuously, i.e., as the LLM produces tokens, Orca generates speech in parallel. This enables seamless conversations with voice assistants, eliminating any audio delays. +![](https://github.com/Picovoice/orca/blob/orca-prepare-v0.2/resources/assets/orca_streaming_animation.gif) + +As demonstrated above, Orca starts converting text to audio right away, while other TTS systems need to wait for +the entire LLM output to be available, introducing a delay in the voice assistant's response. + +Orca also supports single synthesis mode, where a complete text is synthesized in a single call to the Orca engine. 
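+
+For illustration, here is a minimal sketch of streaming synthesis with the C API (error checks on the returned
+`pv_status_t` are omitted, and it assumes an `orca` object and `synthesize_params` created as shown in the
+[C](#c) section; refer to `include/pv_orca.h` for the exact signatures):
+
+```c
+// Create a stream and feed text chunks (e.g., LLM tokens) as they arrive.
+pv_orca_stream_t *stream = NULL;
+pv_orca_stream_open(orca, synthesize_params, &stream);
+
+int32_t num_samples = 0;
+int16_t *pcm = NULL;
+
+// `num_samples` may be 0 until Orca has buffered enough context to produce audio.
+pv_orca_stream_synthesize(stream, "text chunk produced by the LLM", &num_samples, &pcm);
+if (num_samples > 0) {
+    // play or buffer `pcm`, then release it
+    pv_orca_pcm_delete(pcm);
+}
+
+// Once the text stream is complete, synthesize any remaining buffered text.
+pv_orca_stream_flush(stream, &num_samples, &pcm);
+if (num_samples > 0) {
+    pv_orca_pcm_delete(pcm);
+}
+
+pv_orca_stream_close(stream);
+```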
+ ### Text input Orca accepts the 26 lowercase (a-z) and 26 uppercase (A-Z) letters of the English alphabet, numbers, @@ -315,7 +322,7 @@ status = pv_orca_synthesize_params_init(&synthesize_params); #### Streaming synthesis -To synthesize a text stream, create an `orca_stream` object using the `synthesize_params`: +To synthesize a text stream, create an `orca_stream` object using `synthesize_params`: ```c pv_orca_stream_t *orca_stream = NULL; @@ -345,7 +352,7 @@ if (num_samples_chunk > 0) { } ``` -Once the text stream is complete, call the flush method to synthesize the remaining text: +Once the text stream is complete, call the flush method to synthesize the remaining text: ```c status = pv_orca_stream_flush(orca_stream, &num_samples_chunk, &pcm_chunk); @@ -364,7 +371,7 @@ pv_orca_pcm_delete(pcm_chunk); ``` Finally, when done make sure to close the stream: - + ```c pv_orca_stream_close(orca_stream); ``` diff --git a/binding/python/README.md b/binding/python/README.md index 6e7d82de..fb74499c 100644 --- a/binding/python/README.md +++ b/binding/python/README.md @@ -92,9 +92,9 @@ objects. You can print the metadata with: ```python -for word in alignments: - print(f"word=\"{word.word}\", start_sec={word.start_sec:.2f}, end_sec={word.end_sec:.2f}") - for phoneme in word.phonemes: +for token in alignments: + print(f"word=\"{token.word}\", start_sec={token.start_sec:.2f}, end_sec={token.end_sec:.2f}") + for phoneme in token.phonemes: print(f"\tphoneme=\"{phoneme.phoneme}\", start_sec={phoneme.start_sec:.2f}, end_sec={phoneme.end_sec:.2f}") ``` @@ -135,8 +135,8 @@ and replace `${MODEL_PATH}` with the path to the model file with the desired voi ### Speech control -Orca allows for keyword arguments to be provided to the `open_stream` method or the single `synthesize` methods to -control the synthesized speech: +Orca allows for keyword arguments to control the synthesized speech. They can be provided to the `open_stream` +method or the single synthesis methods `synthesize` and `synthesize_to_file`: - `speech_rate`: Controls the speed of the generated speech. Valid values are within [0.7, 1.3]. A higher (lower) value produces speech that is faster (slower). The default is `1.0`. diff --git a/binding/python/_orca.py b/binding/python/_orca.py index 3971c6f3..9130d9e8 100644 --- a/binding/python/_orca.py +++ b/binding/python/_orca.py @@ -162,7 +162,7 @@ def synthesize(self, text: str) -> Optional[Sequence[int]]: Custom pronunciations can be embedded in the text via the syntax `{word|pronunciation}`. They need to be added in a single call to this function. The pronunciation is expressed in ARPAbet format, e.g.: `I {liv|L IH V} in {Sevilla|S EH V IY Y AH}`. - :return: The generated audio as a sequence of 16-bit linearly-encoded integers, `NULL` if no + :return: The generated audio as a sequence of 16-bit linearly-encoded integers, `None` if no audio chunk has been produced. """ @@ -194,7 +194,7 @@ def flush(self) -> Optional[Sequence[int]]: via `pv_orca_stream_synthesize()`. The caller is responsible for deleting the generated audio with `pv_orca_pcm_delete()`. - :return: The generated audio as a sequence of 16-bit linearly-encoded integers, `NULL` if no + :return: The generated audio as a sequence of 16-bit linearly-encoded integers, `None` if no audio chunk has been produced. 
""" @@ -292,6 +292,14 @@ def __init__(self, access_key: str, model_path: str, library_path: str) -> None: self._max_character_limit_func.argtypes = [POINTER(self.COrca), POINTER(c_int32)] self._max_character_limit_func.restype = PicovoiceStatuses + c_max_character_limit = c_int32() + status = self._max_character_limit_func(self._handle, byref(c_max_character_limit)) + if status is not PicovoiceStatuses.SUCCESS: + raise _PICOVOICE_STATUS_TO_EXCEPTION[status]( + message="Unable to get Orca maximum character limit", + message_stack=self._get_error_stack()) + self._max_character_limit = c_max_character_limit.value + self._synthesize_params_init_func = library.pv_orca_synthesize_params_init self._synthesize_params_init_func.argtypes = [POINTER(POINTER(self.COrcaSynthesizeParams))] self._synthesize_params_init_func.restype = PicovoiceStatuses @@ -420,15 +428,7 @@ def sample_rate(self) -> int: def max_character_limit(self) -> int: """Maximum number of characters allowed in a single synthesis request.""" - c_max_character_limit = c_int32() - - status = self._max_character_limit_func(self._handle, byref(c_max_character_limit)) - if status is not PicovoiceStatuses.SUCCESS: - raise _PICOVOICE_STATUS_TO_EXCEPTION[status]( - message="Unable to get Orca maximum character limit", - message_stack=self._get_error_stack()) - - return c_max_character_limit.value + return self._max_character_limit def synthesize( self, diff --git a/demo/voice_assistant/README.md b/demo/voice_assistant/README.md index 5de09a9d..b4855e85 100644 --- a/demo/voice_assistant/README.md +++ b/demo/voice_assistant/README.md @@ -5,17 +5,6 @@ Made in Vancouver, Canada by [Picovoice](https://picovoice.ai) This demo showcases how [Orca Streaming Text-to-Speech](https://picovoice.ai/platform/orca/) can be seamlessly integrated into LLM-applications to drastically reduce the audio latency of voice assistants. -## Towards Zero-Latency Voice Assistants - -Orca can handle streaming text input, i.e., it can start -synthesizing audio while an LLM is still producing the response. - -![](https://github.com/Picovoice/orca/blob/main/resources/assets/orca_streaming_animation.gif) - -As demonstrated above, Orca starts converting text to audio right away, while -[OpenAI TTS](https://platform.openai.com/docs/guides/text-to-speech) needs to wait for the entire -LLM output to be available, introducing a delay in the voice assistant's response. - ## Technologies In this demo, the user can interact with a voice assistant in real-time by leveraging GenAI technologies. @@ -26,10 +15,10 @@ The following technologies are used: - Speech to Text: Picovoice's [Cheetah Streaming Speech-to-Text](https://picovoice.ai/platform/cheetah/) - LLM: \"ChatGPT\" using `gpt-3.5-turbo` - with [OpenAI Chat Completion API](https://platform.openai.com/docs/guides/text-generation) + with OpenAI Chat Completion API. - TTS: - Picovoice's [Orca Streaming Text-to-Speech](https://picovoice.ai/platform/orca/) - - [OpenAI TTS](https://platform.openai.com/docs/guides/text-to-speech) + - OpenAI TTS ## Compatibility @@ -41,7 +30,7 @@ To run all features of this demo, access keys are required for: - Picovoice Console: Get your `AccessKey` for free by signing up or logging in to [Picovoice Console](https://console.picovoice.ai/). -- OpenAI API: Get your `AccessKey` by signing up or logging in to [OpenAI](https://platform.openai.com/). +- OpenAI API: Get your `AccessKey` from OpenAI. 
## Usage diff --git a/include/pv_orca.h b/include/pv_orca.h index 9841d2e4..402bf827 100644 --- a/include/pv_orca.h +++ b/include/pv_orca.h @@ -101,7 +101,7 @@ PV_API pv_status_t pv_orca_max_character_limit(const pv_orca_t *object, int32_t * Forward declaration for pv_orca_synthesize_params object. This object can be parsed to Orca synthesize functions to * control the synthesized audio. An instance can be created with `pv_orca_synthesize_params_init()` and deleted with * `pv_orca_synthesize_params_delete()`. The object's properties can be set with `pv_orca_synthesize_params_set_*()` - * and returned with `pv_orca_synthesize_params_get_()*`. + * and returned with `pv_orca_synthesize_params_get_*()`. */ typedef struct pv_orca_synthesize_params pv_orca_synthesize_params_t;
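
For reference, a minimal sketch of the `pv_orca_synthesize_params` lifecycle described in the comment above. The
speech-rate accessors are illustrative stand-ins for the `set_*`/`get_*` functions the comment names and should be
checked against `include/pv_orca.h`; error checks on the returned `pv_status_t` are omitted.

```c
// Create the synthesize params object, adjust a property, read it back, and clean up.
pv_orca_synthesize_params_t *synthesize_params = NULL;
pv_orca_synthesize_params_init(&synthesize_params);

// Assumed accessor names following the `set_*` / `get_*` pattern described above.
pv_orca_synthesize_params_set_speech_rate(synthesize_params, 1.2f);

float speech_rate = 0.f;
pv_orca_synthesize_params_get_speech_rate(synthesize_params, &speech_rate);

// ... pass `synthesize_params` to the Orca synthesis functions ...

pv_orca_synthesize_params_delete(synthesize_params);
```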