review

Picovoice · May 3, 2024 · 2724681
1 parent d1e71c6
commit 2724681
Showing 5 changed files with 54 additions and 58 deletions.
diff --git a/README.md b/README.md
@@ -28,29 +28,29 @@ Orca may undergo changes as we continually enhance and refine the engine to prov
 ## Table of Contents
 
 - [Orca](#orca)
-  - [Table of Contents](#table-of-contents)
-  - [Overview](#overview)
-    - [Orca streaming text synthesis](#orca-streaming-text-synthesis)
-    - [Text input](#text-input)
-    - [Custom pronunciations](#custom-pronunciations)
-    - [Voices](#voices)
-    - [Speech control](#speech-control)
-    - [Audio output](#audio-output)
-  - [AccessKey](#accesskey)
-  - [Demos](#demos)
-    - [Python Demos](#python-demos)
-    - [iOS Demo](#ios-demo)
-    - [C Demos](#c-demos)
-    - [Web Demos](#web-demos)
-    - [Android Demo](#android-demo)
-  - [SDKs](#sdks)
-    - [Python](#python)
-    - [iOS](#ios)
-    - [C](#c)
-    - [Web](#web)
-    - [Android](#android)
-  - [Releases](#releases)
-  - [FAQ](#faq)
+    - [Table of Contents](#table-of-contents)
+    - [Overview](#overview)
+        - [Orca streaming text synthesis](#orca-streaming-text-synthesis)
+        - [Text input](#text-input)
+        - [Custom pronunciations](#custom-pronunciations)
+        - [Voices](#voices)
+        - [Speech control](#speech-control)
+        - [Audio output](#audio-output)
+    - [AccessKey](#accesskey)
+    - [Demos](#demos)
+        - [Python Demos](#python-demos)
+        - [iOS Demo](#ios-demo)
+        - [C Demos](#c-demos)
+        - [Web Demos](#web-demos)
+        - [Android Demo](#android-demo)
+    - [SDKs](#sdks)
+        - [Python](#python)
+        - [iOS](#ios)
+        - [C](#c)
+        - [Web](#web)
+        - [Android](#android)
+    - [Releases](#releases)
+    - [FAQ](#faq)
 
 ## Language Support
 
@@ -60,13 +60,20 @@ Orca may undergo changes as we continually enhance and refine the engine to prov
 
 ## Overview
 
-### Orca streaming text synthesis
+### Orca input and output streaming synthesis
 
 Orca is a text-to-speech engine designed specifically for LLMs. It can process
 incoming text streams in real-time, generating audio continuously, i.e., as the LLM produces tokens,
 Orca generates speech in parallel.
 This enables seamless conversations with voice assistants, eliminating any audio delays.
 
+![](https://github.com/Picovoice/orca/blob/orca-prepare-v0.2/resources/assets/orca_streaming_animation.gif)
+
+As demonstrated above, Orca starts converting text to audio right away, while other TTS systems need to wait for
+the entire LLM output to be available, introducing a delay in the voice assistant's response.
+
+Orca also supports single synthesis mode, where a complete text is synthesized in a single call to the Orca engine.
+
 ### Text input
 
 Orca accepts the 26 lowercase (a-z) and 26 uppercase (A-Z) letters of the English alphabet, numbers,
@@ -315,7 +322,7 @@ status = pv_orca_synthesize_params_init(&synthesize_params);
 
 #### Streaming synthesis
 
-To synthesize a text stream, create an `orca_stream` object using the `synthesize_params`:
+To synthesize a text stream, create an `orca_stream` object using `synthesize_params`:
 
 ```c
 pv_orca_stream_t *orca_stream = NULL;
@@ -345,7 +352,7 @@ if (num_samples_chunk > 0) {
 }
 ```
 
-Once the text stream is complete, call the flush method to synthesize the remaining text: 
+Once the text stream is complete, call the flush method to synthesize the remaining text:
 
 ```c
 status = pv_orca_stream_flush(orca_stream, &num_samples_chunk, &pcm_chunk);
@@ -364,7 +371,7 @@ pv_orca_pcm_delete(pcm_chunk);
 ```
 
 Finally, when done make sure to close the stream:
-    
+
 ```c
 pv_orca_stream_close(orca_stream);
 ```
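
The call order described in this README hunk (feed text as it arrives, flush once the text stream is complete, then close the stream) can be sketched in Python. `ToyStream` below is a hypothetical stand-in, not the Orca binding: it emits one fake sample per buffered character, and its `close()` method is assumed to mirror `pv_orca_stream_close()`. Only the return convention of `synthesize` and `flush` (`None` when no audio chunk has been produced) is taken from the docstrings touched by this commit.

```python
from typing import List, Optional, Sequence


class ToyStream:
    """Hypothetical stand-in for an Orca stream; it does NOT produce real audio."""

    def __init__(self) -> None:
        self._buffer = ""
        self._closed = False

    def synthesize(self, text: str) -> Optional[Sequence[int]]:
        # Buffer incoming text; emit fake audio only once a full word
        # (terminated by a space) is available.
        self._buffer += text
        if " " not in self._buffer:
            return None
        ready, _, self._buffer = self._buffer.rpartition(" ")
        return [0] * len(ready)  # one fake sample per character

    def flush(self) -> Optional[Sequence[int]]:
        # Emit whatever text is still buffered, or None if nothing is left.
        ready, self._buffer = self._buffer, ""
        return [0] * len(ready) if ready else None

    def close(self) -> None:
        # Assumed counterpart of pv_orca_stream_close().
        self._closed = True


def speak(tokens: Sequence[str], stream: ToyStream) -> List[int]:
    """Feed tokens one by one, then flush and always close the stream."""
    pcm: List[int] = []
    try:
        for token in tokens:
            chunk = stream.synthesize(token)
            if chunk is not None:  # None means no audio chunk was produced yet
                pcm.extend(chunk)
        tail = stream.flush()
        if tail is not None:
            pcm.extend(tail)
    finally:
        stream.close()
    return pcm
```

Wrapping the loop in `try`/`finally` is a design choice of this sketch: it keeps the close call on the same path whether or not synthesis raises.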

diff --git a/binding/python/README.md b/binding/python/README.md
@@ -92,9 +92,9 @@ objects.
 You can print the metadata with:
 
 ```python
-for word in alignments:
-    print(f"word=\"{word.word}\", start_sec={word.start_sec:.2f}, end_sec={word.end_sec:.2f}")
-    for phoneme in word.phonemes:
+for token in alignments:
+    print(f"word=\"{token.word}\", start_sec={token.start_sec:.2f}, end_sec={token.end_sec:.2f}")
+    for phoneme in token.phonemes:
         print(f"\tphoneme=\"{phoneme.phoneme}\", start_sec={phoneme.start_sec:.2f}, end_sec={phoneme.end_sec:.2f}")
 ```
 
@@ -135,8 +135,8 @@ and replace `${MODEL_PATH}` with the path to the model file with the desired voi
 
 ### Speech control
 
-Orca allows for keyword arguments to be provided to the `open_stream` method or the single `synthesize` methods to
-control the synthesized speech:
+Orca allows for keyword arguments to control the synthesized speech. They can be provided to the `open_stream` 
+method or the single synthesis methods `synthesize` and `synthesize_to_file`:
 
 - `speech_rate`: Controls the speed of the generated speech. Valid values are within [0.7, 1.3]. A higher (lower) value
   produces speech that is faster (slower). The default is `1.0`.
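
The renamed loop in this hunk can be exercised without the engine. The sketch below uses stand-in dataclasses that carry only the fields the loop reads (`word`, `phonemes`, `phoneme`, `start_sec`, `end_sec`). The class names `Word` and `Phoneme`, the helper `format_alignments`, and the sample timings are made up for illustration and are not the binding's actual types.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Phoneme:
    """Hypothetical stand-in for one phoneme alignment entry."""
    phoneme: str
    start_sec: float
    end_sec: float


@dataclass
class Word:
    """Hypothetical stand-in for one word alignment entry."""
    word: str
    start_sec: float
    end_sec: float
    phonemes: List[Phoneme] = field(default_factory=list)


def format_alignments(alignments: List[Word]) -> List[str]:
    """Build the same lines the README loop prints, one string per line."""
    lines: List[str] = []
    for token in alignments:
        lines.append(
            f"word=\"{token.word}\", "
            f"start_sec={token.start_sec:.2f}, end_sec={token.end_sec:.2f}")
        for phoneme in token.phonemes:
            lines.append(
                f"\tphoneme=\"{phoneme.phoneme}\", "
                f"start_sec={phoneme.start_sec:.2f}, end_sec={phoneme.end_sec:.2f}")
    return lines


# Made-up sample data, only to show the output shape.
sample = [Word("hi", 0.0, 0.25, [Phoneme("HH", 0.0, 0.1), Phoneme("AY", 0.1, 0.25)])]

if __name__ == "__main__":
    print("\n".join(format_alignments(sample)))
```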

diff --git a/binding/python/_orca.py b/binding/python/_orca.py
@@ -162,7 +162,7 @@ def synthesize(self, text: str) -> Optional[Sequence[int]]:
             Custom pronunciations can be embedded in the text via the syntax `{word|pronunciation}`.
             They need to be added in a single call to this function.
             The pronunciation is expressed in ARPAbet format, e.g.: `I {liv|L IH V} in {Sevilla|S EH V IY Y AH}`.
-            :return: The generated audio as a sequence of 16-bit linearly-encoded integers, `NULL` if no
+            :return: The generated audio as a sequence of 16-bit linearly-encoded integers, `None` if no
             audio chunk has been produced.
             """
 
@@ -194,7 +194,7 @@ def flush(self) -> Optional[Sequence[int]]:
             via `pv_orca_stream_synthesize()`.
             The caller is responsible for deleting the generated audio with `pv_orca_pcm_delete()`.
 
-            :return: The generated audio as a sequence of 16-bit linearly-encoded integers, `NULL` if no
+            :return: The generated audio as a sequence of 16-bit linearly-encoded integers, `None` if no
             audio chunk has been produced.
             """
 
@@ -292,6 +292,14 @@ def __init__(self, access_key: str, model_path: str, library_path: str) -> None:
         self._max_character_limit_func.argtypes = [POINTER(self.COrca), POINTER(c_int32)]
         self._max_character_limit_func.restype = PicovoiceStatuses
 
+        c_max_character_limit = c_int32()
+        status = self._max_character_limit_func(self._handle, byref(c_max_character_limit))
+        if status is not PicovoiceStatuses.SUCCESS:
+            raise _PICOVOICE_STATUS_TO_EXCEPTION[status](
+                message="Unable to get Orca maximum character limit",
+                message_stack=self._get_error_stack())
+        self._max_character_limit = c_max_character_limit.value
+
         self._synthesize_params_init_func = library.pv_orca_synthesize_params_init
         self._synthesize_params_init_func.argtypes = [POINTER(POINTER(self.COrcaSynthesizeParams))]
         self._synthesize_params_init_func.restype = PicovoiceStatuses
@@ -420,15 +428,7 @@ def sample_rate(self) -> int:
     def max_character_limit(self) -> int:
         """Maximum number of characters allowed in a single synthesis request."""
 
-        c_max_character_limit = c_int32()
-
-        status = self._max_character_limit_func(self._handle, byref(c_max_character_limit))
-        if status is not PicovoiceStatuses.SUCCESS:
-            raise _PICOVOICE_STATUS_TO_EXCEPTION[status](
-                message="Unable to get Orca maximum character limit",
-                message_stack=self._get_error_stack())
-
-        return c_max_character_limit.value
+        return self._max_character_limit
 
     def synthesize(
             self,
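
The change to `max_character_limit` moves the native query from the property into `__init__`, so the value is fetched once and later reads return the stored result. A minimal sketch of that pattern, assuming a hypothetical `Engine` class, a `query_limit` callable standing in for the native function, and a made-up `SUCCESS` status code (none of these are part of the Orca binding):

```python
from typing import Callable, Tuple

SUCCESS = 0  # hypothetical status code, for illustration only


class Engine:
    """Hypothetical class that queries a limit once and caches it."""

    def __init__(self, query_limit: Callable[[], Tuple[int, int]]) -> None:
        # query_limit stands in for the native call; it returns (status, value).
        status, value = query_limit()
        if status != SUCCESS:
            # Fail at construction time instead of on a later property read.
            raise RuntimeError("Unable to get maximum character limit")
        self._max_character_limit = value

    @property
    def max_character_limit(self) -> int:
        """Return the value cached in __init__; no further native calls."""
        return self._max_character_limit
```

One consequence of this layout is that a failing query surfaces when the object is constructed rather than on the first property access.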

diff --git a/demo/voice_assistant/README.md b/demo/voice_assistant/README.md
@@ -5,17 +5,6 @@ Made in Vancouver, Canada by [Picovoice](https://picovoice.ai)
 This demo showcases how [Orca Streaming Text-to-Speech](https://picovoice.ai/platform/orca/) can be seamlessly integrated into LLM-applications to drastically reduce the audio latency
 of voice assistants.
 
-## Towards Zero-Latency Voice Assistants
-
-Orca can handle streaming text input, i.e., it can start
-synthesizing audio while an LLM is still producing the response.
-
-![](https://github.com/Picovoice/orca/blob/main/resources/assets/orca_streaming_animation.gif)
-
-As demonstrated above, Orca starts converting text to audio right away, while
-[OpenAI TTS](https://platform.openai.com/docs/guides/text-to-speech) needs to wait for the entire
-LLM output to be available, introducing a delay in the voice assistant's response.
-
 ## Technologies
 
 In this demo, the user can interact with a voice assistant in real-time by leveraging GenAI technologies.
@@ -26,10 +15,10 @@ The following technologies are used:
 
 - Speech to Text: Picovoice's [Cheetah Streaming Speech-to-Text](https://picovoice.ai/platform/cheetah/)
 - LLM: \"ChatGPT\" using `gpt-3.5-turbo`
-  with [OpenAI Chat Completion API](https://platform.openai.com/docs/guides/text-generation)
+  with OpenAI Chat Completion API.
 - TTS:
     - Picovoice's [Orca Streaming Text-to-Speech](https://picovoice.ai/platform/orca/)
-    - [OpenAI TTS](https://platform.openai.com/docs/guides/text-to-speech)
+    - OpenAI TTS
 
 ## Compatibility
 
@@ -41,7 +30,7 @@ To run all features of this demo, access keys are required for:
 
 - Picovoice Console: Get your `AccessKey` for free by signing up or logging in
   to [Picovoice Console](https://console.picovoice.ai/).
-- OpenAI API: Get your `AccessKey` by signing up or logging in to [OpenAI](https://platform.openai.com/).
+- OpenAI API: Get your `AccessKey` from OpenAI.
 
 ## Usage
 

diff --git a/include/pv_orca.h b/include/pv_orca.h
@@ -101,7 +101,7 @@ PV_API pv_status_t pv_orca_max_character_limit(const pv_orca_t *object, int32_t
  * Forward declaration for pv_orca_synthesize_params object. This object can be parsed to Orca synthesize functions to
  * control the synthesized audio. An instance can be created with `pv_orca_synthesize_params_init()` and deleted with
  * `pv_orca_synthesize_params_delete()`. The object's properties can be set with `pv_orca_synthesize_params_set_*()`
- * and returned with `pv_orca_synthesize_params_get_()*`.
+ * and returned with `pv_orca_synthesize_params_get_*()`.
  */
 typedef struct pv_orca_synthesize_params pv_orca_synthesize_params_t;