update readmes and workflows

Picovoice · May 3, 2024 · d1e71c6
1 parent 66305ed
commit d1e71c6
Showing 8 changed files with 76 additions and 19 deletions.
diff --git a/.github/workflows/c-demos.yml b/.github/workflows/c-demos.yml
@@ -66,10 +66,12 @@ jobs:
         run: cmake -G "${{ matrix.make_file }}" -B ./build
 
       - name: Build demo
-        run: cmake --build ./build --target orca_demo
+        run: |
+          cmake --build ./build --target orca_demo
+          cmake --build ./build --target orca_demo_streaming
 
       - name: Test
-        run: python test/test_orca_c.py ${{secrets.PV_VALID_ACCESS_KEY}} ${{ matrix.platform }} ${{ matrix.arch }}
+        run: python3 test/test_orca_c.py ${{secrets.PV_VALID_ACCESS_KEY}} ${{ matrix.platform }} ${{ matrix.arch }}
 
   build-demo-self-hosted:
     runs-on: ${{ matrix.machine }}
@@ -106,7 +108,9 @@ jobs:
         run: cmake -B ./build
 
       - name: Build demo
-        run: cmake --build ./build --target orca_demo
+        run: |
+          cmake --build ./build --target orca_demo
+          cmake --build ./build --target orca_demo_streaming
 
       - name: Test
         run: python3 test/test_orca_c.py ${{secrets.PV_VALID_ACCESS_KEY}} ${{ matrix.platform }} ${{ matrix.arch }}
diff --git a/.github/workflows/python-demo.yml b/.github/workflows/python-demo.yml
@@ -32,6 +32,7 @@ jobs:
             install_dep: sudo apt install libportaudio2
           - os: windows-latest
           - os: macos-latest
+            install_dep: brew install portaudio --HEAD
 
     steps:
       - uses: actions/checkout@v3
@@ -42,12 +43,12 @@ jobs:
           python-version: ${{ matrix.python-version }}
 
       - name: Pre-build dependencies
-        run: python -m pip install --upgrade pip
+        run: python3 -m pip install --upgrade pip
 
       # TODO: remove after release
       - name: Build dependencies
         run: |
-          python -m pip install -U pip setuptools
+          python3 -m pip install -U pip setuptools
           pip install wheel
           cd ../../binding/python
           python3 setup.py sdist bdist_wheel
@@ -66,7 +67,7 @@ jobs:
 
       - name: Test single
         run: >
-          python orca_demo.py
+          python3 orca_demo.py
           --access_key ${{secrets.PV_VALID_ACCESS_KEY}}
           --text "Hello, I am Orca!"
           --output_path ./tmp.wav

diff --git a/.github/workflows/python-perf.yml b/.github/workflows/python-perf.yml
@@ -60,7 +60,7 @@ jobs:
           python-version: '3.10'
 
       - name: Pre-build dependencies
-        run: python -m pip install --upgrade pip
+        run: python3 -m pip install --upgrade pip
 
       - name: Install dependencies
         run: pip install -r requirements.txt

diff --git a/.github/workflows/python.yml b/.github/workflows/python.yml
@@ -49,13 +49,13 @@ jobs:
           python-version: ${{ matrix.python-version }}
 
       - name: Pre-build dependencies
-        run: python -m pip install --upgrade pip
+        run: python3 -m pip install --upgrade pip
 
       - name: Install dependencies
         run: pip install -r requirements.txt
 
       - name: Test
-        run: python test_orca.py --access-key ${{secrets.PV_VALID_ACCESS_KEY}}
+        run: python3 test_orca.py --access-key ${{secrets.PV_VALID_ACCESS_KEY}}
 
   build-self-hosted:
     runs-on: ${{ matrix.machine }}

diff --git a/demo/python/README.md b/demo/python/README.md
@@ -1,10 +1,11 @@
-# Orca Text-to-Speech Engine Demo
+# Orca Text-to-Speech Engine Python Demo
 
 Made in Vancouver, Canada by [Picovoice](https://picovoice.ai)
 
 ## Orca
 
-Orca is an on-device text-to-speech engine producing high-quality, realistic, spoken audio with zero latency. Orca is:
+Orca is an on-device text-to-speech engine designed for use with LLMs, enabling zero-latency voice assistants.
+Orca is:
 
 - Private; All voice processing runs locally.
 - Cross-Platform:
@@ -31,14 +32,16 @@ SDKs. You can get your `AccessKey` for free. Make sure to keep your `AccessKey`
 Signup or Login to [Picovoice Console](https://console.picovoice.ai/) to get your `AccessKey`.
 
 ## Usage
+
 Orca supports two modes of operation: streaming and single synthesis.
 
-In the streaming synthesis mode, Orca processes an incoming text stream in real-time and generates audio in parallel. 
+In the streaming synthesis mode, Orca processes an incoming text stream in real-time and generates audio in parallel.
 This is demonstrated in the Orca streaming demo.
 
 In the single synthesis mode, the text is synthesized in a single call to the Orca engine.
 
 ### Streaming synthesis demo
+
 In this demo, we simulate a response from a language model by creating a text stream from a user-defined text.
 We stream that text to Orca and play the synthesized audio as soon as it gets generated.
 
@@ -49,7 +52,7 @@ orca_demo_streaming --access_key ${ACCESS_KEY} --text-to-stream ${TEXT}
 ```
 
 Replace `${ACCESS_KEY}` with your `AccessKey` obtained from Picovoice Console and `${TEXT}` with your text to be
-streamed to Orca.
+streamed to Orca. Please note that this demo was not tested on macOS.
 
 ### Single synthesis demo
 
@@ -59,6 +62,6 @@ To synthesize speech in a single call to Orca and without audio playback, run th
 orca_demo --access_key ${ACCESS_KEY} --text ${TEXT} --output_path ${WAV_OUTPUT_PATH}
 ```
 
-Replace `${ACCESS_KEY}` with yours obtained from Picovoice Console, `${TEXT}` with your text to be synthesized, 
-and `${WAV_OUTPUT_PATH}` with a path to a `.wav` file where the generated audio will be stored as a single-channel, 
+Replace `${ACCESS_KEY}` with yours obtained from Picovoice Console, `${TEXT}` with your text to be synthesized,
+and `${WAV_OUTPUT_PATH}` with a path to a `.wav` file where the generated audio will be stored as a single-channel,
 16-bit PCM `.wav` file.
diff --git a/demo/voice_assistant/README.md b/demo/voice_assistant/README.md
@@ -1,7 +1,56 @@
-# Orca Voice Assistant Demo
+# Orca Voice Assistant Demo - Talk to ChatGPT in Real-Time
 
-WIP
+Made in Vancouver, Canada by [Picovoice](https://picovoice.ai)
+
+This demo showcases how [Orca Streaming Text-to-Speech](https://picovoice.ai/platform/orca/) can be seamlessly integrated into LLM applications to drastically reduce the audio latency
+of voice assistants.
+
+## Towards Zero-Latency Voice Assistants
+
+Orca can handle streaming text input, i.e., it can start
+synthesizing audio while an LLM is still producing the response.
+
+![](https://github.com/Picovoice/orca/blob/main/resources/assets/orca_streaming_animation.gif)
+
+As demonstrated above, Orca starts converting text to audio right away, while
+[OpenAI TTS](https://platform.openai.com/docs/guides/text-to-speech) needs to wait for the entire
+LLM output to be available, introducing a delay in the voice assistant's response.
+
+## Technologies
+
+In this demo, the user can interact with a voice assistant in real-time by leveraging GenAI technologies.
+It is built like the majority of voice assistants today, by chaining together a Speech-to-Text engine, an LLM, and
+a Text-to-Speech engine.
+
+The following technologies are used:
+
+- Speech to Text: Picovoice's [Cheetah Streaming Speech-to-Text](https://picovoice.ai/platform/cheetah/)
+- LLM: "ChatGPT" using `gpt-3.5-turbo`
+  with [OpenAI Chat Completion API](https://platform.openai.com/docs/guides/text-generation)
+- TTS:
+    - Picovoice's [Orca Streaming Text-to-Speech](https://picovoice.ai/platform/orca/)
+    - [OpenAI TTS](https://platform.openai.com/docs/guides/text-to-speech)
+
+## Compatibility
+
+This demo has been tested on Linux (x86_64) and macOS (x86_64) using Python 3.10.
+
+## Access Keys
+
+To run all features of this demo, access keys are required for:
+
+- Picovoice Console: Get your `AccessKey` for free by signing up or logging in
+  to [Picovoice Console](https://console.picovoice.ai/).
+- OpenAI API: Get your `AccessKey` by signing up or logging in to [OpenAI](https://platform.openai.com/).
+
+## Usage
 
 ```bash
-python orca_voice_assistant_demo.py --picovoice-access-key ${PV_ACCESS_KEY} --tts picovoice_orca --openai-access-key ${OPEN_AI_KEY} --llm openai
+python orca_voice_assistant_demo.py --picovoice-access-key ${PV_ACCESS_KEY} --openai-access-key ${OPEN_AI_KEY}
 ```
+
+Replace `${PV_ACCESS_KEY}` with your `AccessKey` obtained from Picovoice Console and
+`${OPEN_AI_KEY}` with your `AccessKey` obtained from OpenAI.
+You can toggle between Orca and OpenAI TTS by setting the `--tts` flag to `picovoice_orca` or `openai`, respectively.
+If you don't want to use ChatGPT, set the `--llm` flag to `dummy`.
+This will simulate an LLM response using example sentences that are synthesized by the TTS system.
diff --git a/demo/voice_assistant/orca_voice_assistant_demo.py b/demo/voice_assistant/orca_voice_assistant_demo.py
@@ -213,7 +213,7 @@ def main(args: argparse.Namespace) -> None:
 
     parser.add_argument(
         "--llm",
-        default=LLMs.DUMMY.value,
+        default=LLMs.OPENAI.value,
         choices=[llm.value for llm in LLMs],
         help="Choose LLM to use")
     parser.add_argument(

diff --git a/resources/assets/orca_streaming_animation.gif b/resources/assets/orca_streaming_animation.gif
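
Note on the `--llm` default change in `orca_voice_assistant_demo.py`: the sketch below illustrates how switching the `argparse` default from `LLMs.DUMMY.value` to `LLMs.OPENAI.value` changes what the demo selects when `--llm` is omitted. It is a minimal, standalone approximation rather than the demo's actual code. The `LLMs` enum body and the `build_parser` helper are assumptions made for illustration; the member values `openai` and `dummy` are taken from the flag values that appear in the README diff above.

```python
import argparse
from enum import Enum


class LLMs(Enum):
    # Assumed definition: only the member names and the flag values
    # "openai" / "dummy" appear in the diff, not the enum body itself.
    OPENAI = "openai"
    DUMMY = "dummy"


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical helper; the real script registers many more arguments.
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--llm",
        default=LLMs.OPENAI.value,  # was LLMs.DUMMY.value before this commit
        choices=[llm.value for llm in LLMs],
        help="Choose LLM to use")
    return parser


if __name__ == "__main__":
    parser = build_parser()
    # No --llm flag: the new default applies.
    print(parser.parse_args([]).llm)
    # Explicit opt-out of ChatGPT, as described in the README.
    print(parser.parse_args(["--llm", "dummy"]).llm)
```

With this default, running the demo without `--llm` presumably requires a valid `--openai-access-key`, while `--llm dummy` keeps the previous offline behavior.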