update readmes and workflows
bejager committed May 3, 2024
1 parent 66305ed commit d1e71c6
Showing 8 changed files with 76 additions and 19 deletions.
10 changes: 7 additions & 3 deletions .github/workflows/c-demos.yml
@@ -66,10 +66,12 @@ jobs:
  run: cmake -G "${{ matrix.make_file }}" -B ./build

  - name: Build demo
- run: cmake --build ./build --target orca_demo
+ run: |
+ cmake --build ./build --target orca_demo
+ cmake --build ./build --target orca_demo_streaming
  - name: Test
- run: python test/test_orca_c.py ${{secrets.PV_VALID_ACCESS_KEY}} ${{ matrix.platform }} ${{ matrix.arch }}
+ run: python3 test/test_orca_c.py ${{secrets.PV_VALID_ACCESS_KEY}} ${{ matrix.platform }} ${{ matrix.arch }}

  build-demo-self-hosted:
  runs-on: ${{ matrix.machine }}
@@ -106,7 +108,9 @@ jobs:
  run: cmake -B ./build

  - name: Build demo
- run: cmake --build ./build --target orca_demo
+ run: |
+ cmake --build ./build --target orca_demo
+ cmake --build ./build --target orca_demo_streaming
  - name: Test
  run: python3 test/test_orca_c.py ${{secrets.PV_VALID_ACCESS_KEY}} ${{ matrix.platform }} ${{ matrix.arch }}
7 changes: 4 additions & 3 deletions .github/workflows/python-demo.yml
@@ -32,6 +32,7 @@ jobs:
  install_dep: sudo apt install libportaudio2
  - os: windows-latest
  - os: macos-latest
+ install_dep: brew install portaudio --HEAD

  steps:
  - uses: actions/checkout@v3
@@ -42,12 +43,12 @@ jobs:
  python-version: ${{ matrix.python-version }}

  - name: Pre-build dependencies
- run: python -m pip install --upgrade pip
+ run: python3 -m pip install --upgrade pip

  # TODO: remove after release
  - name: Build dependencies
  run: |
- python -m pip install -U pip setuptools
+ python3 -m pip install -U pip setuptools
  pip install wheel
  cd ../../binding/python
  python3 setup.py sdist bdist_wheel
@@ -66,7 +67,7 @@ jobs:
  - name: Test single
  run: >
- python orca_demo.py
+ python3 orca_demo.py
  --access_key ${{secrets.PV_VALID_ACCESS_KEY}}
  --text "Hello, I am Orca!"
  --output_path ./tmp.wav
2 changes: 1 addition & 1 deletion .github/workflows/python-perf.yml
@@ -60,7 +60,7 @@ jobs:
  python-version: '3.10'

  - name: Pre-build dependencies
- run: python -m pip install --upgrade pip
+ run: python3 -m pip install --upgrade pip

  - name: Install dependencies
  run: pip install -r requirements.txt
4 changes: 2 additions & 2 deletions .github/workflows/python.yml
@@ -49,13 +49,13 @@ jobs:
  python-version: ${{ matrix.python-version }}

  - name: Pre-build dependencies
- run: python -m pip install --upgrade pip
+ run: python3 -m pip install --upgrade pip

  - name: Install dependencies
  run: pip install -r requirements.txt

  - name: Test
- run: python test_orca.py --access-key ${{secrets.PV_VALID_ACCESS_KEY}}
+ run: python3 test_orca.py --access-key ${{secrets.PV_VALID_ACCESS_KEY}}

  build-self-hosted:
  runs-on: ${{ matrix.machine }}
15 changes: 9 additions & 6 deletions demo/python/README.md
@@ -1,10 +1,11 @@
- # Orca Text-to-Speech Engine Demo
+ # Orca Text-to-Speech Engine Python Demo

  Made in Vancouver, Canada by [Picovoice](https://picovoice.ai)

  ## Orca

- Orca is an on-device text-to-speech engine producing high-quality, realistic, spoken audio with zero latency. Orca is:
+ Orca is an on-device text-to-speech engine designed for use with LLMs, enabling zero-latency voice assistants.
+ Orca is:

  - Private; All voice processing runs locally.
  - Cross-Platform:
@@ -31,14 +32,16 @@ SDKs. You can get your `AccessKey` for free. Make sure to keep your `AccessKey`
  Signup or Login to [Picovoice Console](https://console.picovoice.ai/) to get your `AccessKey`.

  ## Usage
+
  Orca supports two modes of operation: streaming and single synthesis.

- In the streaming synthesis mode, Orca processes an incoming text stream in real-time and generates audio in parallel.
+ In the streaming synthesis mode, Orca processes an incoming text stream in real-time and generates audio in parallel.
  This is demonstrated in the Orca streaming demo.

  In the single synthesis mode, the text is synthesized in a single call to the Orca engine.

  ### Streaming synthesis demo
+
  In this demo, we simulate a response from a language model by creating a text stream from a user-defined text.
  We stream that text to Orca and play the synthesized audio as soon as it gets generated.

@@ -49,7 +52,7 @@ orca_demo_streaming --access_key ${ACCESS_KEY} --text-to-stream ${TEXT}
  ```

  Replace `${ACCESS_KEY}` with your `AccessKey` obtained from Picovoice Console and `${TEXT}` with your text to be
- streamed to Orca.
+ streamed to Orca. Please note that this demo was not tested on macOS.

  ### Single synthesis demo

@@ -59,6 +62,6 @@ To synthesize speech in a single call to Orca and without audio playback, run th
  orca_demo --access_key ${ACCESS_KEY} --text ${TEXT} --output_path ${WAV_OUTPUT_PATH}
  ```

- Replace `${ACCESS_KEY}` with yours obtained from Picovoice Console, `${TEXT}` with your text to be synthesized,
- and `${WAV_OUTPUT_PATH}` with a path to a `.wav` file where the generated audio will be stored as a single-channel,
+ Replace `${ACCESS_KEY}` with yours obtained from Picovoice Console, `${TEXT}` with your text to be synthesized,
+ and `${WAV_OUTPUT_PATH}` with a path to a `.wav` file where the generated audio will be stored as a single-channel,
  16-bit PCM `.wav` file.
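
Note on the streaming demo described in this README: it simulates a language model by turning a fixed, user-defined text into a text stream. A minimal sketch of that idea in Python could look like the following. The names `simulate_text_stream`, `stream_to_tts`, and the `synthesize_chunk` callback are hypothetical, chosen only for illustration; they are not part of the Orca SDK.

```python
import re
from typing import Callable, Iterable, Iterator, List


def simulate_text_stream(text: str) -> Iterator[str]:
    """Yield `text` one word at a time (with trailing whitespace),
    imitating LLM output that arrives incrementally."""
    for match in re.finditer(r"\S+\s*", text):
        yield match.group(0)


def stream_to_tts(chunks: Iterable[str], synthesize_chunk: Callable[[str], None]) -> int:
    """Pass each chunk to `synthesize_chunk` as soon as it arrives.

    `synthesize_chunk` is a hypothetical stand-in for whatever call hands
    text to a streaming TTS engine. Returns the number of chunks sent.
    """
    sent = 0
    for chunk in chunks:
        synthesize_chunk(chunk)
        sent += 1
    return sent


# A list's `append` stands in for the TTS engine so the flow can be inspected.
received: List[str] = []
count = stream_to_tts(simulate_text_stream("Hello, I am Orca!"), received.append)
```

Here the callback only records the chunks; in a real setup it would forward each chunk to the engine, which is what allows audio to start before the full text is known.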
55 changes: 52 additions & 3 deletions demo/voice_assistant/README.md
@@ -1,7 +1,56 @@
- # Orca Voice Assistant Demo
+ # Orca Voice Assistant Demo - Talk to ChatGPT in Real-Time

- WIP
+ Made in Vancouver, Canada by [Picovoice](https://picovoice.ai)

+ This demo showcases how [Orca Streaming Text-to-Speech](https://picovoice.ai/platform/orca/) can be seamlessly integrated into LLM-applications to drastically reduce the audio latency
+ of voice assistants.
+
+ ## Towards Zero-Latency Voice Assistants
+
+ Orca can handle streaming text input, i.e., it can start
+ synthesizing audio while an LLM is still producing the response.
+
+ ![](https://github.com/Picovoice/orca/blob/main/resources/assets/orca_streaming_animation.gif)
+
+ As demonstrated above, Orca starts converting text to audio right away, while
+ [OpenAI TTS](https://platform.openai.com/docs/guides/text-to-speech) needs to wait for the entire
+ LLM output to be available, introducing a delay in the voice assistant's response.
+
+ ## Technologies
+
+ In this demo, the user can interact with a voice assistant in real-time by leveraging GenAI technologies.
+ It is built like the majority of voice assistant today, by chaining together a Speech-to-Text engine, an LLM, and
+ a Text-to-Speech engine.
+
+ The following technologies are used:
+
+ - Speech to Text: Picovoice's [Cheetah Streaming Speech-to-Text](https://picovoice.ai/platform/cheetah/)
+ - LLM: "ChatGPT" using `gpt-3.5-turbo`
+ with [OpenAI Chat Completion API](https://platform.openai.com/docs/guides/text-generation)
+ - TTS:
+ - Picovoice's [Orca Streaming Text-to-Speech](https://picovoice.ai/platform/orca/)
+ - [OpenAI TTS](https://platform.openai.com/docs/guides/text-to-speech)
+
+ ## Compatibility
+
+ This demo has been tested on Linux (x86_64) and macOS (x86_64) using Python 3.10.
+
+ ## Access Keys
+
+ To run all features of this demo, access keys are required for:
+
+ - Picovoice Console: Get your `AccessKey` for free by signing up or logging in
+ to [Picovoice Console](https://console.picovoice.ai/).
+ - OpenAI API: Get your `AccessKey` by signing up or logging in to [OpenAI](https://platform.openai.com/).
+
+ ## Usage
+
  ```bash
- python orca_voice_assistant_demo.py --picovoice-access-key ${PV_ACCESS_KEY} --tts picovoice_orca --openai-access-key ${OPEN_AI_KEY} --llm openai
+ python orca_voice_assistant_demo.py --picovoice-access-key ${PV_ACCESS_KEY} --openai-access-key ${OPEN_AI_KEY}
  ```
+
+ Replace `${PV_ACCESS_KEY}` with your `AccessKey` obtained from Picovoice Console,
+ `${OPEN_AI_KEY}` with your `AccessKey` obtained from OpenAI.
+ You can toggle between Orca and OpenAI TTS by using the `--tts` flag, using `picovoice_orca` or `openai`, respectively.
+ If you don't want to use ChatGPT, set the `--llm` flag to `dummy`.
+ This will simulate an LLM response using example sentences that are synthesized by the TTS system.
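
Note on the latency argument in this README: it can be illustrated with a toy timing model. The sketch below assumes a made-up LLM that emits one word per fixed interval, and compares the earliest moment a TTS engine could begin synthesis when it consumes the stream with the earliest moment when it must wait for the full response. All function names and timing values are illustrative assumptions, not measurements of Orca or OpenAI TTS.

```python
from typing import Iterator, List, Tuple


def simulated_llm_tokens(text: str, seconds_per_token: float) -> Iterator[Tuple[float, str]]:
    """Yield (arrival_time_in_seconds, word) pairs for a made-up LLM
    that emits one word every `seconds_per_token` seconds."""
    for index, word in enumerate(text.split(), start=1):
        yield index * seconds_per_token, word


def earliest_start_streaming(tokens: List[Tuple[float, str]], words_needed: int) -> float:
    """A streaming engine can begin once `words_needed` words have arrived."""
    if words_needed < 1 or words_needed > len(tokens):
        raise ValueError("words_needed must be between 1 and the number of tokens")
    return tokens[words_needed - 1][0]


def earliest_start_waiting(tokens: List[Tuple[float, str]]) -> float:
    """A non-streaming engine can begin only after the last word has arrived."""
    if not tokens:
        raise ValueError("no tokens")
    return tokens[-1][0]


# Ten words at an assumed 0.25 s per word.
tokens = list(simulated_llm_tokens("The quick brown fox jumps over the lazy dog today", 0.25))
streaming_start = earliest_start_streaming(tokens, words_needed=2)
waiting_start = earliest_start_waiting(tokens)
print(f"streaming can start at {streaming_start:.2f} s, waiting at {waiting_start:.2f} s")
```

In this model the gap between the two start times grows linearly with the length of the response, which is the effect the animation in the README is meant to show.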
2 changes: 1 addition & 1 deletion demo/voice_assistant/orca_voice_assistant_demo.py
@@ -213,7 +213,7 @@ def main(args: argparse.Namespace) -> None:

  parser.add_argument(
  "--llm",
- default=LLMs.DUMMY.value,
+ default=LLMs.OPENAI.value,
  choices=[llm.value for llm in LLMs],
  help="Choose LLM to use")
  parser.add_argument(
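
Note on the one-line change above: it flips the default of `--llm`, so running the demo without the flag now selects OpenAI. A self-contained sketch of the pattern (an `Enum` whose values feed both `choices` and `default`) is given below. The member values `"dummy"` and `"openai"` are taken from the README's description of the flag; the real `LLMs` enum in `orca_voice_assistant_demo.py` may define more members or differ in detail, and `build_parser` is a hypothetical helper.

```python
import argparse
from enum import Enum


class LLMs(Enum):
    # Values assumed from the README text; the real enum may differ.
    DUMMY = "dummy"
    OPENAI = "openai"


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with only the `--llm` option, mirroring the diff above."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--llm",
        default=LLMs.OPENAI.value,
        choices=[llm.value for llm in LLMs],
        help="Choose LLM to use")
    return parser


default_args = build_parser().parse_args([])
dummy_args = build_parser().parse_args(["--llm", "dummy"])
```

With this default, `default_args.llm` is `"openai"`, and the dummy LLM has to be requested explicitly, which matches the shortened usage command in the README.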
Binary file added resources/assets/orca_streaming_animation.gif