(docs) Adds notes about dependencies for audio playback
aleph-ra committed Sep 5, 2024
1 parent 923f6ee commit 4ecbacf
Showing 1 changed file with 6 additions and 2 deletions.
8 changes: 6 additions & 2 deletions docs/examples/conversational.md
@@ -23,7 +23,7 @@ text_query = Topic(name="text0", msg_type="String")
s2t_config = SpeechToTextConfig(enable_vad=True) # option to always listen for speech through the microphone
```
```{note}
-With **enable_vad** set to **True**, the component automatically deploys [Silero-VAD](https://github.com/snakers4/silero-vad) by default in ONNX format. This model has a small footprint and can be easily deployed on the edge. However, we need to install a couple of dependencies for this to work. These can be installed with `pip install torchaudio onnxruntime`
+With **enable_vad** set to **True**, the component automatically deploys [Silero-VAD](https://github.com/snakers4/silero-vad) by default in ONNX format. This model has a small footprint and can be easily deployed on the edge. However, we need to install a couple of dependencies for this to work. These can be installed with: `pip install pyaudio torchaudio onnxruntime`
```

To initialize the component, we also need a model client for a speech-to-text model. We will use the HTTP client for RoboML for this purpose.
@@ -32,7 +32,7 @@ To initialize the component we also need a model client for a speech to text mod
RoboML is an aggregator library that provides a model-serving apparatus for locally serving open-source ML models useful in robotics. Learn about setting up RoboML [here](https://www.github.com/automatika-robotics/roboml).
```

-Additionally, we will use the client with a model called Whisper, a popular open-source speech-to-text model from OpenAI, for the purpose of this example. Let's see what this looks like in code.
+Additionally, we will use the client with a model called Whisper, a popular open-source speech-to-text model from OpenAI. Let's see what this looks like in code.

```python
from agents.clients.roboml import HTTPModelClient
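# --- Hedged sketch: the rest of this snippet is collapsed in the diff view.
# The lines below illustrate how the pieces described above might fit
# together; the Whisper/SpeechToText imports and the exact constructor
# arguments are assumptions based on the surrounding docs, not verbatim API.
from agents.components import SpeechToText  # assumed component import
from agents.models import Whisper           # assumed model definition import
from agents.ros import Topic                # assumed Topic import

# Hypothetical audio input topic; text_query and s2t_config are defined
# earlier in this example.
audio_in = Topic(name="audio0", msg_type="Audio")

# Define the Whisper model and wrap it in a RoboML HTTP client
whisper = Whisper(name="whisper")
whisper_client = HTTPModelClient(whisper)

# Assemble the SpeechToText component
speech_to_text = SpeechToText(
    inputs=[audio_in],
    outputs=[text_query],
    model_client=whisper_client,
    trigger=audio_in,
    config=s2t_config,
)
```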
@@ -96,6 +96,10 @@ Notice that the template is a jinja2 template string, where the actual name of t
## TextToSpeech Component
The TextToSpeech component setup will be very similar to the SpeechToText component. We will once again use a RoboML client, this time with the SpeechT5 model (an open-source model from Microsoft). Furthermore, this component can be configured to play audio on a playback device available onboard the robot; we will utilize this option through our config. An output topic is optional for this component, as we will be playing the audio directly on the device.

+```{note}
+To utilize _play_on_device_, you need to install a couple of dependencies: `pip install soundfile sounddevice`
+```

```python
from agents.config import TextToSpeechConfig
from agents.models import SpeechT5
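# --- Hedged sketch: the remainder of this snippet is collapsed in the diff
# view. The lines below illustrate how the TextToSpeech setup described
# above might look; the TextToSpeech import, the Topic name and the exact
# constructor arguments are assumptions, not verbatim API.
from agents.components import TextToSpeech   # assumed component import
from agents.clients.roboml import HTTPModelClient
from agents.ros import Topic                  # assumed Topic import

# Enable playback on an onboard audio device through the config
t2s_config = TextToSpeechConfig(play_on_device=True)

# Hypothetical topic carrying the text to be spoken (e.g. an LLM reply)
text_answer = Topic(name="text1", msg_type="String")

# Define the SpeechT5 model and wrap it in a RoboML HTTP client
speecht5 = SpeechT5(name="speecht5")
speecht5_client = HTTPModelClient(speecht5)

# No output topic is needed since audio is played directly on the device
text_to_speech = TextToSpeech(
    inputs=[text_answer],
    model_client=speecht5_client,
    trigger=text_answer,
    config=t2s_config,
)
```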
