Added Deepgram Synthesizer. #618

Closed
wants to merge 25 commits
Commits
4ed9a2f
Added Deepgram Synthesizer.
ZeeshanLone Jul 6, 2024
6598324
Fix action worker twilio sid capture (#619)
rjheeta Jul 8, 2024
6198f7b
add livekit docs (#621)
Kian1354 Jul 8, 2024
18761bb
phrase trigger matcher returns agent config instead of type (#622)
adnaans Jul 9, 2024
48cba3e
adds twilio dtmf action (#623)
ajar98 Jul 9, 2024
ad1adc8
[Bug #628] correct coding errors in the google synthesiser (#629)
jstahlbaum-fibernetics Jul 12, 2024
4c196e1
[DOW-119] creates AudioPipeline abstraction (#625)
ajar98 Jul 12, 2024
e1cf228
update script (#635)
ajar98 Jul 12, 2024
77d6593
convert logger.error to logger.warning (#636)
ajar98 Jul 15, 2024
32f0cb4
update docs (#639)
ajar98 Jul 15, 2024
3dc1d49
[ESUP-55] adds # and * support and also ability to press multiple but…
ajar98 Jul 16, 2024
6dee7c5
Upgrade cartesia to 1.0.7 and add support for continuations (#646)
sauhardjain Jul 18, 2024
1ac820f
Typo in the word using (#647)
petertimwalker Jul 19, 2024
821fe26
Remove unnecessary quotation marks (#644)
tashbenbetov Jul 19, 2024
a971ebc
Fix the error in the URL (#643)
tashbenbetov Jul 19, 2024
eaafc1b
support additional headers in external actions requester (#661)
ajar98 Jul 24, 2024
d8f8aca
Custom provider errors and add StreamingConversation to transcriber a…
adnaans Jul 29, 2024
11e8d24
Improve Cartesia Synthesizer error handling (#663)
sauhardjain Jul 29, 2024
3bc8f8d
Update agents.mdx (#664)
ajar98 Jul 29, 2024
adf0d87
poetry version prerelease (#665)
ajar98 Jul 30, 2024
2ecc631
fix pinecone lint (#679)
ajar98 Aug 7, 2024
c1148e3
Include cartesia's voice controls on docs + update synthesizer (#674)
sauhardjain Aug 7, 2024
6e6f37a
poetry version prerelease (#680)
ajar98 Aug 7, 2024
153ebf6
Added Deepgram Synthesizer.
ZeeshanLone Jul 6, 2024
aab5b46
Merge branch 'deepgram_synthesizer' of https://github.com/ZeeshanLone…
ZeeshanLone Aug 10, 2024
6 changes: 6 additions & 0 deletions apps/livekit/.env.example
@@ -0,0 +1,6 @@
LIVEKIT_API_KEY=your_livekit_api_key
LIVEKIT_API_SECRET=your_livekit_api_secret
LIVEKIT_WS_URL=your_livekit_ws_url
OPENAI_API_KEY=your_openai_api_key
DEEPGRAM_API_KEY=your_deepgram_api_key
ELEVENLABS_API_KEY=your_elevenlabs_api_key
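
The example file above lists every credential the LiveKit app expects. As a sanity check before boot, the file can be parsed and validated with the standard library alone; a minimal sketch (the helper names `load_env_file` and `missing_vars` are invented for illustration, not part of vocode-core):

```python
REQUIRED_VARS = [
    "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET", "LIVEKIT_WS_URL",
    "OPENAI_API_KEY", "DEEPGRAM_API_KEY", "ELEVENLABS_API_KEY",
]

def load_env_file(path=".env"):
    """Minimal .env parser: one KEY=value per line; blank lines,
    '#' comments, and surrounding double quotes are ignored."""
    values = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"')
    return values

def missing_vars(env):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Failing fast with a clear list of missing keys is friendlier than letting the first API call error out mid-conversation.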
2 changes: 1 addition & 1 deletion docs/agents.mdx
@@ -29,6 +29,6 @@ agent behavior:
- `language` sets the agent language (for more context see [Multilingual Agents](/multilingual))
- `initial_message` controls the agent's first utterance.
- `initial_message_delay` adds a delay to the initial message from when the call begins
- - `ask_if_human_present_on_idle` allows the agent to speak when there is more than 4s of silence on the call
+ - `ask_if_human_present_on_idle` allows the agent to speak when there is more than 15s of silence on the call
- `llm_temperature` controls the behavior of the underlying language model. Values can range from 0 to 1, with higher
values leading to more diverse and creative results. Lower values generate more consistent outputs.
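
The `llm_temperature` setting above is the standard softmax temperature from language-model sampling. A quick stdlib sketch (not Vocode code, just the underlying math) shows why lower values give more consistent outputs: dividing the logits by a small temperature concentrates probability mass on the highest-scoring token.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then softmax. Lower temperatures
    sharpen the distribution; higher ones flatten it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

With logits `[2.0, 1.0, 0.5]`, a temperature of 0.2 puts almost all probability on the first token, while 1.0 spreads it out noticeably.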
2 changes: 1 addition & 1 deletion docs/configuring-number.mdx
@@ -16,7 +16,7 @@ by modifying:

### Voice

- First, let's create a new voice via [ElevenLabs]("https://elevenlabs.io) and grab the voice ID.
+ First, let's create a new voice via [ElevenLabs](https://elevenlabs.io) and grab the voice ID.

```
voice = vocode_client.voices.create_voice(
4 changes: 2 additions & 2 deletions docs/external-actions.mdx
@@ -207,12 +207,12 @@ Vocode expects responses from the user’s API in JSON in the following format:

```python
Response {
- result: Any
+ result: Dict[str, Any]
agent_message: Optional[str] = None
}
```

- - `result` is a payload containing the result of the action on the user’s side, and can be in any format
+ - `result` is a payload containing the result of the action on the user’s side, and can have any schema
- `agent_message` optionally contains a message that will be synthesized into audio and sent back to the phone call (see [Configuring the External Action](/external-actions#configuring-the-external-action) above for more info)
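
For illustration, here is a hypothetical payload conforming to that schema, with a minimal client-side shape check. The field names inside `result` (`booked`, `slot`) are invented for this example and not part of the API; only the outer `result`/`agent_message` structure comes from the docs above.

```python
import json

def parse_action_response(raw: str):
    """Validate the minimal shape described above: a JSON object with a
    dict-valued 'result' and an optional string 'agent_message'."""
    data = json.loads(raw)
    if not isinstance(data.get("result"), dict):
        raise ValueError("'result' must be a JSON object")
    msg = data.get("agent_message")
    if msg is not None and not isinstance(msg, str):
        raise ValueError("'agent_message' must be a string or null")
    return data

# Hypothetical response from a user's API (field names invented)
raw = json.dumps({
    "result": {"booked": True, "slot": "2024-08-01T10:00:00Z"},
    "agent_message": "Your meeting is booked.",
})
```

A check like this on the user's side catches schema drift before the response ever reaches the agent.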

In the [Meeting Assistant Example](/external-actions#meeting-assistant-example) below, the user’s API could return back a JSON response that looks like:
2 changes: 1 addition & 1 deletion docs/hosted-quickstart.mdx
@@ -81,5 +81,5 @@ If you'd prefer to hit our API directly, take a look at our [API Reference](/api

# Hosted Walkthrough

- Once you have Vocode installed, we suggest going through the [Hosted Walkthrough](/getting-number) which will
+ Once you have Vocode installed, we suggest going through the [Hosted Walkthrough](/walkthrough_intro) which will
show you how to start interacting with the API.
Binary file added docs/images/livekit_keys.png
1 change: 1 addition & 0 deletions docs/mint.json
@@ -65,6 +65,7 @@
"open-source/python-quickstart",
"open-source/react-quickstart",
"open-source/telephony",
"open-source/livekit-webrtc",
"open-source/turn-based-conversation"
]
},
45 changes: 45 additions & 0 deletions docs/open-source/livekit-webrtc.mdx
@@ -0,0 +1,45 @@
---
title: "Using WebRTC with LiveKit"
description: "Deploy your Vocode Agents using WebRTC"
---

# Overview

[WebRTC](https://webrtc.org/) is an alternative to websockets for real-time P2P communication. Vocode Agents are compatible with both WebRTC and websockets, enabling developers to pick
the stack best suited for their application.

To connect Vocode agents to WebRTC, Vocode uses [LiveKit](https://livekit.io/), an open source platform for building on WebRTC. For background on how LiveKit
works, see their [documentation](https://docs.livekit.io/home/get-started/intro-to-livekit/).

In this guide, we'll be walking through how to connect a Vocode Agent to the [LiveKit Agents Playground](https://agents-playground.livekit.io/).

# Walkthrough: hooking up a Vocode Agent to a LiveKit Room

## Setting up your LiveKit Server

First, you'll want to set up a LiveKit Server for your Agent. For simplicity, we are using LiveKit's hosted offering, but it can also be self-hosted, since LiveKit is open source!

In our LiveKit dashboard, we first generate our websocket URL, API key, and API secret.

![Setup](/images/livekit_keys.png)

## Deploying your Vocode agent to a LiveKit Room

Once you have your LiveKit Server credentials, you can hook them up to Vocode via the `LiveKitConversation` abstraction. Using the starter code in
[vocode-core/apps/livekit/app.py](https://github.com/vocodedev/vocode-core/blob/main/apps/livekit/app.py), you can quickly deploy a Vocode Agent that accepts
new job requests.

Fill in your credentials in `.env`:

```bash
LIVEKIT_SERVER_URL=wss://your-livekit-ws-url.livekit.cloud
LIVEKIT_API_KEY="KEY"
LIVEKIT_API_SECRET="SECRET"
```

Then run `poetry run python app.py dev`.

Now you can connect to the [Agents Playground](https://agents-playground.livekit.io/) to interact with your agent. With LiveKit, you can connect Vocode
agents to any web application using LiveKit's [React Components](https://docs.livekit.io/reference/components/react/) library.
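
Under the hood, the API key and secret from your dashboard are used to mint LiveKit access tokens: short-lived HS256 JWTs whose issuer is the API key and whose `video` claim grants room permissions. A rough stdlib-only sketch of that token shape follows, assuming the layout described in LiveKit's authentication docs; in real code, prefer LiveKit's server SDKs, which handle grants and encoding for you.

```python
import base64
import hashlib
import hmac
import json
import time

def b64url(data: bytes) -> str:
    # JWTs use unpadded URL-safe base64
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def livekit_access_token(api_key, api_secret, identity, room, ttl=3600):
    """Sketch of a LiveKit access token: an HS256-signed JWT with the
    API key as issuer and a 'video' grant allowing the identity to
    join the given room (assumed claim layout; see LiveKit's docs)."""
    header = {"alg": "HS256", "typ": "JWT"}
    now = int(time.time())
    claims = {
        "iss": api_key,
        "sub": identity,
        "nbf": now,
        "exp": now + ttl,
        "video": {"room": room, "roomJoin": True},
    }
    signing_input = b64url(json.dumps(header).encode()) + "." + b64url(json.dumps(claims).encode())
    sig = hmac.new(api_secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + b64url(sig)
```

The Agents Playground does this signing for you; a sketch like this is mainly useful for understanding what the API secret protects.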
48 changes: 40 additions & 8 deletions docs/open-source/using-synthesizers.mdx
@@ -13,13 +13,14 @@ Vocode currently supports the following synthesizers:

1. Azure (Microsoft)
2. Google
- 3. Eleven Labs
- 4. Rime
- 5. Play.ht
- 6. GTTS (Google Text-to-Speech)
- 7. Stream Elements
- 8. Bark
- 9. Amazon Polly
+ 3. Cartesia
+ 4. Eleven Labs
+ 5. Rime
+ 6. Play.ht
+ 7. GTTS (Google Text-to-Speech)
+ 8. Stream Elements
+ 9. Bark
+ 10. Amazon Polly

These synthesizers are defined using their respective configuration classes, which are subclasses of the `SynthesizerConfig` class.

@@ -83,7 +84,38 @@ synthesizer_config=PlayHtSynthesizerConfig.from_telephone_output_device(
...
```

- ### Example 2: Using Azure in StreamingConversation locally
+ ### Example 2: Using Cartesia's streaming synthesizer

We support Cartesia's [low-latency streaming API](https://docs.cartesia.ai/api-reference/endpoints/stream-speech-websocket) over WebSockets. Use the `CartesiaSynthesizer` with the `CartesiaSynthesizerConfig` to enable this feature.

#### Telephony

```python
synthesizer_config=CartesiaSynthesizerConfig.from_telephone_output_device(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id=os.getenv("CARTESIA_VOICE_ID"),
)
```

In this example, the `CartesiaSynthesizerConfig.from_telephone_output_device()` method creates a configuration object for the Cartesia synthesizer with the sampling rate and audio encoding used in telephony. (The analogous `from_output_device()` method instead takes a `speaker_output` object and extracts the `sampling_rate` and `audio_encoding` from that output device.)

#### Controlling Speed & Emotions

You can set the `speed` and `emotion` parameters in the `CartesiaSynthesizerConfig` object to control the speed and emotions of the agent's voice! See [this page](https://docs.cartesia.ai/user-guides/voice-control) for more details.

```python
CartesiaSynthesizerConfig(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id=os.getenv("CARTESIA_VOICE_ID"),
experimental_voice_controls={
"speed": "slow",
"emotion": "positivity: high"
}
)
```
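
Since `experimental_voice_controls` is a plain dict, typos in its values are easy to make. A hypothetical client-side check is sketched below; the accepted vocabulary is defined by Cartesia's voice-control docs, and the value sets here are assumptions for illustration, not an authoritative list.

```python
# Assumed vocabularies, modeled on Cartesia's voice-control guide
ALLOWED_SPEEDS = {"slowest", "slow", "normal", "fast", "fastest"}
ALLOWED_EMOTIONS = {"anger", "positivity", "surprise", "sadness", "curiosity"}
ALLOWED_LEVELS = {"lowest", "low", "high", "highest"}

def validate_voice_controls(controls: dict) -> dict:
    """Reject obviously malformed 'speed' / 'emotion' values before
    they reach the synthesizer. Returns the controls unchanged."""
    speed = controls.get("speed")
    if speed is not None and speed not in ALLOWED_SPEEDS:
        raise ValueError(f"unknown speed: {speed!r}")
    emotion = controls.get("emotion")
    if emotion is not None:
        # Emotions are written as "name" or "name: level"
        name, _, level = emotion.partition(":")
        if name.strip() not in ALLOWED_EMOTIONS:
            raise ValueError(f"unknown emotion: {name.strip()!r}")
        if level and level.strip() not in ALLOWED_LEVELS:
            raise ValueError(f"unknown level: {level.strip()!r}")
    return controls
```

Validating eagerly turns a silent mispronunciation of the config into a loud error at startup.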

### Example 3: Using Azure in StreamingConversation locally

```python
from vocode.streaming.models.synthesizer import AzureSynthesizerConfig
4 changes: 2 additions & 2 deletions docs/walkthrough_intro.mdx
@@ -6,8 +6,8 @@ description: "Setting up a simple receptionist agent"
Welcome to the Vocode API! We've got a lot of powerful features that we're going to illustrate
by setting up a receptionist agent that can take calls and book calendar appointments.

- We'll cover how to do it step-by-step entirely via API or you could also follow along usig our
- [Dashboard]("https://dashboard.vocode.dev).
+ We'll cover how to do it step-by-step entirely via API or you could also follow along using our
+ [Dashboard](https://dashboard.vocode.dev).

In particular, we'll go through the following steps:

4 changes: 2 additions & 2 deletions playground/streaming/agent/chat.py
@@ -197,7 +197,7 @@ async def sender():

await asyncio.gather(receiver(), sender())
if actions_worker is not None:
- actions_worker.terminate()
+ await actions_worker.terminate()


async def agent_main():
@@ -233,7 +233,7 @@ async def agent_main():
try:
await run_agent(agent, interruption_probability=0, backchannel_probability=0)
except KeyboardInterrupt:
- agent.terminate()
+ await agent.terminate()


if __name__ == "__main__":