local audio wip, maybe mvp
still some issues cancelling the bot
but other than that, it works with headphones!

headphone notes
vipyne committed Dec 13, 2024
1 parent 0660836 commit cdf0a78
271 changes: 271 additions & 0 deletions 002-hello-pipecat-nim-local-audio.ipynb
@@ -0,0 +1,271 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "1db60caf-a890-4e62-8255-62fd691cd6e6",
"metadata": {},
"source": [
"# Voice AI Agents: Conversational AI Framework for the Enterprise\n",
"In this notebook, we walk through how to craft and deploy a voice AI bot using Pipecat AI. We illustrate the basic Pipecat flow with the `nvidia/llama-3.1-nemotron-70b-instruct` LLM model and Riva for STT (Speech-To-Text) & TTS (Text-To-Speech). However, Pipecat is not opinionated and other models and STT/TTS services can easily be used. See [Pipecat documentation](https://docs.pipecat.ai/server/services/supported-services#supported-services) for other supported services.\n",
"\n",
"Pipecat AI is an open-source framework for building voice and multimodal conversational agents. Pipecat simplifies the complex voice-to-voice AI pipeline, and lets developers build AI capabilities easily and with Open Source, commercial, and custom models. See [Pipecat's Core Concepts](https://docs.pipecat.ai/getting-started/core-concepts) for a deep dive into how it works.\n",
"\n",
"The framework was developed by Daily, a company that has provided real-time video and audio communication infrastructure since 2016. It is fully vendor neutral and is not tightly coupled to Daily's infrastructure.\n",
"\n",
"> ## 🤖🎧 Use headphones for this demo! 🎧🤖"
]
},
{
"cell_type": "markdown",
"id": "9b4fa7d7-88fb-4b33-8145-ee1a91e58af1",
"metadata": {},
"source": [
"## Step 1 - Install dependencies\n",
"First we set our environment.\n",
"\n",
"We use Daily for transport, OpenAI for context aggregation, Riva for TTS & TTS, and Silero for VAD (Voice Activity Detection). If using different services, for example Cartesia for TTS, one would run `pip install pipecat-ai[cartesia]`.\n",
"\n",
 [Development">
"> [Development note]: We're installing from the GitHub `main` branch here to ensure we have the latest improvements. Once the next Pipecat release is out, we will install only the extras we actually use."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "718d7f76-bb78-4614-ab77-229ed3eea402",
"metadata": {},
"outputs": [],
"source": [
"!pip install python-dotenv\n",
"%load_ext dotenv\n",
"%dotenv\n",
"\n",
"!pip install \"git+https://github.com/pipecat-ai/pipecat.git@main\"\n",
"# !pip install \"pipecat-ai[daily,local,openai,riva,silero]\""
]
},
{
"cell_type": "markdown",
"id": "7979c5d1-97a9-42e7-9de2-88b7d31b1409",
"metadata": {},
"source": [
"## Step 2 - Configure local audio transport for WebRTC communication\n",
"- Enable audio input and output for text-to-speech playback and enable VAD"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6136649d-2d26-4ca0-93da-6f6626e97c32",
"metadata": {},
"outputs": [],
"source": [
"from pipecat.audio.vad.silero import SileroVADAnalyzer\n",
"from pipecat.transports.base_transport import TransportParams\n",
"from pipecat.transports.local.audio import LocalAudioTransport\n",
"\n",
"transport = LocalAudioTransport(\n",
" TransportParams(\n",
" audio_out_enabled=True,\n",
" audio_in_enabled=True,\n",
" vad_enabled=True,\n",
" vad_analyzer=SileroVADAnalyzer(),\n",
" vad_audio_passthrough=True,\n",
" audio_out_is_live=True,\n",
" )\n",
" )"
]
},
{
"cell_type": "markdown",
"id": "8506527e-b84c-49e1-8af4-223fdb33f582",
"metadata": {},
"source": [
"## Step 3 - Initialize LLM, STT, and TTS services\n",
"We can customize options, for example a different LLM `model` or `voice_id` for FastPitch TTS."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "623d77d5-c183-43d0-980d-fd99a2836365",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from pipecat.services.nim import NimLLMService\n",
"from pipecat.services.riva import FastPitchTTSService, ParakeetSTTService\n",
"\n",
"stt = ParakeetSTTService(api_key=os.getenv(\"NVIDIA_API_KEY\"))\n",
"\n",
"llm = NimLLMService(\n",
" api_key=os.getenv(\"NVIDIA_API_KEY\"), model=\"meta/llama-3.1-70b-instruct\"\n",
")\n",
"\n",
"tts = FastPitchTTSService(api_key=os.getenv(\"NVIDIA_API_KEY\"))"
]
},
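{
"cell_type": "markdown",
"id": "f3a1c2d4-5e6b-4a7c-8d9e-0a1b2c3d4e5f",
"metadata": {},
"source": [
"As a sketch of the customization mentioned above: swapping the LLM `model`, or passing a `voice_id` to FastPitch TTS, is just a constructor argument. The `voice_id` value below is a hypothetical example; check the Riva documentation for the voices actually available, then uncomment and edit."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a7b8c9d0-1e2f-4a3b-8c4d-5e6f7a8b9c0d",
"metadata": {},
"outputs": [],
"source": [
"# Sketch only: customizing the services above (the voice_id value is hypothetical)\n",
"# llm = NimLLMService(\n",
"#     api_key=os.getenv(\"NVIDIA_API_KEY\"),\n",
"#     model=\"meta/llama-3.1-70b-instruct\",  # any NIM-hosted model name\n",
"# )\n",
"#\n",
"# tts = FastPitchTTSService(\n",
"#     api_key=os.getenv(\"NVIDIA_API_KEY\"),\n",
"#     voice_id=\"English-US.Female-1\",  # hypothetical; see Riva docs for voices\n",
"# )"
]
},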
{
"cell_type": "markdown",
"id": "ac150732-cbb4-4c70-8d31-cab5ae51b5fb",
"metadata": {},
"source": [
"## Step 4 - Define prompt and initialize context aggregator\n",
"Edit the prompt as desired."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0d884775-c4c0-49eb-b502-d4c855cc8e3b",
"metadata": {},
"outputs": [],
"source": [
"from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext\n",
"\n",
"messages = [\n",
" {\n",
" \"role\": \"system\",\n",
" \"content\": \"You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way that makes a cat pun if it is possible.\",\n",
" },\n",
"]\n",
"\n",
"context = OpenAILLMContext(messages)\n",
"context_aggregator = llm.create_context_aggregator(context)"
]
},
{
"cell_type": "markdown",
"id": "0752c614-a65d-4c61-965f-26d7b46f8153",
"metadata": {},
"source": [
"## Step 5 - Create pipeline\n",
"Here we align the services into a pipeline to process speech into text, send to llm, then turn the llm response text into speech."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7f8620a2-4caa-40c5-88d9-8aca2743157e",
"metadata": {},
"outputs": [],
"source": [
"from pipecat.pipeline.pipeline import Pipeline\n",
"\n",
"pipeline = Pipeline(\n",
" [\n",
" transport.input(), # Transport user input\n",
" stt, # STT\n",
" context_aggregator.user(), # User responses\n",
" llm, # LLM\n",
" tts, # TTS\n",
" transport.output(), # Transport bot output\n",
" context_aggregator.assistant(), # Assistant spoken responses\n",
" ]\n",
")"
]
},
{
"cell_type": "markdown",
"id": "ad9c588f-0c00-4414-984a-33da31e2803d",
"metadata": {},
"source": [
"## Step 6 - Create PipelineTask"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9fbadb9a-9778-4f0f-910f-5c53d117e593",
"metadata": {},
"outputs": [],
"source": [
"from pipecat.pipeline.task import PipelineParams, PipelineTask\n",
"\n",
"task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))"
]
},
{
"cell_type": "markdown",
"id": "b4890ce7-6a1a-4f39-b6af-9a3335ad9fcf",
"metadata": {},
"source": [
"## Step 7 - Create a pipeline runner\n",
"This manages the processing pipeline."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "87e504ab-b889-4b6a-96a1-159d42a95833",
"metadata": {},
"outputs": [],
"source": [
"from pipecat.pipeline.runner import PipelineRunner\n",
"\n",
"runner = PipelineRunner()"
]
},
{
"cell_type": "markdown",
"id": "08998f8d-ac33-4b38-b10a-01691f81636a",
"metadata": {},
"source": [
"## Step 8 - Run the bot and say \"hello\"!\n",
"\n",
"The first time you run the bot, it will load weights for a voice activity model into the local Python process. This will take 10-15 seconds. \n",
"The bot will wait for you to speak first. \n",
"\n",
"> ### 🎧 Remember to use headphones!"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "92a411cb-d2c8-4446-be69-b391486e853e",
"metadata": {},
"outputs": [],
"source": [
"await runner.run(task)"
]
},
{
"cell_type": "markdown",
"id": "910007a5-7800-493d-b5ec-e3bb1442cac1",
"metadata": {},
"source": [
"## Step 9: Stop the bot"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9a9fe93f-f00f-42b3-b7d3-7497c1649a43",
"metadata": {},
"outputs": [],
"source": [
"await runner.cancel()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python3.12",
"language": "python",
"name": "venv"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
