New READMEs #29

Open · wants to merge 5 commits into main
131 changes: 131 additions & 0 deletions basics/uninterruptable/README.md
@@ -0,0 +1,131 @@
# Uninterruptable Agent

A voice assistant that demonstrates non-interruptible speech using LiveKit's voice agents, useful for delivering information that must be heard in full.

## Overview

**Uninterruptable Agent** - A voice-enabled assistant configured to complete its responses without being interrupted by user speech, demonstrating the `allow_interruptions=False` configuration option.

## Features

- **Simple Configuration**: Single parameter controls interruption behavior
- **Voice-Enabled**: Built using LiveKit's voice capabilities with support for:
- Speech-to-Text (STT) using Deepgram
- Large Language Model (LLM) using OpenAI GPT-4o
- Text-to-Speech (TTS) using OpenAI
- Voice Activity Detection (VAD) disabled during agent speech

## How It Works

1. User connects to the LiveKit room
2. Agent automatically starts speaking a long test message
3. User attempts to interrupt by speaking
4. Agent continues speaking without stopping
5. Only after the agent finishes can the user's input be processed
6. Subsequent responses are also uninterruptible

## Prerequisites

- Python 3.10+
- `livekit-agents`>=1.0
- LiveKit account and credentials
- API keys for:
- OpenAI (for LLM and TTS capabilities)
- Deepgram (for speech-to-text)

## Installation

1. Clone the repository

2. Install dependencies:
```bash
pip install -r requirements.txt
```

3. Create a `.env` file in the parent directory with your API credentials:
```
LIVEKIT_URL=your_livekit_url
LIVEKIT_API_KEY=your_api_key
LIVEKIT_API_SECRET=your_api_secret
OPENAI_API_KEY=your_openai_key
DEEPGRAM_API_KEY=your_deepgram_key
```

## Running the Agent

```bash
python uninterruptable.py dev
```

The agent will immediately start speaking a long message. Try interrupting to observe the non-interruptible behavior.

## Architecture Details

### Key Configuration

The critical setting that makes this agent uninterruptible:

```python
Agent(
instructions="...",
stt=deepgram.STT(),
llm=openai.LLM(model="gpt-4o"),
tts=openai.TTS(),
allow_interruptions=False # This prevents interruptions
)
```

### Behavior Comparison

| Setting | User Speaks While Agent Talks | Result |
|---------|------------------------------|---------|
| `allow_interruptions=True` (default) | Agent stops mid-sentence | User input processed immediately |
| `allow_interruptions=False` | Agent continues speaking | User input queued until agent finishes |
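The queueing behavior in the second row of the table can be sketched without LiveKit at all. The following is a minimal, hypothetical simulation (all names invented here, not part of the library): user input that arrives while the agent is "speaking" is buffered and only processed once the speech coroutine completes.

```python
import asyncio

async def demo() -> list[str]:
    events: list[str] = []
    pending: asyncio.Queue[str] = asyncio.Queue()

    async def agent_speaks():
        # With allow_interruptions=False, speech runs to completion.
        events.append("agent: start")
        await asyncio.sleep(0.05)
        events.append("agent: done")

    async def user_speaks():
        await asyncio.sleep(0.01)  # user interjects mid-speech
        await pending.put("Stop!")

    await asyncio.gather(agent_speaks(), user_speaks())
    # Only after the agent finishes is the queued input processed.
    while not pending.empty():
        events.append(f"processed: {pending.get_nowait()}")
    return events

print(asyncio.run(demo()))  # → ['agent: start', 'agent: done', 'processed: Stop!']
```

The interjection lands mid-sleep, but nothing consumes the queue until the speech task has returned, mirroring the "queued until agent finishes" row above.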

### Testing Approach

The agent automatically generates a long response on entry to facilitate testing:
```python
self.session.generate_reply(user_input="Say something somewhat long and boring so I can test if you're interruptable.")
```

## Use Cases

### When to Use Uninterruptible Agents

1. **Legal Disclaimers**: Must be read in full without interruption
2. **Emergency Instructions**: Critical safety information
3. **Tutorial Steps**: Sequential instructions that shouldn't be skipped
4. **Terms and Conditions**: Required complete playback


## Implementation Patterns

### Selective Non-Interruption

```python
# Make only critical messages uninterruptible by passing
# allow_interruptions per utterance rather than toggling agent state
async def say_critical(self, message: str):
    await self.session.say(message, allow_interruptions=False)
```

## Important Considerations

- **User Experience**: Non-interruptible agents can be frustrating if overused
- **Message Length**: Keep uninterruptible segments reasonably short
- **Clear Indication**: Consider informing users when interruption is disabled
- **Fallback Options**: Provide alternative ways to skip or pause if needed

## Example Interaction

```
Agent: [Starts long message] "I'm going to tell you a very long and detailed story about..."
User: "Stop!" [Agent continues]
Agent: "...and that's why the chicken crossed the road. The moral of the story is..."
User: "Hey, wait!" [Agent still continues]
Agent: "...patience is a virtue." [Finally finishes]
User: "Finally! Can you hear me now?"
Agent: "Yes, I can hear you now. How can I help?"
```
File renamed without changes.
89 changes: 89 additions & 0 deletions pipeline-stt/keyword-detection/README.md
@@ -0,0 +1,89 @@
## Overview

**Keyword Detection Agent** - A voice-enabled agent that monitors user speech for predefined keywords and logs when they are detected.

## Features

- **Real-time Keyword Detection**: Monitors speech for specific keywords as users talk
- **Custom STT Pipeline**: Intercepts the speech-to-text pipeline to detect keywords
- **Logging System**: Logs detected keywords with proper formatting
- **Voice-Enabled**: Built using voice capabilities with support for:
- Speech-to-Text (STT) using Deepgram
- Large Language Model (LLM) using OpenAI
- Text-to-Speech (TTS) using OpenAI
- Voice Activity Detection (VAD) using Silero

## How It Works

1. User connects to the LiveKit room
2. Agent greets the user and starts a conversation
3. As the user speaks, the custom STT pipeline monitors for keywords
4. When keywords like "Shane", "hello", "thanks", or "bye" are detected, they are logged
5. The agent continues normal conversation while monitoring in the background
6. All speech continues to be processed by the LLM for responses

## Prerequisites

- Python 3.10+
- `livekit-agents`>=1.0
- LiveKit account and credentials
- API keys for:
- OpenAI (for LLM and TTS capabilities)
- Deepgram (for speech-to-text)

## Installation

1. Clone the repository

2. Install dependencies:
```bash
pip install -r requirements.txt
```

3. Create a `.env` file in the parent directory with your API credentials:
```
LIVEKIT_URL=your_livekit_url
LIVEKIT_API_KEY=your_api_key
LIVEKIT_API_SECRET=your_api_secret
OPENAI_API_KEY=your_openai_key
DEEPGRAM_API_KEY=your_deepgram_key
```

## Running the Agent

```bash
python keyword_detection.py console
```

The agent will start a conversation and monitor for keywords in the background. Try using words like "hello", "thanks", or "bye" in your speech and watch them appear in the logs.

## Architecture Details

### Main Classes

- **KeywordDetectionAgent**: Custom agent class that extends the base Agent with keyword detection
- **stt_node**: Overridden method that intercepts the STT pipeline to monitor for keywords

### Keyword Detection Pipeline

The agent overrides the `stt_node` method to create a custom processing pipeline:
1. Receives the parent STT stream
2. Monitors final transcripts for keywords
3. Logs detected keywords
4. Passes all events through unchanged for normal processing
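The matching in steps 2–3 reduces to a case-insensitive scan of each final transcript. A standalone sketch of that logic follows; the function name is hypothetical, and whole-word matching via `\b` is an assumption (the actual agent may use plain substring checks):

```python
import re

KEYWORDS = ["Shane", "hello", "thanks", "bye"]

def detect_keywords(transcript: str, keywords: list[str] = KEYWORDS) -> list[str]:
    """Return every keyword that appears as a whole word, case-insensitively."""
    found = []
    for kw in keywords:
        if re.search(rf"\b{re.escape(kw)}\b", transcript, re.IGNORECASE):
            found.append(kw)
    return found

print(detect_keywords("Hello Shane, thanks for calling"))  # → ['Shane', 'hello', 'thanks']
```

Because the STT node only inspects transcripts and passes events through unchanged, this check adds no latency to the response pipeline.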

### Current Keywords

The agent monitors for these keywords (case-insensitive):
- "Shane"
- "hello"
- "thanks"
- "bye"

### Logging Output

When keywords are detected, you'll see log messages like:
```
INFO:keyword-detection:Keyword detected: 'hello'
INFO:keyword-detection:Keyword detected: 'thanks'
```
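The `LEVEL:name:message` shape above matches Python's default `logging` format. A minimal reproduction that needs no running agent, routing the same logger name to an in-memory stream:

```python
import io
import logging

# Route the logger to an in-memory stream using the default-style format.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))

logger = logging.getLogger("keyword-detection")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep output out of the root handler

logger.info("Keyword detected: 'hello'")
print(stream.getvalue())  # INFO:keyword-detection:Keyword detected: 'hello'
```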
```diff
@@ -9,14 +9,14 @@

 load_dotenv(dotenv_path=Path(__file__).parent.parent / '.env')

-logger = logging.getLogger("listen-and-respond")
+logger = logging.getLogger("keyword-detection")
 logger.setLevel(logging.INFO)

-class SimpleAgent(Agent):
+class KeywordDetectionAgent(Agent):
     def __init__(self) -> None:
         super().__init__(
             instructions="""
-            You are a helpful agent.
+            You are a helpful agent that detects keywords in user speech.
             """,
             stt=deepgram.STT(),
             llm=openai.LLM(),
@@ -28,7 +28,7 @@ async def on_enter(self):
         self.session.generate_reply()

     async def stt_node(self, text: AsyncIterable[str], model_settings: Optional[dict] = None) -> Optional[AsyncIterable[rtc.AudioFrame]]:
-        keywords = ["Shane", "hello", "thanks"]
+        keywords = ["Shane", "hello", "thanks", "bye"]
         parent_stream = super().stt_node(text, model_settings)

         if parent_stream is None:
@@ -53,7 +53,7 @@ async def entrypoint(ctx: JobContext):
     session = AgentSession()

     await session.start(
-        agent=SimpleAgent(),
+        agent=KeywordDetectionAgent(),
         room=ctx.room
     )
```
85 changes: 85 additions & 0 deletions pipeline-stt/transcriber/README.md
@@ -0,0 +1,85 @@
# Transcriber Agent

A speech-to-text logging agent that transcribes user speech and saves it to a file using LiveKit's voice agents.

## Overview

**Transcriber Agent** - A voice-enabled agent that listens to user speech, transcribes it using Deepgram STT, and logs all transcriptions with timestamps to a local file.

## Features

- **Real-time Transcription**: Converts speech to text as users speak
- **Persistent Logging**: Saves all transcriptions to `user_speech_log.txt` with timestamps
- **Voice-Enabled**: Built using LiveKit's voice capabilities with support for:
- Speech-to-Text (STT) using Deepgram
- Minimal agent configuration without LLM or TTS
- **Event-Based Processing**: Uses the `user_input_transcribed` event for efficient transcript handling
- **Automatic Timestamping**: Each transcription entry includes date and time

## How It Works

1. User connects to the LiveKit room
2. Agent starts listening for speech input
3. Deepgram STT processes the audio stream in real-time
4. When a final transcript is ready, it triggers the `user_input_transcribed` event
5. The transcript is appended to `user_speech_log.txt` with a timestamp
6. The process continues for all subsequent speech

## Prerequisites

- Python 3.10+
- `livekit-agents`>=1.0
- LiveKit account and credentials
- API keys for:
- Deepgram (for speech-to-text)

## Installation

1. Clone the repository

2. Install dependencies:
```bash
pip install -r requirements.txt
```

3. Create a `.env` file in the parent directory with your API credentials:
```
LIVEKIT_URL=your_livekit_url
LIVEKIT_API_KEY=your_api_key
LIVEKIT_API_SECRET=your_api_secret
DEEPGRAM_API_KEY=your_deepgram_key
```

## Running the Agent

```bash
python transcriber.py console
```

The agent will start listening for speech and logging transcriptions to `user_speech_log.txt` in the current directory.

## Architecture Details

### Main Components

- **AgentSession**: Manages the agent lifecycle and event handling
- **user_input_transcribed Event**: Fired when Deepgram completes a transcription
- **Transcript Object**: Contains the transcript text and finality status

### Log File Format

Transcriptions are saved in the following format:
```
[2024-01-15 14:30:45] Hello, this is my first transcription
[2024-01-15 14:30:52] Testing the speech to text functionality
```

### Minimal Agent Configuration

This agent uses a minimal configuration without LLM or TTS:
```python
Agent(
instructions="You are a helpful assistant that transcribes user speech to text.",
stt=deepgram.STT()
)
```
File renamed without changes.