A tool that converts PDF content into a 2-host podcast using LLM and TTS APIs.
Podcast Creator is a comprehensive solution for transforming PDF documents and Arxiv papers into engaging, two-host podcasts. It leverages advanced LLM (Large Language Model) and TTS (Text-to-Speech) technologies to generate natural-sounding conversations between virtual hosts. The system is designed for easy deployment and integration, with a robust HTTP API interface.
- PDF content extraction from uploaded files and Arxiv URLs
- LLM-powered podcast generation in a 2-host format with natural conversation flow
- OpenAI-compatible LLM and TTS integration
- WAV audio stitching for seamless podcast creation
- Dockerized deployment for easy setup and scaling
- HTTP API interface for programmatic access
- Configuration via environment variables
- Agentic LLM tool-use for information retrieval (e.g. current date, web search) to make podcasts more relevant and up to date
- Updated UI/Frontend with more configurations exposed to control podcast generation
- Memory integration to support retrieval from "previous episodes", developed host personalities, likes/dislikes, etc
POST /podcasts
Initiates podcast generation from PDF files or Arxiv URLs.
Request Parameters:
- `files` (optional if `arxiv_urls` is provided): List of PDF files to process
- `arxiv_urls` (optional if `files` is provided): List of Arxiv URLs to process
Response:
{
  "job_id": "unique-job-identifier",
  "status": "processing",
  "created_at": "timestamp"
}
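Assuming a local deployment on port 8000, submitting Arxiv URLs can be sketched with the Python standard library. The form-field encoding below is an assumption about the endpoint; uploading PDF `files` would additionally require a multipart/form-data body, which is omitted for brevity:

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed local deployment

def submit_podcast_job(arxiv_urls, base_url=BASE_URL):
    """Submit Arxiv URLs to POST /podcasts and return the job descriptor.

    The API requires at least one of files or arxiv_urls; this sketch
    handles only the arxiv_urls case.
    """
    if not arxiv_urls:
        raise ValueError("at least one of files or arxiv_urls is required")
    data = urllib.parse.urlencode(
        [("arxiv_urls", url) for url in arxiv_urls]
    ).encode()
    request = urllib.request.Request(
        f"{base_url}/podcasts", data=data, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        # e.g. {"job_id": "...", "status": "processing", "created_at": "..."}
        return json.load(response)
```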
GET /podcasts/status/{job_id}
Retrieves the status of a podcast generation job.
Path Parameters:
- `job_id`: Unique identifier for the job
Response:
{
  "job_id": "unique-job-identifier",
  "status": "processing | completed | failed",
  "progress": 0-100,
  "result_file": "filename.mp3" (present when completed)
}
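A client typically polls this endpoint until the job leaves the `processing` state. A minimal polling loop is sketched below; `fetch_status` is a caller-supplied function (hypothetical, not part of this project) that wraps `GET /podcasts/status/{job_id}` and returns the parsed JSON:

```python
import time

def wait_for_podcast(job_id, fetch_status, poll_interval=5.0, max_polls=120):
    """Poll fetch_status(job_id) until the job completes or fails.

    fetch_status must return the status dict from
    GET /podcasts/status/{job_id}.
    """
    for _ in range(max_polls):
        status = fetch_status(job_id)
        if status["status"] == "completed":
            return status["result_file"]
        if status["status"] == "failed":
            raise RuntimeError(f"podcast job {job_id} failed")
        time.sleep(poll_interval)
    raise TimeoutError(f"podcast job {job_id} did not finish in time")
```

Injecting `fetch_status` keeps the loop independent of any particular HTTP client and lets it be exercised without a live server.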
GET /podcasts/download/{filename}
Downloads the generated podcast file.
Path Parameters:
- `filename`: Filename of the podcast to download
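Once a job reports `completed`, its `result_file` can be fetched from this endpoint. A minimal sketch, again assuming a local deployment on port 8000:

```python
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed local deployment

def podcast_download_url(filename, base_url=BASE_URL):
    """Build the download URL, percent-encoding the filename."""
    return f"{base_url}/podcasts/download/{urllib.parse.quote(filename)}"

def download_podcast(filename, dest_path, base_url=BASE_URL):
    """Save the generated podcast file to dest_path."""
    urllib.request.urlretrieve(podcast_download_url(filename, base_url), dest_path)
```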
The following environment variables can be configured:
- `ALLOWED_ORIGINS`: Comma-separated list of allowed origins for CORS (default: `*`)
- `HOST_A_NAME`: Name for speaker A
- `HOST_B_NAME`: Name for speaker B
- `LLM_API_HOST`: URL for the LLM service (default: `http://192.168.1.16:8000`)
- `LLM_MODEL`: LLM model to use (default: `Mistral-Small-3.2-FP8`)
- `LLM_HOST_TEMPERATURE`: Temperature setting for the LLM (default: `0.6`)
- `LLM_TIMEOUT`: Timeout for LLM requests in seconds (default: `600`)
- `TTS_API_HOST`: URL for the TTS service (default: `http://192.168.1.16:8000`)
- `TTS_API_PATH`: API path for TTS requests (default: `/v1/audio/speech`)
- `TTS_MODEL`: TTS model to use (default: `Kyutai-TTS-Server`)
- `TTS_TIMEOUT`: Timeout for TTS requests in seconds (default: `60`)
- `TTS_WAKEUP_ENDPOINT`: (Optional) Endpoint that receives a GET request with a 60-second timeout to wake up the TTS server
- `AUDIO_STORAGE_PATH`: Directory to store generated audio files (default: `./audio_storage`)
- `MAX_FILE_SIZE`: Maximum allowed file size in bytes (default: `10485760`, i.e. 10 MB)
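For illustration, a `.env` file wiring these variables together might look like the following (host names are placeholders; the other values restate the defaults above):

```shell
# Example .env — host names are placeholders, other values are the defaults
ALLOWED_ORIGINS=*
HOST_A_NAME=Alice
HOST_B_NAME=Bob
LLM_API_HOST=http://192.168.1.16:8000
LLM_MODEL=Mistral-Small-3.2-FP8
LLM_HOST_TEMPERATURE=0.6
LLM_TIMEOUT=600
TTS_API_HOST=http://192.168.1.16:8000
TTS_API_PATH=/v1/audio/speech
TTS_MODEL=Kyutai-TTS-Server
TTS_TIMEOUT=60
AUDIO_STORAGE_PATH=./audio_storage
MAX_FILE_SIZE=10485760
```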
The application can be deployed using Docker. A `docker-compose.yml` file is provided for easy setup.
For local development, install the required dependencies:
pip install -r requirements.txt
Start the application:
uvicorn app.main:app --host 0.0.0.0 --port 8000
This project is licensed under the MIT License.