Dockerfile for transcription and Speaker Diarization #909

Closed
kowshik24 opened this issue Oct 29, 2024 · 3 comments

Comments

@kowshik24

I ran into many issues while building a Dockerfile for transcription and speaker diarization. Is there a Git repository available for that? Or are you planning to create a Dockerfile specifically for RunPod serverless?

@randyburden

I use the pre-built Docker images from this repo: https://github.com/jim60105/docker-whisperX

@kowshik24
Author

@randyburden, thanks for sharing. I found the same repo. Do you have a Docker Hub repository for it?

@randyburden

@kowshik24, no, I don't use Docker Hub. I use the Dockerfile below to pull the WhisperX image from the GitHub Container Registry and build a customized image that preloads and caches the Pyannote models for offline use. I then upload that image to Azure Container Services.

# Define optional arguments that indicate the OpenAI Whisper model size and language to use
ARG WHISPER_MODEL=medium
ARG LANG=en

# Get the base WhisperX Docker image (https://github.com/jim60105/docker-whisperX)
FROM ghcr.io/jim60105/whisperx:${WHISPER_MODEL}-${LANG}

# Define the required argument for the huggingface.co token used by Pyannote (diarization/speaker-recognition library)
ARG HUGGING_FACE_TOKEN

# Output argument value for debugging/inspecting
# (note: this prints the token in plain text in the build log)
RUN echo "Huggingface.co token: ${HUGGING_FACE_TOKEN}"

# Ensure the required argument was supplied
# (test -n "$VAR" returns false if the string is zero length, which fails the build)
RUN test -n "$HUGGING_FACE_TOKEN" || (echo "HUGGING_FACE_TOKEN argument is required" && false)

# Preload and cache the Pyannote models so that the image can run offline
RUN python3 -c 'from pyannote.audio import Pipeline; pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1", use_auth_token="'${HUGGING_FACE_TOKEN}'")'
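
For anyone following along, here is a minimal sketch of how an image could be built from the Dockerfile above. It is not part of the original comment: the `build_image` function, the `whisperx-diarization` image name, and the `DOCKER` override (handy for a dry run) are all hypothetical names chosen for illustration. It assumes the Dockerfile sits in the current directory and that `HUGGING_FACE_TOKEN` is exported in the calling shell. Passing `--build-arg HUGGING_FACE_TOKEN` without a value tells Docker to take the value from the environment, which keeps the token off the command line.

```shell
#!/bin/sh
# Hypothetical helper (not from the original thread): builds the customized
# WhisperX image from the Dockerfile in the current directory.
# Usage: build_image [WHISPER_MODEL] [LANG]
build_image() {
    model="${1:-medium}"   # same default as the Dockerfile's WHISPER_MODEL
    lang="${2:-en}"        # same default as the Dockerfile's LANG

    # Fail early if the token is missing, mirroring the Dockerfile's own check.
    if [ -z "${HUGGING_FACE_TOKEN:-}" ]; then
        echo "HUGGING_FACE_TOKEN must be exported before building" >&2
        return 1
    fi

    # DOCKER can be set to "echo" for a dry run; it defaults to the real CLI.
    # "whisperx-diarization" is an assumed image name, change it as needed.
    ${DOCKER:-docker} build \
        --build-arg "WHISPER_MODEL=${model}" \
        --build-arg "LANG=${lang}" \
        --build-arg HUGGING_FACE_TOKEN \
        -t "whisperx-diarization:${model}-${lang}" \
        .
}
```

With the token exported, `build_image` builds the default `medium`/`en` variant, and `DOCKER=echo build_image medium en` prints the arguments without building anything. Keep in mind that build arguments are recorded in the image metadata and can show up in `docker history`, so an image built this way should be treated as containing the token.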

@Barabazs Barabazs closed this as completed Jan 1, 2025
3 participants