Dockerfile for transcription and Speaker Diarization #909
I use the pre-built Docker images from this repo: https://github.com/jim60105/docker-whisperX
@randyburden Thanks for sharing; I found the same repo. Do you have a Docker Hub repository for it?
@kowshik24, no, I don't use Docker Hub. I use the Dockerfile below to pull the WhisperX image from the GitHub Container Registry, build a new customized image that preloads and caches the Pyannote models for offline use, and then upload that image to Azure Container Services.

# Define optional arguments that indicate the OpenAI Whisper model size and language to use
ARG WHISPER_MODEL=medium
ARG LANG=en
# Get the base WhisperX Docker image (https://github.com/jim60105/docker-whisperX)
FROM ghcr.io/jim60105/whisperx:${WHISPER_MODEL}-${LANG}
# Define the required argument for the huggingface.co token used by Pyannote (diarization/speaker-recognition library)
ARG HUGGING_FACE_TOKEN
# Output argument value for debugging/inspecting
RUN echo "Huggingface.co token: ${HUGGING_FACE_TOKEN}"
# Ensure the required argument was supplied
# (test -n "") Returns false if the string is zero length
RUN test -n "$HUGGING_FACE_TOKEN" || (echo "HUGGING_FACE_TOKEN argument is required" && false)
# Preload and cache the Pyannote models so that the image can run offline
RUN python3 -c 'from pyannote.audio import Pipeline; pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1", use_auth_token="'${HUGGING_FACE_TOKEN}'")'
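For anyone following along, here is a minimal sketch of how an image built from the Dockerfile above could be produced and pushed to a private registry. The HF_TOKEN environment variable, the whisperx-diarize tag, and the myregistry.azurecr.io registry name are placeholders I'm assuming for illustration, not anything defined by this repo:

# Build the customized image, passing the huggingface.co token as a build argument
# (assumes HF_TOKEN already holds a valid token in the current shell)
docker build \
  --build-arg WHISPER_MODEL=medium \
  --build-arg LANG=en \
  --build-arg HUGGING_FACE_TOKEN="$HF_TOKEN" \
  -t whisperx-diarize:latest .

# Tag and push the image to a private registry (the registry name is a placeholder)
docker tag whisperx-diarize:latest myregistry.azurecr.io/whisperx-diarize:latest
docker push myregistry.azurecr.io/whisperx-diarize:latest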
I faced many issues while building a Dockerfile for transcription and speaker diarization. Is there a Git repo available for that? Or are you planning to create a Dockerfile specifically for RunPod serverless?