
charlesa101/youtube-whisper-sbert


Kubeflow Tribal Knowledge

Kubeflow Tribal Knowledge is an open source machine learning application that answers questions about Kubeflow. The answers are drawn from transcriptions of Kubeflow community meeting recordings, which are processed with a Natural Language Processing (NLP) model. The goal is to supplement the Kubeflow documentation with an easy way to find current information on features that are under discussion, in development, or have limited documentation.

The project is based on the implementation described in this post, https://www.pinecone.io/learn/openai-whisper/, with some modifications for Kubeflow. The Kubeflow content comes from transcribing recordings on the Kubeflow Community YouTube channel.

The post proposes an implementation using these components:

  1. Pytube to download MP3 audio from the YouTube channel
  2. Whisper to transcribe the audio into 30-second segments
  3. Sentence-BERT (SBERT), a sentence transformer, to embed the transcript segments and user questions
  4. Pinecone to store the 30-second segment embeddings with their timestamps and YouTube channel information
  5. Gradio for the user interface (question input, answer output)
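A minimal sketch of how the output of step 2 might be prepared for steps 3 and 4, assuming Whisper's `transcribe()` result, whose `"segments"` list has entries with `start`, `end`, and `text` fields. The `build_windows` helper, its record fields, and the merge rule (close a window once it would exceed 30 seconds) are hypothetical illustrations, not this project's code.

```python
from typing import TypedDict


class Segment(TypedDict):
    """Fields used from one entry of Whisper's transcribe()["segments"] list."""
    start: float
    end: float
    text: str


def build_windows(segments: list[Segment], video_id: str, title: str,
                  window_s: float = 30.0) -> list[dict]:
    """Hypothetical helper: merge consecutive Whisper segments into windows of
    at most window_s seconds, each carrying the metadata to store next to its
    embedding (a single segment longer than window_s becomes its own window)."""
    records: list[dict] = []
    current: list[Segment] = []

    def flush() -> None:
        # Emit one record for the segments collected so far, then reset.
        if not current:
            return
        start = current[0]["start"]
        records.append({
            "id": f"{video_id}-{len(records)}",
            "text": " ".join(s["text"].strip() for s in current),
            "start": start,
            "end": current[-1]["end"],
            "title": title,
            # youtu.be links accept ?t=<seconds> to start playback at an offset.
            "url": f"https://youtu.be/{video_id}?t={int(start)}",
        })
        current.clear()

    for seg in segments:
        # Close the window if adding this segment would push it past window_s.
        if current and seg["end"] - current[0]["start"] > window_s:
            flush()
        current.append(seg)
    flush()
    return records
```

For example, three segments ending at 12 s, 25 s, and 40 s produce two windows: one covering 0 to 25 s and one starting at 25 s. In the full workflow, each window's `text` would be embedded with SBERT and stored in Pinecone with the remaining fields as metadata.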

The workflow is shown in the diagram below, taken from the post referenced above:

[Workflow diagram: screenshot from the Pinecone post]
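The question-answering side of the workflow can be sketched as a nearest-neighbour search: embed the question, score it against the stored segment embeddings, and return the best-matching segments with their links. The sketch below is an in-memory stand-in under stated assumptions: cosine similarity is assumed as the metric, and `cosine`, `search`, and `format_answer` are hypothetical names; in the real workflow SBERT produces the vectors and Pinecone performs the search.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query_vec: Sequence[float],
           index: list[tuple[Sequence[float], dict]],
           k: int = 3) -> list[tuple[float, dict]]:
    """Hypothetical in-memory stand-in for the vector database query: score every
    stored (embedding, metadata) pair against the question embedding, keep the top k."""
    scored = [(cosine(query_vec, vec), meta) for vec, meta in index]
    # Sort on the score only, so metadata dicts are never compared.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def format_answer(results: list[tuple[float, dict]]) -> str:
    """Hypothetical helper: render the matches as plain text for an answer box."""
    return "\n\n".join(
        f"{meta['title']} ({meta['url']}), score {score:.2f}:\n{meta['text']}"
        for score, meta in results
    )
```

In a deployed version, the question would be embedded with the same SBERT model used for the transcripts, the search would be run by Pinecone, and a function combining those steps could be passed to Gradio, for example via `gr.Interface(fn=..., inputs="text", outputs="text")`.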

The Kubeflow team is working to simplify this workflow, including integrating its dependencies so that they run easily in Kubeflow.
