
weaviate-tutorials/DEMO-semantic-search-podcast


Semantic search on podcast transcripts

This project's origin is here. In this project, we use Weaviate to perform semantic search on podcast transcripts. The text is vectorized with the OpenAI text2vec transformer module. Once all of the data has been vectorized and stored, we can run semantic search over it.

Vectorization model: sentence-transformers/multi-qa-distilbert-cos-v1. If this model does not work, try sentence-transformers/msmarco-distilroberta-base-v2 instead.
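Semantic search compares meaning rather than keywords: every transcript and every query is turned into a vector by the model, and results are ranked by how similar those vectors are. The "cos" in multi-qa-distilbert-cos-v1 indicates that the model is meant to be used with cosine similarity. The sketch below is a conceptual, brute-force illustration of that ranking step using made-up 3-dimensional toy vectors in place of real embeddings. It is not code from this repository, and Weaviate itself uses an approximate nearest-neighbor (HNSW) index instead of scoring every object in a loop.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query_vec: List[float], docs: Dict[str, List[float]], k: int = 3
) -> List[Tuple[str, float]]:
    """Brute-force top-k search: score every document, then sort by descending similarity."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy "embeddings"; real model outputs have hundreds of dimensions.
toy_docs = {
    "episode-about-kubernetes": [0.9, 0.1, 0.0],
    "episode-about-rust": [0.1, 0.9, 0.1],
    "episode-about-open-source": [0.4, 0.4, 0.8],
}
# Pretend this is the embedding of a query such as "container orchestration".
toy_query = [0.8, 0.2, 0.1]

print(rank_by_similarity(toy_query, toy_docs, k=2))
```

With these toy numbers the Kubernetes episode scores highest because its vector points in almost the same direction as the query vector; vector length does not affect the score.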

(TODO: Add demo video)

Prerequisites

Before you can run the project, you need to have Docker, Docker Compose, and Python installed on your machine. Follow the instructions below to install the prerequisites:

1. Install Docker:

  • For Windows and Mac:
    • Download and install Docker Desktop from Docker's official website.
  • For Linux:
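    • On Debian or Ubuntu, run the following commands in your terminal. The docker-ce, docker-ce-cli, and containerd.io packages come from Docker's own apt repository, which must be added first (see Docker's installation guide for your distribution):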
      sudo apt-get update
      sudo apt-get install docker-ce docker-ce-cli containerd.io

2. Install Docker Compose:

  • For Windows and Mac:
    • Docker Compose is included with Docker Desktop.
  • For Linux:
    • Run the following command in your terminal:
      sudo apt install docker-compose

3. Install Python:

  • Download and install the latest version of Python from Python's official website.
  • Verify the installation by running the following command in your terminal (on some macOS and Linux systems the command is python3 rather than python):
    python --version
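After installing the prerequisites, a small standard-library script can confirm that the tools are actually on your PATH. This is a hypothetical helper, not part of the repository: it only looks for the docker and docker-compose executables and prints the running Python version. It does not enforce a minimum Python version, because this README does not state one.

```python
import shutil
import sys
from typing import Dict, Iterable, Optional


def find_tools(names: Iterable[str] = ("docker", "docker-compose")) -> Dict[str, Optional[str]]:
    """Map each executable name to its full path on PATH, or None if it is not found."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for tool, path in find_tools().items():
        print(f"{tool}: {path or 'NOT FOUND on PATH'}")
    # sys.version starts with the version number, e.g. "3.11.4 (main, ...)".
    print(f"python: {sys.version.split()[0]}")
```

On setups that only provide the newer docker compose subcommand, docker-compose may be reported as missing even though Compose itself works.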

Setup instructions

  1. Install virtualenv (if not already installed):

    pip install virtualenv
  2. Create a virtual environment: navigate to the directory where you want to create your virtual environment, then run:

    virtualenv <name_of_virtualenv>
  3. Activate the virtual environment. On Windows, run:

    .\<name_of_virtualenv>\Scripts\activate

    On macOS and Linux, run:

    source <name_of_virtualenv>/bin/activate
  4. Install Python requirements:

    pip install -r requirements.txt
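  5. Export your OpenAI API key. On macOS and Linux, run:

    export OPENAI_APIKEY=<your_openai_api_key>

    On Windows (PowerShell), run:

    $env:OPENAI_APIKEY="<your_openai_api_key>"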
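Before moving on, it can help to check that the virtual environment is active and that OPENAI_APIKEY is visible to Python. The sketch below is a hypothetical helper, not part of the repository. It relies on the standard behavior that inside a venv-style environment (which virtualenv 20 and later creates) sys.prefix differs from sys.base_prefix, and that older virtualenv releases set sys.real_prefix instead.

```python
import os
import sys
from typing import List, Mapping, Optional


def in_virtualenv() -> bool:
    """True when the interpreter is running inside a virtual environment."""
    # venv / virtualenv >= 20: sys.prefix points at the environment,
    # sys.base_prefix at the base installation. Legacy virtualenv sets sys.real_prefix.
    return sys.prefix != sys.base_prefix or hasattr(sys, "real_prefix")


def setup_problems(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of human-readable setup problems; an empty list means it looks OK."""
    env = os.environ if env is None else env
    problems = []
    if not in_virtualenv():
        problems.append("No virtual environment is active.")
    if not env.get("OPENAI_APIKEY"):
        problems.append("OPENAI_APIKEY is not set in this shell.")
    return problems


if __name__ == "__main__":
    # Only prints warnings; it never aborts, so it is safe to run at any time.
    for problem in setup_problems():
        print("WARNING:", problem)
```

Because exported environment variables only apply to the current shell session, run the check from the same terminal you will use for the remaining steps.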

Usage instructions

  1. Start Weaviate: docker-compose up -d. Once the containers are up, Weaviate is available at http://localhost:8080.
  2. Run python import.py to import the transcripts into Weaviate.
  3. The data is now stored in the Weaviate instance, and you can query it from a Python notebook or a Python script.
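As a starting point for experimenting, the sketch below talks to Weaviate's REST and GraphQL endpoints using only the Python standard library: it waits for GET /v1/.well-known/ready to return 200, then sends a nearText query to POST /v1/graphql. The class name Podcast and the properties title and transcript are hypothetical placeholders; replace them with whatever import.py actually creates. This is not the project's own query code.

```python
import json
import time
import urllib.request
from typing import Any, Dict, List, Sequence

WEAVIATE_URL = "http://localhost:8080"


def wait_until_ready(base_url: str = WEAVIATE_URL, timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Poll Weaviate's readiness endpoint until it answers 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/v1/.well-known/ready", timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            pass  # URLError/HTTPError are OSError subclasses: not ready yet (refused, 503, timeout)
        time.sleep(interval)
    return False


def build_near_text_query(class_name: str, concept: str, properties: Sequence[str], limit: int = 3) -> str:
    """Build a GraphQL Get query with a nearText search (needs a text2vec module enabled)."""
    fields = " ".join(properties)
    # json.dumps produces a quoted, escaped string that is also a valid GraphQL string literal.
    return (
        "{ Get { "
        f"{class_name}(nearText: {{concepts: [{json.dumps(concept)}]}}, limit: {limit}) "
        f"{{ {fields} }} "
        "} }"
    )


def semantic_search(
    concept: str,
    class_name: str = "Podcast",  # hypothetical class name
    properties: Sequence[str] = ("title", "transcript"),  # hypothetical property names
    limit: int = 3,
    base_url: str = WEAVIATE_URL,
) -> List[Dict[str, Any]]:
    """POST a nearText query to /v1/graphql and return the list of matching objects."""
    query = build_near_text_query(class_name, concept, properties, limit)
    request = urllib.request.Request(
        f"{base_url}/v1/graphql",
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as resp:
        body = json.load(resp)
    if body.get("errors"):
        raise RuntimeError(body["errors"])
    return body["data"]["Get"][class_name]
```

After docker-compose up -d and python import.py, calling semantic_search("funding open source") once wait_until_ready() returns True should return the closest transcripts, provided the class and property names match the imported schema.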

Dataset license

300 podcast transcripts from Changelog
