Semantic search on podcast transcripts

This project's origin is here. In this project, we use Weaviate to perform semantic search on podcast transcripts. The text is vectorized with the OpenAI text2vec transformer module. Once all the data is vectorized and stored, we can run semantic searches over it.

Vectorization module: sentence-transformers/multi-qa-distilbert-cos-v1. Note: if this doesn't work, try sentence-transformers/msmarco-distilroberta-base-v2
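To make the idea of vector-based search concrete, here is a minimal sketch in plain Python. It assumes cosine similarity as the comparison metric (the "cos" in the model name suggests this, but it is an assumption here), and the three-dimensional vectors and snippet names are made up for illustration. Real embeddings have hundreds of dimensions, and Weaviate uses an approximate nearest-neighbour index rather than this brute-force loop.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, docs):
    """Return the keys of `docs` sorted from most to least similar to the query."""
    return sorted(
        docs,
        key=lambda name: cosine_similarity(query_vec, docs[name]),
        reverse=True,
    )


# Hypothetical toy embeddings; a real vectorizer would produce these from text.
snippets = {
    "go concurrency": [0.9, 0.1, 0.0],
    "open source funding": [0.1, 0.9, 0.2],
    "database indexing": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]  # hypothetical embedding of a search phrase

print(rank_by_similarity(query_vec, snippets))
```

The snippet whose vector points in nearly the same direction as the query vector is ranked first, regardless of whether the two texts share any exact words.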

(TODO: Add demo video)

Prerequisites

Before you can run the project, you need to have Docker, Docker Compose, and Python installed on your machine. Follow the instructions below to install the prerequisites:

1. Install Docker:

For Windows and Mac:
- Download and install Docker Desktop from Docker's official website.

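For Linux:
- Run the following commands in your terminal. The docker-ce packages are distributed through Docker's own apt repository, so set that repository up first by following Docker's official installation guide.
```
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io
```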

2. Install Docker Compose:

For Windows and Mac:
- Docker Compose is included with Docker Desktop.

For Linux:
- Run the following command in your terminal:
```
sudo apt install docker-compose
```

3. Install Python:

Download and install the latest version of Python from Python's official website.
Verify the installation by running the following command in your terminal:
```
python --version
```

Setup instructions

Install virtualenv (if not already installed):
```
pip install virtualenv
```
Create a Virtual Environment: Navigate to the directory where you want to create your virtual environment, then run:
```
virtualenv <name_of_virtualenv>
```

Activate the Virtual Environment: On Windows, run:
```
.\<name_of_virtualenv>\Scripts\activate
```
On macOS and Linux, run:
```
source <name_of_virtualenv>/bin/activate
```

Install Python requirements:
```
pip install -r requirements.txt
```

Export OpenAI API Key:
```
export OPENAI_APIKEY=<your_openai_api_key>
```
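Before running the import, it can help to confirm that the key is actually visible to Python. The following is a small sketch, not part of this repository: the helper name get_openai_key is hypothetical, and how import.py itself reads the key is not shown here.

```python
import os


def get_openai_key(env=os.environ):
    """Return the OPENAI_APIKEY value, or raise a clear error if it is missing.

    `env` defaults to the process environment; passing a dict makes the
    helper easy to try out without touching real environment variables.
    """
    key = env.get("OPENAI_APIKEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_APIKEY is not set; export it in this shell before running import.py"
        )
    return key


if __name__ == "__main__":
    try:
        get_openai_key()
        print("OPENAI_APIKEY is set")
    except RuntimeError as err:
        print(err)
```

Note that a variable set with export only exists in the current shell session, so it has to be exported again in every new terminal.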

Usage instructions

1. Start up Weaviate by running docker-compose up -d. Once it has started, Weaviate is available at http://localhost:8080.
2. Run python import.py to import the transcripts into Weaviate.
3. The data is now stored in the Weaviate instance. You can experiment with it using a Python notebook or a Python file.
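As a starting point for experimenting, the sketch below sends a nearText query to Weaviate's GraphQL endpoint using only the Python standard library. The class name "Podcast" and the properties "title" and "transcript" are placeholders, since the real names are defined in schema.json and are not shown here; replace them with the ones from your schema. The helper names build_near_text_query and run_query are likewise illustrative.

```python
import json
import urllib.request

WEAVIATE_URL = "http://localhost:8080"  # address from the startup step above


def build_near_text_query(class_name, properties, concept, limit=3):
    """Build a GraphQL Get query that ranks objects by closeness to `concept`."""
    props = " ".join(properties)
    # json.dumps produces a quoted, escaped string list that GraphQL accepts.
    concepts = json.dumps([concept])
    return (
        "{ Get { %s(nearText: {concepts: %s}, limit: %d) "
        "{ %s _additional { distance } } } }" % (class_name, concepts, limit, props)
    )


def run_query(query, base_url=WEAVIATE_URL, timeout=10):
    """POST the query to Weaviate's /v1/graphql endpoint and return parsed JSON."""
    request = urllib.request.Request(
        base_url + "/v1/graphql",
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# "Podcast", "title" and "transcript" are assumed names; check schema.json.
query = build_near_text_query(
    "Podcast", ["title", "transcript"], "funding open source work"
)
print(query)
```

With Weaviate running and the import finished, calling run_query(query) should return a dictionary whose results sit under the "data" key, or an "errors" key if the class or property names do not match the schema.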

Dataset license

300 Podcast transcripts from Changelog
