# Book Summary Embeddings and Nearest Neighbors Analysis

This repository contains Python scripts that generate embeddings from book summaries using the voyageai API and run a nearest neighbors analysis on the results. The goal is to compare book summaries semantically, surfacing similar titles and supporting content-based recommendations.

## Prerequisites

Before you begin, ensure you have met the following requirements:

- Python 3.6+
- pandas
- numpy
- scikit-learn
- nltk
- voyageai (the official Python client used by `embeddings.py`)
- A valid API key from voyageai
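Assuming the dependencies are installed from PyPI (`voyageai` is the official client's package name), a typical setup is:

```bash
pip install pandas numpy scikit-learn nltk voyageai
```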

## Configuration

1. Obtain an API key from voyageai. You will need to sign up for an account and subscribe to a plan that suits your needs.

2. Once you have your API key, open `embeddings.py` and locate the following line:

   ```python
   client = voyageai.Client(api_key="")
   ```

3. Replace the empty string with your API key:

   ```python
   client = voyageai.Client(api_key="YOUR_API_KEY_HERE")
   ```
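Hard-coding keys in source makes them easy to commit by accident. As a safer alternative, here is a minimal sketch that reads the key from an environment variable (the `VOYAGE_API_KEY` name follows the voyageai client's documented convention, but verify it against the current library docs):

```python
import os
import voyageai

# Read the key from the environment rather than hard-coding it.
# Set it in your shell first, e.g. `export VOYAGE_API_KEY=...`.
client = voyageai.Client(api_key=os.environ["VOYAGE_API_KEY"])
```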

## File Descriptions

- `embeddings.py`: Processes a dataset of book summaries to generate embeddings using the voyageai API. It includes data cleaning, token counting, and embedding generation (a sketch of this kind of pipeline follows this list).

- `semantic_scores.py`: After the embeddings are generated, this script loads them and uses the k-Nearest Neighbors algorithm to find and analyze the closest summaries by semantic similarity (a sketch appears at the end of Running the Scripts).
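For orientation, here is a minimal sketch of the kind of pipeline `embeddings.py` implements; it is not the script itself, and it omits the token-counting step. The input file name, the `summary` column, and the `voyage-2` model name are assumptions to adjust for your dataset and the current voyageai API:

```python
import os
import pandas as pd
import voyageai

client = voyageai.Client(api_key=os.environ["VOYAGE_API_KEY"])

# Load the dataset; the file and column names are placeholders.
df = pd.read_csv("book_summaries.csv")
df["summary"] = df["summary"].fillna("").str.strip()  # basic cleaning
df = df[df["summary"] != ""].reset_index(drop=True)

# Embed in batches to stay within per-request limits.
embeddings = []
for start in range(0, len(df), 128):
    batch = df["summary"].iloc[start:start + 128].tolist()
    result = client.embed(batch, model="voyage-2", input_type="document")
    embeddings.extend(result.embeddings)

df["embedding"] = embeddings
df.to_csv("book_summaries_with_embeddings.csv", index=False)
```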

## Running the Scripts

1. Place your dataset in the same directory as the scripts, or update the file paths in the scripts to point to your dataset.

2. Run `embeddings.py` first to generate embeddings for your dataset. This creates a new CSV file with the embeddings included:

   ```bash
   python embeddings.py
   ```

3. After generating the embeddings, run `semantic_scores.py` to perform the nearest neighbors analysis:

   ```bash
   python semantic_scores.py
   ```
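As a rough illustration of the nearest neighbors step in `semantic_scores.py` (the file name, the `embedding` and `title` columns, and the cosine metric are assumptions carried over from the sketch above):

```python
import ast
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors

df = pd.read_csv("book_summaries_with_embeddings.csv")

# Embeddings round-trip through CSV as strings, so parse them back into lists.
X = np.array([ast.literal_eval(e) for e in df["embedding"]])

# Cosine distance pairs naturally with text embeddings. Ask for 6 neighbors
# because each point's nearest neighbor is the point itself.
nn = NearestNeighbors(n_neighbors=6, metric="cosine").fit(X)
distances, indices = nn.kneighbors(X)

# Show the five closest summaries to the first book
# ("title" is a placeholder column name).
for dist, idx in zip(distances[0][1:], indices[0][1:]):
    print(f"{df.iloc[idx]['title']}  (cosine distance: {dist:.3f})")
```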

## Note on Test Coverage

Please note that the scripts folder has no test coverage because the scripts are intended for one-time use. They were written to process a single dataset for a one-off analysis, so traditional unit and integration testing paradigms do not directly apply.