The bookworm project.


Jacob Peterson, Lawrie Brunswick, Priyam Gupta, Sue Boyd

Project Type

Tool

Table of Contents

Introduction

With millions of books available, it can be daunting to find the perfect one. Book recommendation tools aim to provide a one-stop shop for your next read. Our book recommendation tool, "The Bookish Butterfly", employs a multi-modal approach to offer users personalized recommendations. Unlike some recommender systems that rely only on ratings or genres, our model integrates multiple search modalities to provide better recommendations. We give the user several search options depending on what they are looking for, and we do the heavy lifting to find books that will make a great next read. There's no advertising influence here!

Questions of Interest

MVP:

  • What book should I read next?
  • What other books can I read from the same author?
  • Which book would be a good read related to my current book?
  • What books have similar plots to a book I liked?
  • What are the popular or trending books in a particular genre?

Repository Structure

Find it here

Data Sources

Book Ratings

Book Crossing Dataset Includes:

  1. BX-Book-Ratings.csv
  2. BX-Books.csv
    • 271379 unique values
    • Fields: ISBN, Book-Title, Book-Author, Year-Of-Publication, Publisher, Image-URL-S, Image-URL-M, Image-URL-L
    • Due to file size, this file was not included in the repo, but can be obtained from the link above.
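
As a rough illustration of working with the fields listed above, here is a minimal sketch that reads rows in the Book-Crossing layout using only the standard library. The semicolon delimiter and the sample rows are assumptions made for this example (the rows are made up, and the image URL columns are omitted for brevity); `read_books` is a hypothetical helper, not a function from this repository.

```python
import csv
import io

# Two made-up rows in the assumed Book-Crossing layout:
# semicolon-delimited with double-quoted fields.
SAMPLE = (
    '"ISBN";"Book-Title";"Book-Author";"Year-Of-Publication";"Publisher"\n'
    '"0000000001";"An Example Title";"A. Author";"1999";"Example Press"\n'
    '"0000000002";"Another Title";"B. Writer";"2001";"Sample House"\n'
)


def read_books(handle):
    """Return a dict mapping each ISBN to its full row (a dict of fields)."""
    reader = csv.DictReader(handle, delimiter=";", quotechar='"')
    return {row["ISBN"]: row for row in reader}


books = read_books(io.StringIO(SAMPLE))
print(len(books), books["0000000001"]["Book-Title"])
```

To try this on the downloaded file, pass an open file handle instead of the in-memory sample. The file may need a non-UTF-8 encoding such as `latin-1`; that is also an assumption worth checking against your copy.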

Plot Summaries

Kaggle CMU Book Summary

  1. BookSummaries.txt
    • 16,559 values
    • Fields: Wikipedia article ID, Freebase ID, Book Title, Author, Publication Date, Book Genres, Plot Summary
    • The data from BookSummaries.txt was extracted into the file data_raw/complete_data.csv
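
The extraction step described above could look roughly like the sketch below. It assumes each line of BookSummaries.txt holds the seven fields in the listed order separated by tabs, and that the genres field is a JSON object mapping Freebase IDs to genre names. The field names in `FIELDS`, the helper `parse_summary_line`, and the sample line are all hypothetical, not the code used to build data_raw/complete_data.csv.

```python
import json

# Hypothetical column names, in the order the fields are listed above.
FIELDS = ["wikipedia_id", "freebase_id", "title", "author",
          "publication_date", "genres", "summary"]


def parse_summary_line(line):
    """Split one tab-separated line into a dict with a list of genre names."""
    # Limit the split so tabs inside the plot summary stay in the last field.
    parts = line.rstrip("\n").split("\t", len(FIELDS) - 1)
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    record = dict(zip(FIELDS, parts))
    # Assumption: genres are a JSON object, or an empty string when missing.
    raw_genres = record["genres"]
    record["genres"] = sorted(json.loads(raw_genres).values()) if raw_genres else []
    return record


# A made-up line in the assumed layout.
sample_line = "\t".join([
    "123", "/m/0abc", "An Example Title", "A. Author", "1999",
    json.dumps({"/m/01": "Fiction"}), "A made-up plot summary.",
])
record = parse_summary_line(sample_line)
print(record["title"], record["genres"])
```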

ISBN Matching

Google Books API

  1. Google Books API
    • ISBN (13 digit)
    • Book Title
    • This API was used to augment the CMU data with ISBNs so that it could be matched with the Book Ratings dataset
    • The ISBNs obtained via the Google Books API are also included in data_raw/complete_data.csv
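
For readers curious how an ISBN lookup of this kind might work, here is a minimal sketch against the public Google Books volumes endpoint. It is not the script used to build complete_data.csv. The helper names (`build_query_url`, `extract_isbn13`, `lookup_isbn13`) are hypothetical, and the sample response is made up to mirror the shape of a volumes search result.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://www.googleapis.com/books/v1/volumes"


def build_query_url(title, author=None):
    """Build a volumes search URL for a title and an optional author."""
    query = f"intitle:{title}"
    if author:
        query += f" inauthor:{author}"
    return API_URL + "?" + urllib.parse.urlencode({"q": query, "maxResults": 1})


def extract_isbn13(response):
    """Return the first ISBN_13 identifier in a search response, or None."""
    for item in response.get("items", []):
        identifiers = item.get("volumeInfo", {}).get("industryIdentifiers", [])
        for ident in identifiers:
            if ident.get("type") == "ISBN_13":
                return ident.get("identifier")
    return None


def lookup_isbn13(title, author=None):
    """Query the API over the network and return a 13-digit ISBN, or None."""
    with urllib.request.urlopen(build_query_url(title, author), timeout=10) as resp:
        return extract_isbn13(json.load(resp))


# A made-up response in the expected shape, so the parser can be tried offline.
sample_response = {
    "items": [{
        "volumeInfo": {
            "title": "An Example Title",
            "industryIdentifiers": [
                {"type": "ISBN_10", "identifier": "0000000001"},
                {"type": "ISBN_13", "identifier": "9780000000002"},
            ],
        }
    }]
}
print(extract_isbn13(sample_response))
```

Keeping the parsing separate from the network call makes the matching logic easy to check without using API quota.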

Cleaning and Processing

A description of data cleaning, joining and preprocessing can be found Here and Here. A description of the final datasets used in production and testing can be found Here.

Data Limitations

This project is a proof of concept, executed on a small dataset (~13K books total after data cleaning), with some data sparsity even within those books. As such, some searches may return no or limited results. We'd love to see the work extended to a larger dataset! When a user tries to search based on a book or an author that is not in our dataset, we let them know and encourage them to search another way.

Local Setup and Environment

Local Setup

This repository can be cloned onto your local computer by running the following command in a terminal:

git clone https://github.com/jacobp24/bookworm_rec.git

If Git is not already installed, follow the Git Guide to install it and then clone the repository.

Environment

For this repository we have set up a Conda environment that can be run locally and installs the Python dependencies with appropriate version requirements. Conda needs to be installed before running the next commands. Refer to Conda Installation for further instructions.

Make sure your current directory is set to the 'bookworm_rec' folder. If it is not, please run this command:

cd bookworm_rec

Now run the next command to create the bookworm_env Conda environment:

conda env create -f env.yml

Make sure to activate the newly created environment:

conda activate bookworm_env

Once done with the environment (after using the tool), deactivate it by running:

conda deactivate
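
If you are unsure whether the environment is active before launching the tool, a small check like the sketch below can help. It relies on the `CONDA_DEFAULT_ENV` variable that Conda sets on activation; the helper `conda_env_is_active` is hypothetical and not part of this repository.

```python
import os

EXPECTED_ENV = "bookworm_env"  # the environment name used in the commands above


def conda_env_is_active(environ=os.environ, expected=EXPECTED_ENV):
    """Return True if the expected Conda environment appears to be active."""
    # Conda sets CONDA_DEFAULT_ENV to the name of the active environment.
    return environ.get("CONDA_DEFAULT_ENV") == expected


if conda_env_is_active():
    print(f"{EXPECTED_ENV} is active.")
else:
    print(f"{EXPECTED_ENV} is not active. Run: conda activate {EXPECTED_ENV}")
```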

Application

Our application runs with the Streamlit Python library. Before jumping onto the webpage, you will need to do the following steps:

In order to generate the recommendation embeddings we utilized the VoyageAI package.

Please create a local API KEY by following these steps:

  1. Make sure your current directory is set to the 'bookworm' folder within 'bookworm_rec'. If it is not, please run this from within the bookworm_rec directory:
cd bookworm
  2. Click Here to create your own API KEY.

  3. Copy your new API KEY and run this command:

export API_KEY="replace-with-your-api-key"

This command is whitespace-sensitive: there must be no spaces on either side of the equals sign. Make sure your new API KEY is in double quotes!

  4. Check that the API KEY was set successfully:
echo $API_KEY
  5. Now we are ready to run the application!
streamlit run app.py

Go check out our application in your local browser!!!
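
If the app cannot find your key, the sketch below shows one way a Python program could read the `API_KEY` variable exported above and fail with a clear message when it is missing. This is an illustration only: `get_api_key` is a hypothetical helper, and how app.py actually reads the key is not shown here.

```python
import os

PLACEHOLDER = "replace-with-your-api-key"


def get_api_key(environ=os.environ):
    """Return the API key from the environment, or raise a helpful error."""
    key = environ.get("API_KEY", "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError(
            'API_KEY is not set. Run: export API_KEY="your-key" '
            "(no spaces around the equals sign)."
        )
    return key


# Demonstration with an explicit mapping instead of the real environment.
print(get_api_key({"API_KEY": "example-key"}))
```

Because `export` only affects the current shell session, the variable has to be set again in each new terminal before running the app.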

Examples

Here is a video demonstration of our app!

OR

A walkthrough of the application can be found in the examples folder.