Books Recommendation System

This project demonstrates an unsupervised machine learning approach to building a book recommendation system. The system uses frequent pattern mining algorithms to analyze and identify relationships in the dataset and generate recommendations. A web interface is provided for users to input a book and receive recommendations based on it.

Table of Contents

  1. Overview
  2. Dataset Description
  3. Approach and Algorithms
  4. Project Workflow
  5. Results
  6. Technologies Used
  7. Future Work

Overview

This project is a book recommendation system that suggests books to users based on patterns derived from a dataset of user ratings. It uses three frequent pattern mining algorithms (Eclat, Apriori, and FP-Growth) to uncover associations between books.

A web interface allows users to input a book and receive recommendations generated by the FP-Growth model. The recommendations are based on the most frequent patterns discovered in the dataset.


Dataset Description

The dataset used in this project consists of two files:

  1. Books: Contains metadata about books, including:
    • ISBN
    • Book title
    • Author
    • Year of publication
    • Publisher
    • Image URL (small, medium, large)
  2. Ratings: Contains user interaction data:
    • User ID
    • ISBN of the book
    • Book rating (0–10 scale)

After cleaning the data:

  • The final dataset includes the following columns:
    • User ID
    • ISBN
    • Book rating
    • Book title
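
As a rough illustration of how the two files might be reduced to these four columns, here is a minimal pandas sketch. The column names (`User-ID`, `ISBN`, `Book-Rating`, `Book-Title`, `Book-Author`), the sample rows, and the `build_final_dataset` helper are assumptions made for this example, not taken from the project's code.

```python
import pandas as pd

# Hypothetical in-memory stand-ins for the Books and Ratings files.
# Column names and values are assumptions for illustration only.
books = pd.DataFrame({
    "ISBN": ["111", "222", "333", "333"],
    "Book-Title": ["Book A", "Book B", "Book C", "Book C"],
    "Book-Author": ["X", "Y", "Z", "Z"],
})
ratings = pd.DataFrame({
    "User-ID": [1, 1, 2, 2, 3],
    "ISBN": ["111", "222", "111", "333", "999"],
    "Book-Rating": [8, 0, 10, 7, 5],
})


def build_final_dataset(books, ratings):
    """Clean both tables and join them on ISBN."""
    # Drop duplicate books and rows missing the fields we need.
    books = books.drop_duplicates(subset="ISBN").dropna(subset=["ISBN", "Book-Title"])
    # Drop exact duplicate ratings and rows with missing values.
    ratings = ratings.drop_duplicates().dropna()
    # An inner join keeps only ratings whose ISBN has book metadata.
    merged = ratings.merge(books[["ISBN", "Book-Title"]], on="ISBN", how="inner")
    return merged[["User-ID", "ISBN", "Book-Rating", "Book-Title"]].reset_index(drop=True)


final = build_final_dataset(books, ratings)
print(final)
```

The inner join is one possible choice: it discards ratings for books that have no metadata (ISBN `999` in the sample), which keeps every remaining row resolvable to a title.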

Approach and Algorithms

The recommendation system explores frequent patterns in user ratings using:

  1. Eclat Algorithm
  2. Apriori Algorithm
  3. FP-Growth Algorithm

The FP-Growth Algorithm produced the best results, as it efficiently handled large datasets and provided meaningful book association rules.
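
To make the idea of frequent pattern mining concrete, the following is a small pure-Python sketch of Eclat, the simplest of the three to write out. It is not the project's implementation. It assumes each transaction is the set of books rated by one user, and the `eclat` function, the sample data, and the `min_support` value are all hypothetical.

```python
def eclat(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all itemsets
    that appear in at least `min_support` transactions."""
    # Vertical layout: item -> set of ids of the transactions containing it.
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # Every (item, tids) in candidates is frequent together with prefix.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[frozenset(itemset)] = len(tids)
            # Intersect tidsets with the remaining candidates to grow the itemset.
            next_candidates = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids
                if len(shared) >= min_support:
                    next_candidates.append((other, shared))
            if next_candidates:
                extend(itemset, next_candidates)

    initial = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    extend(set(), initial)
    return frequent


# Hypothetical transactions: one set of book titles per user.
transactions = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "C"},
]
frequent = eclat(transactions, min_support=3)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

Apriori and FP-Growth find the same frequent itemsets for a given support threshold. They differ in how they search: Apriori generates candidates level by level, while FP-Growth compresses the transactions into a prefix tree and avoids explicit candidate generation.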


Project Workflow

  1. Data Cleaning:

    • Removed duplicate and missing entries.
    • Merged Books and Ratings datasets into a single CSV file.
    • Ensured consistent formatting.
  2. Frequent Pattern Mining:

    • Tested Eclat, Apriori, and FP-Growth on the dataset.
    • Evaluated each algorithm based on computational efficiency and meaningful recommendations.
  3. Evaluation:

    • Selected the FP-Growth Algorithm for its superior performance.
  4. Web Interface:

    • Developed a simple web interface where users can input a book title.
    • The system returns book recommendations based on the input.
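
The last step of the workflow could be sketched in Flask roughly as follows. This is a simplified assumption of how such an endpoint might look: it returns JSON instead of rendering an HTML page, and the `/recommend` route, the `title` query parameter, and the hard-coded `RECOMMENDATIONS` lookup are all hypothetical stand-ins for the project's actual interface and model output.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical precomputed lookup: book title -> recommended titles.
# In a real system this would be built from the mined association rules.
RECOMMENDATIONS = {
    "Book A": ["Book B", "Book C"],
}


@app.route("/recommend")
def recommend_endpoint():
    # Read the book title from the query string, e.g. /recommend?title=Book+A
    title = request.args.get("title", "").strip()
    if not title:
        return jsonify(error="missing 'title' query parameter"), 400
    # Unknown titles yield an empty recommendation list rather than an error.
    return jsonify(title=title, recommendations=RECOMMENDATIONS.get(title, []))


if __name__ == "__main__":
    app.run(debug=True)
```

Precomputing the lookup keeps request handling to a dictionary access, so the mining step does not have to run on every request.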

Results

  • Frequent itemsets: Discovered frequent combinations of books rated by users.
  • Recommendations: Generated book suggestions based on association rules derived using the FP-Growth Algorithm.
  • Web Interface: Functional interface to provide recommendations interactively.
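
To show how association rules can turn into recommendations, here is a minimal pure-Python sketch. It deliberately swaps FP-Growth for plain pair counting, which is enough to produce single-antecedent rules on a toy example. The `pair_rules` and `recommend` functions, the thresholds, and the sample baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support, min_confidence):
    """Derive rules A -> B from pair co-occurrence.

    support(A, B)    = fraction of baskets containing both A and B
    confidence(A->B) = count(A and B) / count(A)
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = {}
    for (a, b), count in pair_counts.items():
        if count / n < min_support:
            continue
        # Each frequent pair can yield a rule in both directions.
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = count / item_counts[antecedent]
            if confidence >= min_confidence:
                rules.setdefault(antecedent, []).append((consequent, confidence))
    # Highest confidence first; ties broken alphabetically for stable output.
    for consequents in rules.values():
        consequents.sort(key=lambda pair: (-pair[1], pair[0]))
    return rules


def recommend(rules, title, top_n=3):
    """Return up to top_n consequents of the rules whose antecedent is title."""
    return [book for book, _ in rules.get(title, [])[:top_n]]


# Hypothetical baskets: the books each user rated.
baskets = [
    ["Book A", "Book B", "Book C"],
    ["Book A", "Book B"],
    ["Book A", "Book C"],
    ["Book B", "Book D"],
]
rules = pair_rules(baskets, min_support=0.5, min_confidence=0.6)
print(recommend(rules, "Book A"))
print(recommend(rules, "Book D"))
```

A book with no rule above the thresholds (here `Book D`) gets an empty list, so a real interface would need some fallback for rarely rated titles.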

Technologies Used

The following technologies were used to develop the project:

  • Python: The core programming language for data processing and algorithm implementation.
  • Pandas: For data cleaning, preprocessing, and manipulation.
  • Mlxtend: Library providing implementations of the FP-Growth algorithm and association rule generation.
  • Flask: Web framework used to create the user interface for inputting books and displaying recommendations.
  • HTML/CSS: To design and style the web interface.
  • Jupyter Notebook: Used during the exploratory data analysis (EDA) phase for visualizing and understanding the data.

Future Work

Potential future improvements and extensions to the project:

  1. Transition to Streamlit:

    • Replace Flask with Streamlit to provide a more interactive and visually appealing user interface.
    • Integrate Streamlit’s features to improve user experience and interactivity.
  2. MLOps Integration:

    • Develop the project as an MLOps pipeline.
    • Automate data ingestion, model training, deployment, and monitoring.