This project demonstrates an unsupervised machine learning approach to building a book recommendation system. The system uses frequent pattern mining algorithms to analyze and identify relationships in the dataset and generate recommendations. A web interface is provided for users to input a book and receive recommendations based on it.
- Overview
- Dataset Description
- Approach and Algorithms
- Project Workflow
- Results
- Technologies Used
- Future Work
This project is a book recommendation system that suggests books to users based on patterns derived from a dataset of user ratings. It leverages frequent pattern mining algorithms, specifically Eclat, Apriori, and FP-Growth, to uncover associations between books.
A web interface allows users to input a book and receive recommendations generated by the FP-Growth model. The recommendations are based on the most frequent patterns discovered in the dataset.
The dataset used in this project consists of two files:
- Books: Contains metadata about books, including:
  - ISBN
  - Book title
  - Author
  - Year of publication
  - Publisher
  - Image URLs (small, medium, large)
- Ratings: Contains user interaction data:
  - User ID
  - ISBN of the book
  - Book rating (0–10 scale)
After cleaning the data, the final dataset includes the following columns:
- User ID
- ISBN
- Book rating
- Book title
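A minimal pandas sketch of the cleaning and merging steps, plus the step that turns ratings into one transaction per user for pattern mining. The column names (`User-ID`, `ISBN`, `Book-Rating`, `Book-Title`), the tiny in-memory sample data, and the helper names `clean_and_merge` and `to_transactions` are hypothetical; the real CSV headers and cleaning rules may differ.

```python
import pandas as pd

# Hypothetical sample data; the column names are assumptions, not the real CSV headers.
books = pd.DataFrame({
    "ISBN": ["111", " 222", "333", "333"],
    "Book-Title": ["Dune", "Emma", "Ulysses", "Ulysses"],
})
ratings = pd.DataFrame({
    "User-ID": [1, 1, 2, 2, 3, 3],
    "ISBN": ["111", "222", "111", "333", "222", None],
    "Book-Rating": [8, 7, 9, 5, 6, 4],
})


def clean_and_merge(books: pd.DataFrame, ratings: pd.DataFrame) -> pd.DataFrame:
    """Drop missing/duplicate rows, normalize ISBNs, and join titles onto ratings."""
    books = books.dropna(subset=["ISBN", "Book-Title"]).copy()
    books["ISBN"] = books["ISBN"].str.strip()  # consistent formatting
    books = books.drop_duplicates(subset="ISBN")

    ratings = ratings.dropna(subset=["User-ID", "ISBN", "Book-Rating"]).copy()
    ratings["ISBN"] = ratings["ISBN"].str.strip()
    ratings = ratings.drop_duplicates()

    # Inner join keeps only ratings whose ISBN exists in the books table.
    merged = ratings.merge(books[["ISBN", "Book-Title"]], on="ISBN", how="inner")
    return merged[["User-ID", "ISBN", "Book-Rating", "Book-Title"]]


def to_transactions(df: pd.DataFrame) -> list:
    """One transaction per user: the set of titles that user rated."""
    return [set(titles) for _, titles in df.groupby("User-ID")["Book-Title"]]


merged = clean_and_merge(books, ratings)
transactions = to_transactions(merged)
print(transactions)
```

Whether every rating counts as an interaction, or only ratings above some threshold, is not specified here; either way it would be a single filter on `Book-Rating` before calling `to_transactions`.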
The recommendation system explores frequent patterns in user ratings using:
- Eclat Algorithm
- Apriori Algorithm
- FP-Growth Algorithm
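The FP-Growth Algorithm, which compresses transactions into an FP-tree instead of generating candidate itemsets as Apriori does, produced the best results: it handled the large dataset efficiently and yielded meaningful book association rules.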
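The project relies on mlxtend's FP-Growth implementation (see Technologies Used). To illustrate what the algorithm does, here is a minimal pure-Python sketch, not the project's code: it builds an FP-tree from transactions sorted by item frequency, then recursively mines each item's conditional pattern base. The names `fp_growth`, `build_fp_tree`, and `mine`, the `{frozenset: count}` return format, and the sample baskets are assumptions for illustration.

```python
import math
from collections import defaultdict


class FPNode:
    """FP-tree node: an item, its count, its parent, and children keyed by item."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(weighted_transactions, min_count):
    """Build an FP-tree from (items, count) pairs.

    Returns the frequent-item counts and a header table mapping each item
    to every tree node that holds it.
    """
    counts = defaultdict(int)
    for items, weight in weighted_transactions:
        for item in items:
            counts[item] += weight
    frequent = {item: c for item, c in counts.items() if c >= min_count}

    root = FPNode(None, None)
    header = defaultdict(list)
    for items, weight in weighted_transactions:
        # Keep frequent items, most frequent first (ties by name), so shared prefixes merge.
        path = sorted((i for i in items if i in frequent), key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += weight
            node = child
    return frequent, header


def mine(weighted_transactions, min_count, suffix, result):
    """Record frequent itemsets ending in `suffix`, recursing on conditional trees."""
    frequent, header = build_fp_tree(weighted_transactions, min_count)
    for item, count in frequent.items():
        itemset = suffix | {item}
        result[itemset] = count
        # Conditional pattern base: the prefix path above each node holding `item`.
        conditional = []
        for node in header[item]:
            prefix = []
            parent = node.parent
            while parent.item is not None:
                prefix.append(parent.item)
                parent = parent.parent
            if prefix:
                conditional.append((prefix, node.count))
        if conditional:
            mine(conditional, min_count, itemset, result)


def fp_growth(transactions, min_support):
    """Return {frozenset(itemset): count} for itemsets meeting min_support (a fraction)."""
    min_count = math.ceil(min_support * len(transactions))
    result = {}
    mine([(set(t), 1) for t in transactions], min_count, frozenset(), result)
    return result


# Hypothetical baskets: each set is the titles one user rated.
baskets = [{"Dune", "Emma"}, {"Dune", "Ulysses"}, {"Dune", "Emma", "Ulysses"}, {"Emma"}]
itemsets = fp_growth(baskets, min_support=0.5)
print(itemsets)
```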
- Data Cleaning:
  - Removed duplicate and missing entries.
  - Merged the `Books` and `Ratings` datasets into a single CSV file.
  - Ensured consistent formatting.
- Frequent Pattern Mining:
  - Tested Eclat, Apriori, and FP-Growth on the dataset.
  - Evaluated each algorithm on its computational efficiency and the quality of its recommendations.
- Evaluation:
  - Selected the FP-Growth Algorithm for its superior performance.
- Web Interface:
  - Developed a simple web interface where users can input a book title.
  - The system returns book recommendations based on the input.
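Eclat, one of the algorithms compared in this workflow, finds the same frequent itemsets as FP-Growth but from a vertical layout: each book maps to the set of transaction IDs (users) containing it, and the support of an itemset is the size of an intersection of those sets. A minimal pure-Python sketch, with a hypothetical `eclat` function and sample baskets; timing it against another miner on the same transactions (for example with `time.perf_counter`) is one way to compare computational efficiency.

```python
import math
from collections import defaultdict


def eclat(transactions, min_support):
    """Return {frozenset(itemset): count} for itemsets meeting min_support (a fraction)."""
    min_count = math.ceil(min_support * len(transactions))

    # Vertical layout: item -> set of transaction IDs containing it.
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)

    result = {}

    def extend(prefix, candidates):
        # candidates: (item, tidset) pairs, each frequent when added to prefix.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            result[itemset] = len(tids)
            # Intersect only with later candidates so each itemset is generated once.
            next_level = []
            for other, other_tids in candidates[i + 1:]:
                common = tids & other_tids
                if len(common) >= min_count:
                    next_level.append((other, common))
            if next_level:
                extend(itemset, next_level)

    frequent = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_count),
        key=lambda pair: pair[0],
    )
    extend(frozenset(), frequent)
    return result


# Hypothetical baskets: each set is the titles one user rated.
baskets = [{"Dune", "Emma"}, {"Dune", "Ulysses"}, {"Dune", "Emma", "Ulysses"}, {"Emma"}]
itemsets = eclat(baskets, min_support=0.5)
print(itemsets)
```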
- Frequent itemsets: Discovered frequent combinations of books rated by users.
- Recommendations: Generated book suggestions based on association rules derived using the FP-Growth Algorithm.
- Web Interface: Functional interface to provide recommendations interactively.
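Once frequent itemsets and their counts are known, each rule X → Y has confidence support(X ∪ Y) / support(X) and lift confidence / support(Y). A minimal sketch with hypothetical `derive_rules` and `recommend` helpers: it ranks suggestions by confidence, then lift, which is an assumption since the ranking metric the project uses is not stated. The `itemsets` dict is toy input (counts from four hypothetical baskets).

```python
from itertools import combinations

# Toy input: {itemset: count} mined from 4 hypothetical baskets.
N_TRANSACTIONS = 4
itemsets = {
    frozenset({"Dune"}): 3,
    frozenset({"Emma"}): 3,
    frozenset({"Ulysses"}): 2,
    frozenset({"Dune", "Emma"}): 2,
    frozenset({"Dune", "Ulysses"}): 2,
}


def derive_rules(itemsets, n_transactions, min_confidence):
    """Return (antecedent, consequent, support, confidence, lift) tuples."""
    rules = []
    for itemset, count in itemsets.items():
        for size in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), size):
                antecedent = frozenset(combo)
                consequent = itemset - antecedent
                # Every subset of a frequent itemset is frequent, so these lookups succeed.
                confidence = count / itemsets[antecedent]
                if confidence < min_confidence:
                    continue
                lift = confidence / (itemsets[consequent] / n_transactions)
                rules.append((antecedent, consequent, count / n_transactions, confidence, lift))
    return rules


def recommend(book, rules, top_n=5):
    """Titles from rules whose antecedent is exactly {book}, best (confidence, lift) first."""
    best = {}
    for antecedent, consequent, _support, confidence, lift in rules:
        if antecedent != {book}:
            continue
        for title in consequent:
            best[title] = max(best.get(title, (0.0, 0.0)), (confidence, lift))
    return sorted(best, key=lambda t: (-best[t][0], -best[t][1], t))[:top_n]


rules = derive_rules(itemsets, N_TRANSACTIONS, min_confidence=0.6)
print(recommend("Dune", rules))  # ['Ulysses', 'Emma']
```

Here both candidates for "Dune" have confidence 2/3, so lift breaks the tie in favor of "Ulysses" (lift 4/3 versus 8/9). A web route would only need to call `recommend` with the submitted title.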
The following technologies were used to develop the project:
- Python: The core programming language for data processing and algorithm implementation.
- Pandas: For data cleaning, preprocessing, and manipulation.
- Mlxtend: Library providing implementations of the FP-Growth algorithm and association rule generation.
- Flask: Web framework used to create the user interface for inputting books and displaying recommendations.
- HTML/CSS: To design and style the web interface.
- Jupyter Notebook: Used during the exploratory data analysis (EDA) phase for visualizing and understanding the data.
Here are the potential future improvements and extensions to the project:
- Transition to Streamlit:
  - Replace Flask with Streamlit to provide a more interactive and visually appealing user interface.
  - Integrate Streamlit’s features to improve user experience and interactivity.
- MLOps Integration:
  - Develop the project as an MLOps pipeline.
  - Automate data ingestion, model training, deployment, and monitoring.