Project for the university Master's degree course "Algorithms for Massive Datasets".
This project aims to perform a Market Basket Analysis from scratch using the Apriori algorithm. By analyzing customer transaction data from the Yelp dataset, the goal is to identify frequent itemsets and derive association rules that can provide valuable insights into purchasing behaviors.
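As a rough illustration of the Apriori idea described above, here is a minimal pure-Python sketch. It is not the notebook's implementation (which relies on PySpark); the function names `apriori` and `confidence`, the toy baskets, and the use of an absolute support count as the threshold are all assumptions made for this example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset (frozenset): support count} for all frequent itemsets.

    min_support is an absolute count (an assumption of this sketch).
    """
    baskets = [frozenset(t) for t in transactions]

    def count(candidates):
        # Count how many baskets contain each candidate, keep the frequent ones.
        counts = {c: 0 for c in candidates}
        for basket in baskets:
            for c in candidates:
                if c <= basket:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_support}

    # Pass 1: frequent single items.
    singles = {frozenset([item]) for basket in baskets for item in basket}
    frequent = count(singles)
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        frequent = count(candidates)
        result.update(frequent)
        k += 1
    return result


def confidence(itemsets, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent from support counts."""
    a = frozenset(antecedent)
    return itemsets[a | frozenset(consequent)] / itemsets[a]


# Toy data, invented for illustration only.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]
freq = apriori(baskets, min_support=2)
rule_conf = confidence(freq, ["bread"], ["milk"])
```

With these toy baskets, every single item and every pair reaches the threshold of 2, while the triple appears in only one basket and is dropped. The prune step is what keeps the candidate set small on larger data: an itemset is only counted if all of its subsets were already found frequent.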
kaggle
pyspark
nltk
pyspark.mllib
To install the required libraries, run:
pip install kaggle pyspark nltk
Market_Basket_Analysis_Project.ipynb: Contains the Jupyter notebook for data exploration and model training.
MassiveDatasets_Report.pdf: Contains the report describing the results.