An implementation of a movie recommendation pipeline using MovieLens data, inspired by an article on Codementor and by the Spark documentation on collaborative filtering.
This repo is built on the Arc data pipeline framework.
Download and unpack the data files by running make download.
Start the infrastructure by running docker-compose up. This will start the Arc Jupyter Notebook service. You can then open Jupyter at http://localhost:8888.
With the Docker infrastructure running, the first step is to run src/MovieModel.ipynb to set up, train, and store the ALS machine learning model for the pipeline to use.
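As a rough illustration of what ALS (alternating least squares) does during training, here is a minimal pure-Python sketch that fits a rank-1 factorization to a few made-up ratings. It is not the notebook's code, which presumably relies on Spark's collaborative filtering implementation. The function names, the toy ratings, and the reg and iterations values are all assumptions chosen for illustration.

```python
# Toy illustration only: rank-1 alternating least squares on a handful of
# made-up (user, movie, rating) triples. This is not the notebook's Spark code.


def als_rank1(ratings, n_users, n_items, reg=0.1, iterations=20):
    """Fit rating ~ u[user] * v[item] by alternating closed-form updates."""
    u = [1.0] * n_users
    v = [1.0] * n_items
    for _ in range(iterations):
        # Hold the item factors fixed and solve the regularized
        # least-squares problem for each user independently.
        for i in range(n_users):
            num = sum(r * v[j] for (a, j, r) in ratings if a == i)
            den = reg + sum(v[j] ** 2 for (a, j, r) in ratings if a == i)
            u[i] = num / den
        # Hold the user factors fixed and solve for each item.
        for j in range(n_items):
            num = sum(r * u[i] for (i, b, r) in ratings if b == j)
            den = reg + sum(u[i] ** 2 for (i, b, r) in ratings if b == j)
            v[j] = num / den
    return u, v


def squared_error(ratings, u, v):
    """Sum of squared differences between observed and predicted ratings."""
    return sum((r - u[i] * v[j]) ** 2 for (i, j, r) in ratings)


# Hypothetical ratings: (user index, movie index, rating).
ratings = [
    (0, 0, 5.0), (0, 1, 3.0),
    (1, 0, 4.0), (1, 2, 1.0),
    (2, 1, 2.0), (2, 2, 1.0),
]

u, v = als_rank1(ratings, n_users=3, n_items=3)

# Predicted rating for a pair that was never observed (user 0, movie 2).
predicted = u[0] * v[2]
print(f"fit error: {squared_error(ratings, u, v):.3f}, prediction: {predicted:.3f}")
```

A real model would use a higher rank, so that each user and movie gets a vector of factors rather than a single number, and would be trained on the full MovieLens ratings. The alternating structure of the updates is the same.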
Then, open src/MovieRecommendations.ipynb to execute the ML model as part of a data pipeline.
Files generated by running the Arc IDE can be cleaned up by running make clean.