Real-Time Music Recommendation using PySpark and Kafka

✯ Introduction

Suggesting songs and artists to users is a core feature of music streaming services such as Pandora and Spotify.
Users tend to prefer platforms whose recommendations match their taste in music.
In this project, we built a recommender system that suggests new musical artists to a user based on their listening history.
The same type of recommender system could also be used to suggest TV shows or movies to a user (e.g., on Netflix).

✯ Motivation

In 2021, recorded streaming revenue alone exceeded $16.9 billion, with Spotify leading the way.
Spotify paid music rights holders more money than ever in 2021: over $7 billion.
High-quality recommendations and access to an enormous catalog of songs have made music streaming platforms a go-to option.

✯ Methodology

✯ Kafka Streaming

The Kafka streamer acts as a producer that publishes the dataset to a Kafka topic.
A message can carry a batch or a single data item; we send a single data item per message.
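The one-item-per-message pattern can be sketched as follows. This is a minimal illustration, not the project's producer.py: it assumes a CSV input and a producer object with the `send`/`flush` interface of kafka-python's `KafkaProducer`. The function names `row_messages`, `send_all`, and `run_against_broker`, the `id` key column, and the broker address `localhost:9092` are hypothetical.

```python
import csv
import io
import json


def row_messages(csv_text, key_field="id"):
    """Yield one (key, value) pair of bytes per CSV row."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = row.get(key_field, "").encode("utf-8")
        value = json.dumps(row).encode("utf-8")  # one data item per message
        yield key, value


def send_all(producer, topic, messages):
    """Send every message individually and return how many were sent."""
    sent = 0
    for key, value in messages:
        producer.send(topic, key=key, value=value)
        sent += 1
    producer.flush()  # block until buffered messages are delivered
    return sent


def run_against_broker():
    """Not called here. Assumes kafka-python is installed and a broker
    listens on localhost:9092 (both are assumptions)."""
    from kafka import KafkaProducer

    sample = "id,name,popularity\n1,Song A,10\n2,Song B,20\n"
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    return send_all(producer, "demo", row_messages(sample))
```

Keeping the row-to-message conversion separate from the sending step makes it possible to check the message format without a running broker.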

✯ PySpark with Kafka Integration

Start the Kafka streamer (acting as the producer) and give it the dataset path:

spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.0 producer.py

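The Kafka topic must be created before it is passed to the PySpark API, using the command below:

./bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic demo

(On Kafka releases where kafka-topics.sh no longer accepts --zookeeper, use --bootstrap-server with the broker address instead, typically localhost:9092.)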
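On the consuming side, PySpark Structured Streaming can subscribe to the topic through the `kafka` source provided by the spark-sql-kafka package. The sketch below is an assumption about how that could look, not the project's code: the function names `read_topic` and `run_console_stream`, the app name, and the broker address `localhost:9092` are hypothetical, while the topic name `demo` comes from the command above.

```python
def read_topic(spark, topic="demo", servers="localhost:9092"):
    """Return a streaming DataFrame with the Kafka key and value as strings."""
    return (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", servers)
        .option("subscribe", topic)
        .option("startingOffsets", "earliest")  # also read messages sent before startup
        .load()
        # Kafka delivers key and value as binary, so cast both to strings.
        .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS json")
    )


def run_console_stream():
    """Not called here. Needs PySpark, the Kafka connector package,
    and a running broker (all assumptions)."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("music-stream-sketch").getOrCreate()
    query = (
        read_topic(spark)
        .writeStream
        .format("console")
        .outputMode("append")
        .start()
    )
    query.awaitTermination()
```

Printing to the console is only a way to confirm that messages arrive; a real pipeline would parse the JSON payload and pass it on to the recommender.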

✯ Dataset Description

✯ Spotify Dataset 1921 - 2020, 600k+ tracks

Metadata:

* Tracks covered: 1921-2020, 600k+ tracks
* Artists covered: 1M+
* Source: Spotify Web API
* Creator: Yamac Eren Ay
* Dataset type: CSV

Data Description:

* Total columns: 20 (example: id, name, popularity, duration_ms, artists, id_artists, energy, release_date, danceability, etc.)
* Dataset size: 508.84 MB

✯ Recommendation System

* In this project we used ALS implicit collaborative filtering for recommendations.
* Alternating Least Squares (ALS) is an iterative optimization process in which every iteration moves closer to a factorized representation of the original data.
* We used the best model, the one with the highest score, to make recommendations.
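To make the alternating idea concrete, here is a small pure-Python sketch of implicit-feedback ALS restricted to a single latent factor (rank 1). It is an illustration under stated assumptions, not the project's model: the function name `implicit_als_rank1`, the toy play-count matrix, and the `alpha` and `reg` values are hypothetical. Play counts are turned into a binary preference and a confidence weight, then the user and item factors are updated in turn, each update solving its own least-squares problem exactly while the other side is held fixed.

```python
def implicit_als_rank1(plays, alpha=1.0, reg=0.1, iterations=10):
    """Rank-1 implicit-feedback ALS on a dense list-of-lists play-count matrix.

    Returns (user_factors, item_factors, losses); losses[k] is the
    objective value after iteration k.
    """
    n_users, n_items = len(plays), len(plays[0])
    # Preference is 1 if the user played the artist at all, else 0.
    pref = [[1.0 if r > 0 else 0.0 for r in row] for row in plays]
    # Confidence grows with the play count.
    conf = [[1.0 + alpha * r for r in row] for row in plays]
    u = [1.0] * n_users
    v = [1.0] * n_items

    def loss():
        fit = sum(
            conf[i][j] * (pref[i][j] - u[i] * v[j]) ** 2
            for i in range(n_users)
            for j in range(n_items)
        )
        return fit + reg * (sum(x * x for x in u) + sum(y * y for y in v))

    losses = []
    for _ in range(iterations):
        # Hold item factors fixed; solve each user's scalar problem.
        for i in range(n_users):
            num = sum(conf[i][j] * pref[i][j] * v[j] for j in range(n_items))
            den = sum(conf[i][j] * v[j] ** 2 for j in range(n_items)) + reg
            u[i] = num / den
        # Hold user factors fixed; solve each item's scalar problem.
        for j in range(n_items):
            num = sum(conf[i][j] * pref[i][j] * u[i] for i in range(n_users))
            den = sum(conf[i][j] * u[i] ** 2 for i in range(n_users)) + reg
            v[j] = num / den
        losses.append(loss())
    return u, v, losses


# Toy data (made up): rows are users, columns are artists, values are play counts.
toy_plays = [[3, 0, 1], [0, 2, 0], [1, 0, 4]]
user_f, item_f, history = implicit_als_rank1(toy_plays)
```

Because every update minimizes the objective over its own variables, the recorded loss never increases from one iteration to the next. In PySpark, the corresponding estimator is `pyspark.ml.recommendation.ALS` with `implicitPrefs=True`; its `rank`, `regParam`, and `alpha` parameters play the roles of the factor dimension, `reg`, and `alpha` above.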

✯ Novel Contributions

Created a real-world streaming simulation using Kafka.
Used a state-of-the-art collaborative filtering approach for recommendations.
Integrated the model with Spotify for personalized recommendations.
Deployed the model to a web app for a better user experience.

✯ Results Demonstration

✯ References

PySpark: https://spark.apache.org/docs/latest/api/python/
Kaggle Dataset: https://www.kaggle.com/datasets/yamaerenay/spotify-dataset-19212020-600k-tracks
Kafka Streaming: https://docs.confluent.io/platform/current/streams/index.html#:~:text=Kafka%20Streams%20is%20a%20client,Kafka's%20server%2Dside%20cluster%20technology.
Streamlit: https://streamlit.io/
Spotipy API: https://github.com/plamere/spotipy
KMeans Classification for Recommendation: https://www.sciencedirect.com/science/article/pii/S1875389212006220


RITIK-12/MusicRecommendation
