Processing real-time data from Spotify using Kafka and Databricks.

This project pulls data from the Spotify API and ingests it into a Kafka broker. Spark Streaming can then connect to this broker and process the stream. The project can be set up on your local computer or deployed to Azure.

Directory Structure


    ├── README.md
    ├── kafka
    │   ├── cities.csv
    │   ├── config.yml
    │   ├── kafka_utils.py
    │   ├── producer.py
    │   ├── spotify_utils.py
    │   └── user_utils.py
    ├── requirements.txt
    ├── scripts
    │   ├── install_docker.sh
    │   └── setup_kafka.sh
    ├── spark_streaming
    │   ├── README.md
    │   ├── images
    │   │   ├── databricks.png
    │   │   └── notebook.png
    │   ├── process_stream.py
    │   ├── schema.py
    │   └── spark streaming notebook.ipynb
    └── terraform
        ├── README.md
        ├── data.tf
        ├── images
        │   └── resources.png
        ├── main.tf
        ├── modules
        │   └── general_vm
        │       ├── main.tf
        │       ├── outputs.tf
        │       ├── providers.tf
        │       ├── run_kafka.sh
        │       └── variables.tf
        ├── outputs.tf
        ├── providers.tf
        ├── terraform.tfstate
        ├── terraform.tfstate.backup
        ├── terraform.tfvars
        ├── variables.tf
        ├── vnet.tf
        └── workspace.tf

Spotify

You need to provide your credentials in the .env file.

Go to https://developer.spotify.com/ and sign in with your account.

Now create a new app.

After filling in all the options, click Submit.

Then open the app settings and note down your client ID and client secret.

Note: You do not need to fill in the .env file with your own secret; it is already populated. When the Kafka broker VM is deployed, it uses these credentials. A future improvement would be to use credentials supplied by the user when running on the Azure VM.
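As an illustration of how a client ID and client secret are typically exchanged for an access token, here is a minimal sketch of Spotify's Client Credentials flow using only the standard library. It is not taken from this repository's spotify_utils.py; the function names and the environment variable names SPOTIFY_CLIENT_ID and SPOTIFY_CLIENT_SECRET are assumptions, so match them to whatever keys the .env file actually uses.

```python
import base64
import json
import os
import urllib.parse
import urllib.request

# Spotify's token endpoint for the Client Credentials flow.
TOKEN_URL = "https://accounts.spotify.com/api/token"


def build_token_request(client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that asks for an access token."""
    # The credentials travel as HTTP Basic auth: base64("client_id:client_secret").
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode()
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def fetch_token() -> str:
    """Send the request and return the access token (needs network access)."""
    # Assumed variable names; adjust to the keys used in your .env file.
    request = build_token_request(
        os.environ["SPOTIFY_CLIENT_ID"],
        os.environ["SPOTIFY_CLIENT_SECRET"],
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["access_token"]
```

Calling fetch_token() with both variables exported returns a bearer token that can be sent in the Authorization header of later Spotify API calls.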

Kafka

The real-time data from Spotify is ingested into Kafka. See the kafka directory for the code.

Local

To run it locally:

Install Kafka.
Run python -m pip install -r requirements.txt
cd kafka
Create the topic named in the .env file.
Run python producer.py
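For orientation, the publishing step that a script like producer.py performs can be sketched as follows. This is an assumption-laden sketch, not the repository's code: it assumes the kafka-python package, a broker at localhost:9092, and JSON-encoded events, and the helper names encode_event, publish, and make_producer are hypothetical.

```python
import json


def encode_event(event: dict) -> bytes:
    """Serialize one event as UTF-8 JSON, the payload written to Kafka."""
    return json.dumps(event).encode("utf-8")


def publish(producer, topic: str, events) -> int:
    """Send every event to the topic, flush, and return how many were sent."""
    sent = 0
    for event in events:
        producer.send(topic, value=encode_event(event))
        sent += 1
    # Block until buffered messages have actually been handed to the broker.
    producer.flush()
    return sent


def make_producer(bootstrap_servers: str = "localhost:9092"):
    """Create a real producer; assumes kafka-python and a reachable broker."""
    from kafka import KafkaProducer  # imported lazily so the helpers above work without it

    return KafkaProducer(bootstrap_servers=bootstrap_servers)
```

With a broker running, publish(make_producer(), "<your topic>", events) would push a batch of events; the topic should be the one created in the step above.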

Azure

To run it on Azure, follow the README in the terraform directory.

Spark Streaming

We can connect to the Kafka cluster and process the data. One example query is:

Which song is most popular among Indian males?

This can be run either locally or on Azure.

Locally

Make sure Kafka is running; follow the Kafka section above for that.
Install PySpark.
cd spark_streaming
Run python process_stream.py
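The example query above could be expressed in Structured Streaming roughly as follows. This is a sketch under stated assumptions, not the contents of process_stream.py or schema.py: the field names song, country, and gender, the default topic name "spotify", and the broker address are all hypothetical and should be replaced with the values your producer actually uses.

```python
def kafka_source_options(bootstrap_servers: str, topic: str) -> dict:
    """Options for Spark's Kafka source, passed via DataStreamReader.options()."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "latest",
    }


def start_query(bootstrap_servers: str = "localhost:9092", topic: str = "spotify"):
    """Start a streaming query that counts plays per song for Indian male users."""
    # PySpark is imported lazily so the options helper works without it installed.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StringType, StructField, StructType

    # Assumed event schema; the real one lives in schema.py.
    schema = StructType([
        StructField("song", StringType()),
        StructField("country", StringType()),
        StructField("gender", StringType()),
    ])

    spark = SparkSession.builder.appName("spotify-stream-sketch").getOrCreate()
    raw = (
        spark.readStream.format("kafka")
        .options(**kafka_source_options(bootstrap_servers, topic))
        .load()
    )
    # Kafka delivers bytes; cast to string, parse the JSON, and flatten the struct.
    events = raw.select(
        from_json(col("value").cast("string"), schema).alias("e")
    ).select("e.*")
    counts = (
        events.filter((col("country") == "India") & (col("gender") == "male"))
        .groupBy("song")
        .count()
        .orderBy(col("count").desc())
    )
    # Sorting an aggregated stream requires the "complete" output mode.
    return counts.writeStream.outputMode("complete").format("console").start()
```

Running start_query().awaitTermination() prints the running counts to the console, most played song first. Spark's Kafka source ships separately from PySpark, so the spark-sql-kafka connector has to be available to the Spark session for the read to work.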

Azure

Follow the README in the spark_streaming directory to learn more.

Future Work

Include CI/CD with GitHub Actions.
Handle Spotify credentials better.
Try more complex queries.
Try Delta tables.
Add visualization, such as showing the popular songs in every city of India on a map.
Store query results in Azure Blob Storage.
