SpotifyETL

Description

A simple ETL data pipeline that extracts data on the latest hits from Spotify's Web API and builds a data lake in an S3 bucket.

The data lake is refreshed weekly: AWS CloudWatch triggers an AWS Lambda function once a week.

SpotifyETL uses Terraform, an "infrastructure as code" tool, to:

define IAM roles and policies
specify the AWS Lambda function that performs the data extraction
configure the AWS CloudWatch alarm to run the Lambda function weekly

How Can This Data be Used?

SpotifyETL extracts information about the latest trending tracks, specifically from Spotify's 'Viral Hits' playlist. The following information is extracted:

Column	Remarks
Year of Release	To analyze songs by the time they were released
Song Title	Not needed, but useful for future queries
Artist Name	Name of main artist
Artist Genre	To analyze what song genres go viral
Popularity of artist	From the Spotify docs: "The artist's popularity is calculated from the popularity of all the artist's tracks." The popularity score shows how many of these viral hits come from one-hit wonders, well-known artists, etc.
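
As a rough illustration of how one row of the table above could be assembled, the sketch below maps a track object and an artist object to the five columns. The field names (name, album.release_date, genres, popularity) follow the shape of Spotify's Web API responses. The function name build_row and the column keys are hypothetical and may differ from what viral_hits.py actually does.

```python
def build_row(track, artist):
    """Map a Spotify track object and its main artist object to one table row.

    Both arguments are plain dicts shaped like Spotify Web API responses.
    build_row and the column keys are illustrative names, not taken from viral_hits.py.
    """
    # release_date may be "YYYY", "YYYY-MM" or "YYYY-MM-DD"; the year always comes first.
    release_year = int(track["album"]["release_date"][:4])
    return {
        "year_of_release": release_year,
        "song_title": track["name"],
        "artist_name": artist["name"],
        # An artist can have several genres (or none), so join them into one cell.
        "artist_genre": "; ".join(artist.get("genres", [])),
        "artist_popularity": artist["popularity"],
    }


# Made-up sample data, used only to exercise the function.
sample_track = {"name": "Example Song", "album": {"release_date": "2021-05-14"}}
sample_artist = {"name": "Example Artist", "genres": ["pop", "dance pop"], "popularity": 87}
row = build_row(sample_track, sample_artist)
print(row)
```

Joining the genres into a single cell keeps one row per track. Splitting them into separate rows would be an equally valid choice, depending on the queries planned.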

Architecture

Usage

Required packages

spotipy (pip install spotipy)
boto3 (pip install boto3)

Running locally

The Python script for extracting Spotify data can be run locally with the command:

python viral_hits.py

This generates a .csv file locally containing data on the tracks currently in Spotify's 'Viral Hits' playlist.
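
The CSV-writing step might look roughly like the sketch below, which uses only the standard library. The helper name write_rows and the column keys in COLUMNS are assumptions made for illustration; the actual script may name and order its columns differently.

```python
import csv

# Hypothetical column keys, one per row of the table in "How Can This Data be Used?".
COLUMNS = [
    "year_of_release",
    "song_title",
    "artist_name",
    "artist_genre",
    "artist_popularity",
]


def write_rows(rows, path="viral_hits.csv"):
    """Write a list of row dicts to a CSV file with a header line.

    Returns the number of data rows written.
    """
    # newline="" lets the csv module control line endings on every platform.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Calling write_rows(rows) with no path would overwrite viral_hits.csv in the current directory, so pass a different path when experimenting.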

Running on the cloud

As configured in Terraform, the AWS Lambda function runs the lambda_handler() function in viral_hits.py and uploads the generated CSV to an S3 bucket.
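
A minimal sketch of what such a handler could look like is shown below. It is not the repository's code: the extract_to_csv placeholder, the DATALAKE_BUCKET environment variable, the fallback bucket name, and the date-stamped object key are all assumptions. The optional s3_client parameter is also an addition, included so the handler can be exercised with a stand-in client instead of real AWS credentials.

```python
import datetime
import os
import tempfile


def extract_to_csv(path):
    """Placeholder for the extraction step; here it only writes a header line."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("year_of_release,song_title,artist_name,artist_genre,artist_popularity\n")


def lambda_handler(event, context, s3_client=None):
    """Generate the CSV in a temporary directory and upload it to S3."""
    if s3_client is None:
        # Imported lazily so the sketch can be tried out without boto3 installed.
        import boto3
        s3_client = boto3.client("s3")

    # Hypothetical configuration: bucket name read from an environment variable.
    bucket = os.environ.get("DATALAKE_BUCKET", "example-spotify-datalake")

    # The temporary directory (/tmp on Lambda) is the writable location in the function's filesystem.
    local_path = os.path.join(tempfile.gettempdir(), "viral_hits.csv")
    extract_to_csv(local_path)

    # Assumed key layout: one object per run, named by date, so weekly runs accumulate.
    key = f"viral_hits/{datetime.date.today().isoformat()}.csv"
    s3_client.upload_file(local_path, bucket, key)
    return {"bucket": bucket, "key": key}
```

AWS Lambda calls the handler with only event and context, so s3_client falls back to a real boto3 client when deployed.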

theeugenechong/SpotifyETL