Data Analysis exercise to practice Data Cleaning, Univariate Analysis, Bivariate Analysis and SQL.

jonathansada/music_popularity_analysis

Music Popularity Analysis

Introduction

This project analyzes how music popularity relates to features of the music such as key, mode, duration, danceability and energy.
It was done as a Data Analysis exercise to practice Data Cleaning, Univariate Analysis, Bivariate Analysis and SQL.

Project Overview

I started the project by defining the questions I wanted to answer:

  1. How do song properties affect popularity?
  2. How popular are acoustic songs compared to the average?
  3. How popular are instrumental songs compared to the average?
  4. How popular are live songs compared to the average?
  5. What are the most popular songs?
  6. What are the 5 most popular songs in Europe?
  7. What are the most popular songs per continent?
  8. Who are the 5 most popular artists?
  9. What are the properties of the songs by the most popular artists?
  10. How do song properties evolve over time? (properties vs release date)
  11. How does time impact popularity? (popularity vs release date)
  12. How many albums were released by the most popular artists?
  13. When was the first album of the most popular artist released?
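
Questions 2 to 4 all compare a subset of songs with the overall average. A minimal sketch of that comparison in pandas could look like the following. It is not taken from the notebook: the column names `popularity` and `acousticness`, the 0.5 threshold that decides what counts as "acoustic", and the helper `popularity_vs_average` are all assumptions made for illustration.

```python
import pandas as pd


def popularity_vs_average(songs: pd.DataFrame, feature: str, threshold: float = 0.5) -> dict:
    """Compare the mean popularity of songs scoring above `threshold`
    on `feature` with the mean popularity of all songs.

    Assumes `songs` has a numeric "popularity" column and a numeric
    column named by `feature` (hypothetical names, adjust to the data).
    """
    overall = songs["popularity"].mean()
    subset = songs.loc[songs[feature] > threshold, "popularity"].mean()
    return {"overall": overall, "subset": subset, "difference": subset - overall}


# Tiny made-up sample, only to show the call; these are not real results.
sample = pd.DataFrame(
    {
        "popularity": [80, 60, 40, 20],
        "acousticness": [0.9, 0.1, 0.7, 0.2],
    }
)
result = popularity_vs_average(sample, "acousticness")
print(result)
```

The same helper could be reused for instrumental and live songs by passing a different feature column, assuming the dataset exposes comparable scores for them.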

All these questions are answered through the analysis in the Python notebook music_popularity_analysis.ipynb and the SQL queries in the file music_popularity_sql_answers.sql.

A preview of the project, including answers to some of these questions, is available in the presentation Music_Popularity_Analysis.pdf.

Datasets*

This analysis is based on two datasets:

Most of the information for the analysis comes from the Spotify dataset. Because I wanted to group the information by continent instead of by country, I used the second dataset to map each country to its continent.
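
One way to attach a continent to each song row is a left join on the country code. The sketch below is only an illustration under assumptions, not the code from the notebook: the column names `country`, `Two_Letter_Country_Code` and `Continent_Name`, and the helper `add_continent`, are assumed and may need adjusting to the actual CSV headers.

```python
import pandas as pd


def add_continent(
    songs: pd.DataFrame,
    countries: pd.DataFrame,
    song_country_col: str = "country",  # assumed column name
    code_col: str = "Two_Letter_Country_Code",  # assumed column name
    continent_col: str = "Continent_Name",  # assumed column name
) -> pd.DataFrame:
    """Left-join a continent name onto every song row by country code."""
    lookup = (
        countries[[code_col, continent_col]]
        # Rows without a code cannot be matched, and a missing key could
        # otherwise pair up with songs whose country is also missing.
        .dropna(subset=[code_col])
        # Keep one continent per code in case a country is listed twice.
        .drop_duplicates(subset=code_col)
    )
    merged = songs.merge(lookup, how="left", left_on=song_country_col, right_on=code_col)
    return merged.drop(columns=code_col)


# Made-up rows, only to show the shape of the result.
sample_songs = pd.DataFrame({"name": ["a", "b", "c"], "country": ["ES", "US", None]})
sample_countries = pd.DataFrame(
    {
        "Two_Letter_Country_Code": ["ES", "US"],
        "Continent_Name": ["Europe", "North America"],
    }
)
merged = add_continent(sample_songs, sample_countries)
print(merged)
```

A left join keeps song rows whose country has no match, with a missing continent, so they can be inspected rather than silently dropped. Note also that `pandas.read_csv` treats the string `NA` as a missing value by default, which may matter for country or continent codes; passing `keep_default_na=False` is one possible way around that.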

* NOTE:
This repository doesn't include the datasets (mainly due to GitHub file size limitations) or the generated database (for the same reason). To execute the code and/or generate the database, you will need to download the datasets manually and place them at:

  • ./dataset/universal_top_spotify_songs.csv
  • ./dataset/country-and-continent-codes-list-csv.csv
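
Before running the notebook, a quick check that both files are in place can save a confusing error later. This is a small optional sketch using only the standard library; the helper name `missing_datasets` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Paths taken from the list above, relative to the repository root.
EXPECTED_FILES = [
    Path("dataset/universal_top_spotify_songs.csv"),
    Path("dataset/country-and-continent-codes-list-csv.csv"),
]


def missing_datasets(root: str = ".") -> list[str]:
    """Return the expected dataset paths that do not exist under `root`."""
    base = Path(root)
    return [p.as_posix() for p in EXPECTED_FILES if not (base / p).is_file()]


if __name__ == "__main__":
    missing = missing_datasets()
    if missing:
        print("Missing dataset files:", ", ".join(missing))
    else:
        print("All dataset files found.")
```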

License

Top Spotify Songs in 73 Countries (Daily Updated) is licensed under the ODC Attribution License (ODC-By).

This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).
