Crawl Spotify and Genius for all the songs of your favourite artist!
This program was built using Python 3.10. Older versions may work, but are currently not supported.
So far I have only tested this on OSX. I plan on testing this on Windows in the future, but it may currently not work.
You will also need to set up access to the Spotify and Genius APIs. More on that below.
This program is a tool for data scientists and NLP enthusiasts who would like to gather data on music. It gathers two types of data: Spotify audio features and lyrics.
Spotify audio features are metrics about songs which are generated by Spotify and accessible through their API. As the name implies, the features are audio related and include some interesting measurements, such as danceability or acousticness. A full list of features the API provides can be found here.
The lyrics are gathered through the Genius API.
pip install -r requirements.txt
This repository makes use of Spotipy to access the Spotify API. If you're interested in the statistics gathered from Spotify, you might want to check out the Spotipy Documentation.
To query data from the API, you first need to create an app as a Spotify developer. You need to do this in order to get a client_id and client_secret.
You need to export the credentials as the environment variables SPOTIFY_CLIENT_ID and SPOTIFY_CLIENT_SECRET respectively.
Here is a little tutorial on how to get the credentials: https://cran.r-project.org/web/packages/spotidy/vignettes/Connecting-with-the-Spotify-API.html
Lyrics are gathered using LyricsGenius to access the Genius.com API.
The first step to access the API is to create a Genius account. Once you have the account, you can use it to generate a client access token. You need to export this as the environment variable GENIUS_ACCESS_TOKEN.
After setting up both the Spotify and Genius APIs you are ready to go.
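Before the first run, it can help to confirm that all three variables are visible to Python. The following is a minimal sketch and not part of songcrawler: the helper `missing_credentials` is a hypothetical name, and only the variable names come from the setup steps above.

```python
import os

# Variable names taken from the setup steps above.
REQUIRED_VARS = (
    "SPOTIFY_CLIENT_ID",
    "SPOTIFY_CLIENT_SECRET",
    "GENIUS_ACCESS_TOKEN",
)


def missing_credentials(env=None):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper for illustration; `env` defaults to os.environ
    and can be replaced by a plain dict for testing.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_credentials()` with no arguments checks the current environment; an empty list means all three credentials are exported.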
There are two main ways to use this repo:
- As a command-line interface (CLI)
- As a Python module
The songcrawler project was made with the intention of creating a CLI. It was then written in a way that should make it usable as a Python module, but that functionality is secondary.
The CLI functionality is provided in main.py. You can use it as follows:
python3 main.py query
query is a variable, which can take the following forms:
- A Spotify URI (e.g. spotify:track:2Ud3deeqLAG988pfW0Kwcl)
- A Genius ID (e.g. 8150537)
- A freetext query
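To illustrate how the three forms differ, here is a rough sketch of how a query string could be told apart. It is an assumption about one possible approach, not the logic main.py actually uses, and `classify_query` is a hypothetical name.

```python
import re


def classify_query(query):
    """Guess which of the three query forms a string represents.

    Illustrative only: the rules below are assumptions, not the
    rules songcrawler itself applies.
    """
    query = query.strip()
    # Spotify URIs start with the "spotify:" scheme.
    if query.startswith("spotify:"):
        return "spotify_uri"
    # Genius IDs are assumed to be plain decimal numbers.
    if re.fullmatch(r"[0-9]+", query):
        return "genius_id"
    # Everything else is treated as a freetext search.
    return "freetext"
```

With the examples from the list above, `classify_query("8150537")` yields `"genius_id"`, and any string with words in it falls through to `"freetext"`.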
There are many additional parameters and flags to adjust the program's behaviour. To list them all you can use:
python3 main.py --help
Spotify URIs are the main way of using this program. They are quite flexible, as they represent not only songs, but also entire albums, playlists or even artists.
You can find the URIs in the share menu. For songs, open it by right-clicking the song; for all other resources, use the three-dot button at the top. The URIs are currently hidden under the share option. To reveal them, hold the option key on Mac or the ctrl key on Windows.
Credit: MattSuda in the Spotify Community
With the Spotify URI, you can use the CLI like this:
python3 main.py spotify:album:1R8kkopLT4IAxzMMkjic6X
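A Spotify URI has three colon-separated parts: the `spotify` scheme, the resource type, and the resource ID. The sketch below splits a URI into those parts. The names `SpotifyResource` and `parse_spotify_uri` are hypothetical, and the accepted types are limited to the four mentioned above (songs, albums, playlists, artists); songcrawler's own parsing may differ.

```python
from typing import NamedTuple

# Resource types mentioned in this README; "track" is Spotify's name for a song.
RESOURCE_TYPES = {"track", "album", "playlist", "artist"}


class SpotifyResource(NamedTuple):
    kind: str
    resource_id: str


def parse_spotify_uri(uri):
    """Split 'spotify:<type>:<id>' into its type and ID (illustrative helper)."""
    parts = uri.strip().split(":")
    valid = (
        len(parts) == 3
        and parts[0] == "spotify"
        and parts[1] in RESOURCE_TYPES
        and parts[2] != ""
    )
    if not valid:
        raise ValueError(f"Not a supported Spotify URI: {uri!r}")
    return SpotifyResource(kind=parts[1], resource_id=parts[2])
```

For the album URI above, this returns a resource whose `kind` is `"album"`, which is how a caller could decide whether to fetch one song or a whole collection.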
Sometimes artists have different songs with the same name (looking at you, The 1975...). Other times, the Genius API may just have difficulties finding the correct song. In this case you can query the lyrics directly using the Genius ID.
This works the same way as with Spotify URIs:
python3 main.py 8150537
This is only supported for single songs. There is currently no easy way to find the Genius ID; see issue #29.
My personal favourite feature is the freetext search. It allows for the use of keywords to help with your search.
python3 main.py "artist:LCD Soundsystem album:This is Happening"
Accepted keywords are artist:, album:, playlist: and track:. Songcrawler will always look for the most specific keyword given. In the query above it will request an album. If no keywords are given, it will search for a track.
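The keyword rule can be sketched as follows. This is an illustration under stated assumptions, not songcrawler's implementation: the function names are hypothetical, and the specificity order (track, album, playlist, artist) is assumed, since the text above only confirms that album outranks artist and that track is the default.

```python
import re

# Assumed specificity order, most specific first.
KEYWORDS = ("track", "album", "playlist", "artist")
_KEYWORD_RE = re.compile(r"\b(" + "|".join(KEYWORDS) + r"):")


def parse_freetext(query):
    """Split a freetext query into {keyword: value} pairs.

    A query without keywords is treated as a track search.
    Text before the first keyword is ignored in this sketch.
    """
    matches = list(_KEYWORD_RE.finditer(query))
    if not matches:
        return {"track": query.strip()}
    fields = {}
    for i, match in enumerate(matches):
        # A value runs until the next keyword or the end of the query.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(query)
        fields[match.group(1)] = query[match.end():end].strip()
    return fields


def target_type(fields):
    """Return the most specific keyword present, defaulting to 'track'."""
    for keyword in KEYWORDS:
        if keyword in fields:
            return keyword
    return "track"
```

For the query `"artist:LCD Soundsystem album:This is Happening"`, `parse_freetext` produces an artist field and an album field, and `target_type` picks `"album"`, matching the behaviour described above.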
As requests can be complex and it is not always certain which result is the best, the search mode is interactive. Songcrawler will return a table of 15 results; you can select the correct one by typing its index. Alternatively, you can move between sets of results using '+'/'-' or (p)revious/(n)ext.
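The paging behaviour can be pictured as moving an offset through the result list in steps of 15. The sketch below is only a model of that idea: `next_offset` is a hypothetical helper, and the mapping of '+' and 'n' to the next page and '-' and 'p' to the previous page is an assumption.

```python
PAGE_SIZE = 15  # number of results shown per table, as described above


def next_offset(offset, command, total):
    """Return the new result offset after a paging command.

    Assumed mapping: '+' or 'n' moves forward, '-' or 'p' moves back.
    The offset never drops below 0 or moves past the last page.
    """
    command = command.strip().lower()
    if command in ("+", "n", "next"):
        candidate = offset + PAGE_SIZE
        return candidate if candidate < total else offset
    if command in ("-", "p", "previous"):
        return max(offset - PAGE_SIZE, 0)
    raise ValueError(f"Unknown paging command: {command!r}")
```

Keeping the offset separate from the displayed table means a selected index can be translated back to an absolute position as `offset + index`.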
I'm happy about any feedback, contribution, etc.