This repository documents the data for all Papadosio albums and live recordings uploaded to their Bandcamp.
- Psipolygons seems like a variation of polygons. I'll keep them separate for now?
- curve vs curvature?
This project uses Python 3.12.8.
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
pip freeze > requirements.txt
black .
Bandcamp does not reliably load pages the same way every time, so follow these steps to update the data:
- Open Chrome and navigate to https://papadosio.bandcamp.com/music
- Scroll to the bottom, ensuring all content is loaded on a single page
- File -> Save Page As... -> Webpage, HTML Only
- Save the file to the raw/ directory in this local repository
  - Use default naming, Music _ Papadosio.html
  - Overwrite if necessary, to perform an incremental update
This script scans raw/Music _ Papadosio.html for album links that need to be downloaded. It creates/overwrites data/albums_to_download.json.
python app/1_parse_albums.py
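As a rough illustration of the link-scanning step, here is a minimal sketch using only the standard library. It is not the code in app/1_parse_albums.py. The names AlbumLinkParser and extract_album_links are hypothetical, and it assumes album links are anchor tags whose href contains /album/, which may not match the saved page exactly.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://papadosio.bandcamp.com"  # site named in the steps above


class AlbumLinkParser(HTMLParser):
    """Collect href values of <a> tags that look like album links."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Assumption: album pages live under an /album/ path.
        if href and "/album/" in href:
            url = urljoin(BASE_URL, href)  # resolve relative links
            if url not in self.links:  # keep first occurrence only
                self.links.append(url)


def extract_album_links(html_text):
    """Return de-duplicated album URLs in page order."""
    parser = AlbumLinkParser()
    parser.feed(html_text)
    return parser.links


# Placeholder markup, not real page content.
sample = (
    '<ol><li><a href="/album/example-one">One</a></li>'
    '<li><a href="/album/example-one">Duplicate</a></li>'
    '<li><a href="/track/example-track">Track</a></li></ol>'
)
print(json.dumps(extract_album_links(sample), indent=2))
```

In a real run the input would be the text of the saved HTML file, and the resulting list would be written out with json.dump.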
This script creates data/downloaded_albums.json if it does not exist, or updates it with new albums found in data/albums_to_download.json.
python app/2_download_albums.py
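The create-or-update behavior described above could look something like the following sketch. The JSON layout is an assumption made for illustration only (a dictionary keyed by album URL with a hypothetical "downloaded" flag), and update_downloaded is a hypothetical helper, not a function from this repository.

```python
import json
import tempfile
from pathlib import Path


def update_downloaded(downloaded_path, to_download):
    """Create the file if missing, otherwise add only unseen album URLs."""
    path = Path(downloaded_path)
    if path.exists():
        downloaded = json.loads(path.read_text(encoding="utf-8"))
    else:
        downloaded = {}
    for url in to_download:
        # setdefault leaves entries that are already recorded untouched.
        downloaded.setdefault(url, {"downloaded": False})  # assumed schema
    path.write_text(json.dumps(downloaded, indent=2), encoding="utf-8")
    return downloaded


# Demonstration in a throwaway directory with placeholder URLs.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "downloaded_albums.json"
    update_downloaded(target, ["album-a"])
    result = update_downloaded(target, ["album-a", "album-b"])
    print(sorted(result))
```

Existing entries are never overwritten, so re-running the update after adding new albums only appends the new ones.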
This script scans raw/albums/ for html files, and parses the tracks and metadata into equivalent json files in data/albums. It filters out a lot of unnecessary data, and renames several keys. If the json files that correspond to the html files already exist, they will not be touched.
python app/3_process_albums.py
The files in data/albums can be deleted before running this script if you like. It takes less than 30 seconds to recreate them with this script.
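The "already exists, so skip it" rule can be sketched as below. This is only an illustration: pending_html_files is a hypothetical helper, and it assumes each json file shares its base name with the html file it came from.

```python
import tempfile
from pathlib import Path


def pending_html_files(raw_dir, out_dir):
    """Return html files in raw_dir that have no matching json in out_dir."""
    raw_dir, out_dir = Path(raw_dir), Path(out_dir)
    return sorted(
        html
        for html in raw_dir.glob("*.html")
        # Assumption: album.html maps to album.json.
        if not (out_dir / html.with_suffix(".json").name).exists()
    )


# Demonstration in a throwaway directory with placeholder file names.
with tempfile.TemporaryDirectory() as tmp:
    raw, out = Path(tmp) / "raw", Path(tmp) / "out"
    raw.mkdir()
    out.mkdir()
    (raw / "a.html").write_text("<html></html>", encoding="utf-8")
    (raw / "b.html").write_text("<html></html>", encoding="utf-8")
    (out / "a.json").write_text("{}", encoding="utf-8")  # already processed
    pending_names = [path.name for path in pending_html_files(raw, out)]
    print(pending_names)
```

Deleting the json files first makes every html file pending again, which matches the note above about recreating them.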
This script scans data/albums/ for json files, and combines them into a single file in data/downloaded_albums.json. The downloaded_albums.json file is recreated every time this script is run.
python app/4_create_albums.py
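A minimal sketch of the combine step follows. It assumes the combined file is simply a list of the per-album objects, which may differ from the actual layout, and combine_albums is a hypothetical name.

```python
import json
import tempfile
from pathlib import Path


def combine_albums(albums_dir, out_path):
    """Merge every per-album JSON file into one list and rewrite out_path."""
    albums = [
        json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(Path(albums_dir).glob("*.json"))
    ]
    # write_text truncates the target, so the combined file is rebuilt each run.
    Path(out_path).write_text(json.dumps(albums, indent=2), encoding="utf-8")
    return albums


# Demonstration in a throwaway directory with placeholder albums.
with tempfile.TemporaryDirectory() as tmp:
    albums_dir = Path(tmp) / "albums"
    albums_dir.mkdir()
    (albums_dir / "a.json").write_text('{"title": "A"}', encoding="utf-8")
    (albums_dir / "b.json").write_text('{"title": "B"}', encoding="utf-8")
    combined = combine_albums(albums_dir, Path(tmp) / "downloaded_albums.json")
    print([album["title"] for album in combined])
```

Because the glob is not recursive and the output lives one level above the album files, the combined file is never read back in as an input.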
This script pulls each track name out of data/downloaded_albums.json, and makes sure it is accounted for in data/track_searchlist.json.
python app/5_generate_track_searchlist.py
The track_searchlist.json file contains every track name (lowercased and trimmed). Some track names have alternate spellings, dates, etc., so the goal of this file is to have the dictionary's keys represent the track names a user would search for. The values give the alternate spellings and number of total tracks represented by these names.
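To make the shape of that dictionary concrete, here is a small sketch. The value keys "alternates" and "count" are assumptions, build_searchlist is a hypothetical helper, and the sketch only groups names that differ by case or surrounding whitespace. Grouping genuinely different spellings or dated variants would need extra rules or manual curation.

```python
def build_searchlist(track_names):
    """Group raw track names under a lowercased, trimmed search key."""
    searchlist = {}
    for name in track_names:
        key = name.strip().lower()
        # Assumed value layout: alternate spellings plus a running total.
        entry = searchlist.setdefault(key, {"alternates": [], "count": 0})
        entry["count"] += 1
        if name != key and name not in entry["alternates"]:
            entry["alternates"].append(name)
    return searchlist


# Placeholder names, not real track titles.
names = ["Example Song", " example song ", "example song", "Other Tune"]
searchlist = build_searchlist(names)
print(searchlist["example song"])
```

Each key is the form a user would type, and the count covers every raw name folded into it.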