This application provides the services needed to perform content-based music recommendation using features extracted from the 30s track previews available via the Spotify Web API.
- Crawls Spotify playlists, gets the 30s preview URL for each track, and downloads the audio to S3.
- Uses the Spotify Web API.
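The preview-URL step of the crawler can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes the crawler has already fetched a playlist's tracks from the Spotify Web API as a JSON object, and the function name `iter_preview_urls` and the sample data are hypothetical. The network request and the S3 upload are left out.

```python
from typing import Any, Dict, Iterator, Tuple


def iter_preview_urls(playlist_tracks: Dict[str, Any]) -> Iterator[Tuple[str, str]]:
    """Yield (track_id, preview_url) for every track that has a 30s preview."""
    for item in playlist_tracks.get("items", []):
        # "track" can be null, e.g. for items that are no longer available.
        track = item.get("track") or {}
        track_id = track.get("id")
        preview_url = track.get("preview_url")  # null when no preview exists
        if track_id and preview_url:
            yield track_id, preview_url


# Hypothetical response fragment, shaped like a playlist-tracks page.
sample_page = {
    "items": [
        {"track": {"id": "abc123", "preview_url": "https://example.com/abc123.mp3"}},
        {"track": {"id": "def456", "preview_url": None}},
        {"track": None},
    ]
}

previews = list(iter_preview_urls(sample_page))
```

Tracks without a preview are skipped, since there is no audio to download for them.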
- Uses Keunwoo Choi's CNN for feature extraction.
- Triggered by the event emitted when an audio file is uploaded to S3.
- Pulls the audio file down, extracts features, and stores them in the database.
- Deletes the audio when done.
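As a rough sketch of the trigger described above, the feature extractor could read the bucket and object key out of the S3 upload event like this. The function name `s3_objects_from_event`, the bucket name, and the key are hypothetical; downloading the audio, running the CNN, and writing to the database are not shown.

```python
from typing import Any, Dict, List, Tuple
from urllib.parse import unquote_plus


def s3_objects_from_event(event: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (bucket, key) for every object referenced in an S3 event."""
    objects = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(s3["object"]["key"])
        objects.append((bucket, key))
    return objects


# Hypothetical event, trimmed to the fields used above.
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "audio-bucket"},
                "object": {"key": "previews/abc+123.mp3"},
            },
        }
    ]
}

uploaded = s3_objects_from_event(sample_event)
```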
- For storage and retrieval of unprocessed features extracted from audio files.
- The `AnnoyIndex` item attribute is computed as a `uuid1`, bit-shifted 114 bits to the right. This keeps the Python int within the 32-bit C integer limit, while the 14 bits that remain, which come from the UUID's timestamp field, are intended to keep it unique. See the documentation for more details.
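The item computation described above can be reproduced with the standard library. This is a sketch of the arithmetic only; the function name `annoy_item_id` is hypothetical.

```python
import uuid


def annoy_item_id() -> int:
    """Shift a 128-bit uuid1 right by 114 bits, leaving its top 14 bits."""
    return uuid.uuid1().int >> 114


item_id = annoy_item_id()

# A fixed UUID whose leading (time_low) field is all ones, to show the
# largest value the shift can produce.
max_id = uuid.UUID("ffffffff-0000-1000-8000-000000000000").int >> 114
```

Because a UUID is 128 bits wide, the shifted value always fits in 14 bits, so it lies in the range 0 to 16383.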
- Provides a GET endpoint that serves recommendations via query by example.
- Uses ANNOY to service queries.
- Takes a single parameter, a Spotify track ID.
- Subscribes to events that notify when the ANNOY space has been updated.
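To illustrate query by example, the sketch below takes a track ID and returns the tracks whose feature vectors are closest to it. It is a stand-in, not the service's implementation: it uses an exhaustive cosine-distance search in plain Python rather than an ANNOY index, and the function names, track IDs, and two-dimensional feature vectors are made up for the example.

```python
import math
from typing import Dict, List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def recommend(track_id: str, features: Dict[str, List[float]], n: int = 2) -> List[str]:
    """Return the n tracks nearest to track_id, excluding the query itself."""
    query = features[track_id]
    others = [t for t in features if t != track_id]
    others.sort(key=lambda t: cosine_distance(query, features[t]))
    return others[:n]


# Hypothetical feature store contents.
features = {
    "track_a": [1.0, 0.0],
    "track_b": [0.9, 0.1],
    "track_c": [0.0, 1.0],
}

neighbours = recommend("track_a", features, n=2)
```

With an ANNOY index, the exhaustive search would be replaced by an approximate lookup such as `AnnoyIndex.get_nns_by_item`.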
This service is the custodian of the ANNOY space. It is responsible for:
- Initialising the ANNOY space and storing it in S3.
- Updating the ANNOY space following writes to the database.
  - Possibly implemented by subscribing to a DynamoDB event stream.
- Publishing events to let subscribers know when the ANNOY space has been updated.
  - Possibly implemented by an event triggered via upload to the S3 bucket.
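If the DynamoDB event stream option mentioned above were chosen, the custodian might decide whether to rebuild the ANNOY space along these lines. This is only a sketch under that assumption: the function name `needs_rebuild` and the `track_id` key are hypothetical, and rebuilding the index and uploading it to S3 are not shown.

```python
from typing import Any, Dict

# Event names that DynamoDB Streams uses for item-level changes.
WRITE_EVENTS = {"INSERT", "MODIFY", "REMOVE"}


def needs_rebuild(stream_event: Dict[str, Any]) -> bool:
    """Return True if the stream batch contains at least one write."""
    return any(
        record.get("eventName") in WRITE_EVENTS
        for record in stream_event.get("Records", [])
    )


# Hypothetical stream batch, trimmed to the fields used above.
sample_stream_event = {
    "Records": [
        {
            "eventName": "INSERT",
            "dynamodb": {"Keys": {"track_id": {"S": "abc123"}}},
        }
    ]
}

rebuild = needs_rebuild(sample_stream_event)
```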
- An S3 bucket to temporarily store audio from which to extract features.
- An AWS managed database instance in which to store extracted features.
- The services use `.env` files for configuration, which must be created and stored in the `config/` directory. In this directory you will also find templates to create the necessary `.env` files.
- To build the project, run `make build-all`.
- To run the playlist crawler, run `make crawl`.
- To run the feature extractor, run `make extract`.