Cache everything #536

jacksongoode · 2024-10-04T16:57:23Z

This issue connects #316, #445, #522 and maybe #348

Now that we're in the process of making iterative improvements, one big one which I feel is probably has the most impact on user experience is caching all of the requests we make. This means caching playlists, first and foremost, but along with saved trucks, saved albums, and other lists. I think this should be our next big feature push. @SO9010

However, we'll need some mechanism of invalidating these caches if the playlist has changed and we need query to get the new state.

Somethings won't change (most likely):

Album covers for albums (not daylists/mixes)
Lyrics
User's list of playlist

We can add a manual shortcut to refresh (or maybe an icon as well) if they need to be queried again.

For the rest of the queries that we make I suggest we:

Have their cache invalidate after 1 hour since refresh
Allow this duration to be configured in the settings (so some users who only want to manually refresh for changes can do so)
Keep track of all changes made from the client side (so there's no desync)

Of particular note, we will have to write some custom async function to allow the user to query all of their saved songs. My suggestion here would be that if a user goes to their saved tracks and nothing has been cached yet, we add a button that allows them to do a full retrieval of their saved songs. This would be a somewhat long async task, but would run in the background and loop over the max number of items we can retrieve from users like to songs with the API. We would need some kind of indication somewhere that these tracks are being retrieved and when they finish being retrieved. I'm not sure if that is a modal or if that is some UI element, but since it's such a long task, it might be worthwhile implementing for users with large libraries.

This will probably require some considerable reworking of most of the requests that we make. But as long as we wrap these requests around a singular function that can handle the storage, retrieval, and validation in a unified way, it should be very nice. I also believe this will give us an opportunity to consider how the cache is stored and whether we can make improvements.

SO9010 · 2024-10-04T17:24:19Z

@jacksongoode This sounds great so far!

Have you considered integrating lazy loading and virtualized scrolling to reduce the number of tracks that need to be loaded and cached? This would improve the program's load time and reduce cache size.

I believe this would require changes to the queue system, but it would allow users to load an endless number of tracks. We could also add a setting for users to define how many pages are fetched at once.

Additionally, we could introduce a priority-based caching system, where larger, long-term caches are used for saved albums and playlists.

For playlists, we could display an older cached version while requesting the latest from Spotify. If changes are detected, we could download the first 50 tracks and progressively cache the rest.

SO9010 · 2024-10-06T11:46:53Z

We could get users to set which playlists they want to have prioritised too.

SO9010 · 2024-10-06T11:53:51Z

So I think the key points that we address should include:

Cache Essentials:

Cache playlists, saved tracks, and albums.
Use manual refresh options (e.g., a button or icon) to update data when necessary.

Cache Expiry and Flexibility:

Set default cache expiry at 1 hour, and then we could have a comparative check to see if anything has changed, but make this duration configurable in the app's settings for users who prefer manual refreshes.

Long Async Queries:

For longer tasks, like retrieving a user's saved songs, run background async tasks and provide a UI indication (e.g., a modal or progress bar) to show when this process is complete. Or point 4.

Lazy Loading and Virtualized Scrolling:

To improve loading speed and minimize cache size, implement lazy loading and virtualized scrolling, allowing users to progressively load large lists of tracks.

Queue System and Progressive Updates:

For playlists, display the cached version first and asynchronously check for updates. If changes are detected, refresh the first 50 tracks and progressively update the rest.

kingosticks · 2024-10-06T12:57:16Z

As someone who's implemented generic caching for Spotify at the request level for a Python project, maybe my experience is helpful. I really regret doing it this way, it was easier and it does work but it would have been better to cache at the item level, (or maybe a hybrid). The way I did it, I've got multiple copies of track data depending on the context e.g. PlaylistTrackObject, SimplifiedTrackObject, TrackObject and then also depending on how it was requested (single item, multiple grouped together etc). Mostly I respect Spotify's cache headers, sometimes I use them as a guide, but then other times i ignore them entirely (I consider images and basic track metadata as static, while playlists must be dynamic).

If/when I redo it, I'll organise it around the items rather than the requests, with essentially a database of Spotify URIs. For what it's worth, I wouldn't be against writing (or helping to write) such a thing in Rust since using Rust from Python is supposedly pretty straightforward these days.

SO9010 · 2024-10-06T15:20:49Z

I done some looking at how the spotify API does it. So what it first does is it loads only the first 25, which is probably done just so that the user can decide whether or not they want to listen to the song or not, then it grabs the rest of the song.

We could perhaps also use the same API call that spotfy uses as we have that essentailly set up already for the home section. (We should probably add a enum for what type of api call we want to use).

The call that it uses initally is:

Base URL: https://api-partner.spotify.com/pathfinder/v1/query?
Operation Name: fetchPlaylist

Variables

uri: spotify:playlist:37i9dQZF1DZ06evO1DHoaY (this is just an example)
offset: 0
limit: 25

Extension

persited query

The next call is:

This is the same as before just
offset: 25
Limit: 100

This would greatyly reduce the amount of time that it takes to request the data due to having a larger return window.
See here to see the json return of the last call

SO9010 · 2024-10-06T15:58:17Z

@kingosticks Thank you! I'm pretty sure that PSST already does what you're suggesting. Currently, there are 4 main directories in the cache folders: Track, Episode, Audio and Key. When the user plays a track from any playlist or album it loads the track using this function:

fn load_track(
    item_id: ItemId,
    session: &SessionService,
    cache: &CacheHandle,
) -> Result<Track, Error> {
    if let Some(cached_track) = cache.get_track(item_id) {
        Ok(cached_track)
    } else {
        let track = Track::fetch(session, item_id)?;
        if let Err(err) = cache.save_track(item_id, &track) {
            log::warn!("failed to save track to cache: {:?}", err);
        }
        Ok(track)
    }
}

This means that there is only one copy of the track data for every playlist and album.
Please correct me if this system is not what you were suggesting :)

SO9010 · 2024-10-06T20:05:31Z

So, unfortunately, I am unable to get the API which Spotify uses to work properly; it just returns nothing no matter what I try!

SO9010 · 2024-10-07T09:04:14Z

Also, we should add a cache to say what the offset the user is at or the scroll location for the user. This will make going back through history way more excellent, particularly notable for the show lyric button.

kingosticks · 2024-10-07T10:10:35Z

This means that there is only one copy of the track data for every playlist and album. Please correct me if this system is not what you were suggesting :)

Yes! I had problems when I initially tried to implement that approach because when I came to load an album, or particularly a long playlist, I could end up with many single track requests and would hit Spotify's rate-limiting. So I had to group track requests together and then ungroup the responses afterwards, which is doable. And then different endpoints return different views of the tracks so I had to check if the cached track data I already had (maybe from their /playlist endpoint) was actually sufficient for what I now needed (maybe more detailed info from their /tracks endpoint). So those multiple views will likely then have multiple expiry times that need taking into account. And when requests fail, do you invalidate all item views, or just the one you were updating? How would I go about evicting old data if I wasn't sure what it was being used for, must I track that also, and to that per view also? For getting playlists-with-tracks, this would sometimes end up being more requests than just getting everything from the playlist endpoint in the first place. It was getting complicated and very Spotify-specific (which at the time I wanted to avoid since I have other APIs to use) so I just tossed it all and went with a far simpler request-based caching approach. And here I am regretting it. I didn't even bother implementing cache eviction in the end, nobody has complained as yet about that....

But it sounds like you've managed to avoid my issues (real or just in my head). But I am curious how you avoid the rate-limiting when requesting just single tracks.

SO9010 · 2024-10-07T20:01:36Z

I'm wondering if you requested all of the song data with the playlist. I'm pretty sure PSST doesn't do this but only loads two of three songs ahead of the queue. This would be where the issue that you are facing comes from. From.

PSST currently invalidates just the song which has failed to load as this is separate from the UI aspect.

Sorry if I didn't understand your questions!

kingosticks · 2024-10-07T20:18:20Z

No, not at all. I just wanted to provide my experience in case it helped form what you were doing. Sorry for the noise.

SO9010 · 2024-10-07T21:41:24Z

Oh! Thank you very much, I'm sorry that I misinterpreted you! :P

SO9010 · 2024-10-11T17:38:43Z

Also relevant is #501

SO9010 · 2024-11-18T10:46:44Z

@jacksongoode we should start moving on this more soon. Any more ideas?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cache everything #536

Cache everything #536

jacksongoode commented Oct 4, 2024

SO9010 commented Oct 4, 2024

SO9010 commented Oct 6, 2024

SO9010 commented Oct 6, 2024

kingosticks commented Oct 6, 2024 •

edited

Loading

SO9010 commented Oct 6, 2024

SO9010 commented Oct 6, 2024

SO9010 commented Oct 6, 2024

SO9010 commented Oct 7, 2024

kingosticks commented Oct 7, 2024

SO9010 commented Oct 7, 2024

kingosticks commented Oct 7, 2024

SO9010 commented Oct 7, 2024

SO9010 commented Oct 11, 2024

SO9010 commented Nov 18, 2024

Cache everything #536

Cache everything #536

Comments

jacksongoode commented Oct 4, 2024

SO9010 commented Oct 4, 2024

SO9010 commented Oct 6, 2024

SO9010 commented Oct 6, 2024

kingosticks commented Oct 6, 2024 • edited Loading

SO9010 commented Oct 6, 2024

The call that it uses initally is:

Variables

Extension

The next call is:

SO9010 commented Oct 6, 2024

SO9010 commented Oct 6, 2024

SO9010 commented Oct 7, 2024

kingosticks commented Oct 7, 2024

SO9010 commented Oct 7, 2024

kingosticks commented Oct 7, 2024

SO9010 commented Oct 7, 2024

SO9010 commented Oct 11, 2024

SO9010 commented Nov 18, 2024

kingosticks commented Oct 6, 2024 •

edited

Loading