This repository has been archived by the owner on Feb 22, 2023. It is now read-only.

Audio waveform cache-warming Django command #529

Closed
1 task
AetherUnbound opened this issue Feb 22, 2022 · 3 comments · Fixed by #530
Labels
  • 🤖 aspect: dx (Concerns developers' experience with the codebase)
  • 🌟 goal: addition (Addition of new feature)
  • 🟨 priority: medium (Not blocking but should be addressed soon)

Comments

@AetherUnbound
Contributor

Description

Related to WordPress/openverse-catalog#510

We now have a table where audio waveforms are cached, but this table is not yet fully populated. We have plans to move this processing upstream into the catalog (WordPress/openverse#731).

For the time being, we should create a Django command, runnable with python manage.py <command>, that iterates through all audio records and generates a waveform for each one. Here is an outline of the necessary code:

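from tqdm import tqdm
from catalog.api.models import Audio  # import path assumed; adjust to the project layout

# Walk every audio record, generating its waveform if one is not cached yet.
audios = Audio.objects.all()
with tqdm(total=audios.count()) as progress:
    for audio in audios:
        audio.get_or_create_waveform()
        progress.update(1)  # advance the progress bar by one record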

The above code example uses tqdm, which I don't believe is a dependency we have right now. We could add it for this use and remove it after, since the progress bar and estimated time to completion are very useful.

We have ~14k records, and a test run on staging showed that this could take anywhere from 12 to 48 hours.
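As a rough sanity check on that estimate, the per-record cost implied by these figures can be worked out directly. The sketch below assumes exactly 14,000 records processed one at a time, which simplifies the "~14k" quoted above; `seconds_per_record` is a hypothetical helper, not something in the codebase.

```python
# Back-of-the-envelope check: seconds per record implied by the staging run.
# Assumes exactly 14,000 records handled sequentially (an approximation).
RECORDS = 14_000


def seconds_per_record(total_hours: float, records: int = RECORDS) -> float:
    """Average wall-clock seconds spent on each record."""
    return total_hours * 3600 / records


low = seconds_per_record(12)   # best case from the staging estimate
high = seconds_per_record(48)  # worst case from the staging estimate
print(f"{low:.1f}s to {high:.1f}s per record")
```

Under those assumptions each waveform takes on the order of a few seconds to generate, so the total run time is dominated by per-record processing rather than by iterating over the table.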

Alternatives

Additional context

Implementation

  • 🙋 I would be interested in implementing this feature.
@AetherUnbound AetherUnbound added 🟨 priority: medium Not blocking but should be addressed soon 🌟 goal: addition Addition of new feature 🤖 aspect: dx Concerns developers' experience with the codebase labels Feb 22, 2022
@AetherUnbound
Contributor Author

It might be good to order the audio objects by id, so that the results are reproducible on multiple runs.
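One way to read this suggestion: a fixed ordering also makes an interrupted run resumable. The sketch below is plain Python with a stand-in record class so that it runs without Django; `FakeAudio`, `warm_waveforms`, and the `start_after_id` parameter are hypothetical names used only for illustration. In the real command the sorted list would presumably be replaced by something like `Audio.objects.order_by("id")`.

```python
from dataclasses import dataclass


@dataclass
class FakeAudio:
    """Stand-in for the Audio model; only the pieces the loop needs."""

    id: int
    waveform_ready: bool = False

    def get_or_create_waveform(self) -> None:
        # The real method generates and caches a waveform; here we just flag it.
        self.waveform_ready = True


def warm_waveforms(audios, start_after_id=None):
    """Process records in ascending id order and return the ids handled."""
    processed = []
    for audio in sorted(audios, key=lambda a: a.id):
        if start_after_id is not None and audio.id <= start_after_id:
            continue  # already handled by an earlier, interrupted run
        audio.get_or_create_waveform()
        processed.append(audio.id)
    return processed


records = [FakeAudio(3), FakeAudio(1), FakeAudio(2)]
print(warm_waveforms(records))                    # full run
print(warm_waveforms(records, start_after_id=2))  # resumed run
```

Because the order is deterministic, the last id shown by the progress output is enough to restart a 12 to 48 hour job without repeating earlier records.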

@zackkrida
Member

zackkrida commented Feb 23, 2022

> 12-48 hours

That's quite a long time, and makes me wonder if handling this during the data refresh or in the catalog is advisable. Assuming the number of audio records never increased from the current numbers, this would still dramatically increase the data refresh time, right?

@AetherUnbound
Contributor Author

It would, seeing as the audio data refresh currently takes about 5 minutes. However, we would only need to go through the initial records once; after the waveforms for those have been generated, they can be skipped and only new audio records will require waveform generation.
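The skip logic described here could look roughly like the following. This is a plain-Python sketch rather than the project's code: `cached_ids` stands in for whatever lookup against the waveform cache table the real implementation would use, and `pending_ids` is a hypothetical helper.

```python
def pending_ids(all_ids, cached_ids):
    """Return the ids that still need a waveform, in ascending order."""
    cached = set(cached_ids)  # set membership keeps each check cheap
    return sorted(i for i in all_ids if i not in cached)


# First pass: nothing is cached, so every record is processed.
print(pending_ids([1, 2, 3], []))
# Later pass: only the newly ingested record (id 4) needs work.
print(pending_ids([1, 2, 3, 4], [1, 2, 3]))
```

After the initial backfill, the cost of each subsequent run scales with the number of new records instead of the full table.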
