Audio waveform cache-warming Django command #529

AetherUnbound · 2022-02-22T22:25:19Z

Description

Related to WordPress/openverse-catalog#510

We now have a table where audio waveforms are cached, but this table is not yet fully populated. We have plans to move this processing upstream into the catalog (WordPress/openverse#731).

For the time being, we should create a Django command that can be run with python manage.py <command> which will iterate through all audio records and generate waveforms for them. Here is an outline of the necessary code:

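from tqdm import tqdm  # third-party; not yet a project dependency

# Assumes the Audio model is already imported from the project's models module.
audios = Audio.objects.all()
with tqdm(total=audios.count()) as progress:
    for audio in audios:
        # Going by its name, this reuses a cached waveform when one exists.
        audio.get_or_create_waveform()
        progress.update(1)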

The above code example uses tqdm, which I don't believe is a dependency we have right now. We could add it for this use and remove it after, since the progress bar and estimated time to completion are very useful.

We have ~14k records, and a test run on staging showed that this could take anywhere from 12-48 hours.
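To put that estimate in per-record terms, here is a quick back-of-the-envelope calculation. It assumes exactly 14,000 records (the "~14k" above, rounded) and an even spread of time across records, which is unlikely to hold exactly in practice.

```python
# Rough per-record cost implied by the 12-48 hour estimate.
RECORDS = 14_000  # "~14k records", rounded for illustration


def seconds_per_record(total_hours: float, records: int = RECORDS) -> float:
    """Average seconds spent per record for a run of the given length."""
    return total_hours * 3600 / records


low = seconds_per_record(12)
high = seconds_per_record(48)
print(f"{low:.1f}s to {high:.1f}s per record")
```

So each waveform would take somewhere between roughly 3 and 12 seconds on average under these assumptions.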

Alternatives

Additional context

Implementation

🙋 I would be interested in implementing this feature.

AetherUnbound · 2022-02-23T00:14:17Z

It might be good to order the audio objects by id, so that the results are reproducible on multiple runs.
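A minimal sketch of the idea, using plain Python stand-ins rather than the real model: `warm_in_id_order` and the `SimpleNamespace` records are hypothetical names for illustration only. With a Django queryset the equivalent would likely be `Audio.objects.order_by("id")` instead of an in-memory sort.

```python
from types import SimpleNamespace


def warm_in_id_order(audios, generate):
    """Call `generate` on each record in ascending id order; return the ids visited."""
    visited = []
    for audio in sorted(audios, key=lambda a: a.id):
        generate(audio)
        visited.append(audio.id)
    return visited


# Stand-in records arriving in arbitrary order.
records = [SimpleNamespace(id=i) for i in (30, 10, 20)]

first = warm_in_id_order(records, generate=lambda a: None)
second = warm_in_id_order(reversed(records), generate=lambda a: None)
print(first, second)
```

Whatever order the records arrive in, both runs visit them in the same sequence, which also makes it easier to tell where an interrupted run stopped.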

zackkrida · 2022-02-23T00:40:14Z

> 12-48 hours

That's quite a long time, and makes me wonder if handling this during the data refresh or in the catalog is advisable. Assuming the number of audio records never increased from the current numbers, this would still dramatically increase the data refresh time, right?

AetherUnbound · 2022-02-23T00:54:24Z

It would, seeing as audio currently takes about 5 minutes. However, we would only need to go through the initial records once; once the waveforms for those have been generated, they can be skipped, and only new audio records will require waveform generation.
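The skip-what-is-cached behaviour could look roughly like the sketch below. It is an illustration only: `warm_missing` and the id lists are hypothetical, and the real check would presumably happen against the waveform table (for instance inside `get_or_create_waveform`).

```python
def warm_missing(audio_ids, cached_ids, generate):
    """Generate waveforms only for ids not already cached.

    Returns a tuple of (ids processed in this run, updated set of cached ids).
    """
    cached = set(cached_ids)
    processed = []
    for audio_id in sorted(audio_ids):
        if audio_id in cached:
            continue  # waveform already exists, nothing to do
        generate(audio_id)
        cached.add(audio_id)
        processed.append(audio_id)
    return processed, cached


all_ids = [1, 2, 3, 4]
# The first run has an empty cache, so every record is processed.
first_run, cache = warm_missing(all_ids, cached_ids=[], generate=lambda i: None)
# A later run with one new record only touches that record.
second_run, cache = warm_missing(all_ids + [5], cached_ids=cache, generate=lambda i: None)
print(first_run, second_run)
```

Only the first pass pays the full cost; later passes scale with the number of new records.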

AetherUnbound added the 🟨 priority: medium, 🌟 goal: addition, and 🤖 aspect: dx labels Feb 22, 2022

AetherUnbound mentioned this issue Feb 23, 2022

Django command for generating waveforms #530

Merged

7 tasks

sarayourfriend closed this as completed in #530 Mar 8, 2022
