Replace sox with in-memory processing #76

justinsalamon · 2020-02-15T02:35:13Z

There seem to be several disadvantages to using sox for audio processing:

Requires a lot of I/O via tempfiles
Can create off-by-1 issues (cf. Sources are off by 1 sample. #73 Sources don't always sum to mix, issue with clipping? #74)
Poor pitch shifting and time stretching quality
Unwanted warnings (cf. Unexpected warnings with dumping sources alongside mix #75)
Clunky in terms of using Scaper as part of a pipeline (cf On-the-fly (in-memory) soundscape generation #3)

The above seems like sufficient motivation to consider a redesign where all audio processing is happening in memory instead of via sox. This would be a major overhaul though, so would require significant cycles.

pseeth · 2020-02-19T06:59:22Z

We should start discussing how to design this a bit. One thought I had was to use the PySox API still as it's nice and modular but replace the operations within it with numpy, soundfile, and pyrubberband functionalities. This could be something we contribute to pysox, even, meaning Scaper wouldn't have to change at all?

Another option is just to write another top-level file (or expand what is already in audio.py) that would do these operations. I like the Sox API though, because the operations are tacked on to a list first and then processed all at once via build. I think staging the operations and then performing them in one swoop is a design decision that works well for Scaper. At least for development, it makes the actual changes to audio happen "all in one spot", which makes it easy to debug PRs like #55. So if we go this route, then I think we should discuss keeping a similar Transformer, Combiner type API.

One other thing that will need to change is the loudness calculation in scaper.audio.r128stats, which requires FFMPEG and a subprocess call. It requires the path to the audio file as input, not a numpy array. That's a bit tricky.

pseeth · 2020-02-19T07:18:39Z

We might want to keep #38 in mind when doing this. But that might also greatly hinder solving this issue as it would require a big API change. But just something to think about!

pseeth · 2020-04-16T20:44:36Z

I just found this project: https://github.com/kkroening/ffmpeg-python/, which could go a long way to expanding Scaper capability. However, the processing would still be in tempfiles but there are two advantages:

We get a lot more augmentation capability by using ffmpeg's awesome suite of built-in tools. I don't know if anyone has the bandwidth to rewrite all of the DSP out there in Python efficiently.
We get Custom audio filters / transformations (e.g. "plugins") #38 almost for free with the way that library is written. ffmpeg signal graphs are already a DAG which the library makes much more explicit.

We would have to do some work around that library to expose ffmpeg audio features a bit more easily, though. The library has some level of support for this: https://github.com/kkroening/ffmpeg-python#custom-filters. This would also require specialized installations of ffmpeg to work well, with all the extra bells and whistles (e.g. rubberband).

Finally, offline, we discussed doing most of the stuff in tempfiles but doing the combining and padding of the isolated sound events in memory instead. This would be much faster and would solve issues like #68.

I'll experiment a bit with that library to see how well it works and report back.

One more thing we need to do is isolate all of the tempfile logic in Scaper into one place, rather than having it in several places. This could be done by creating another class (EventAudio?, SingleEvent?) that wraps all of the disk I/O logic we need. Then Scaper could interact with this class instead of interacting with the tempfiles directly. This abstraction should pass all the existing test cases first to be sure it works. Then to implement the above ffmpeg changes (or other proposed changes), we could change the implementation of the class in one spot, rather than having to make changes all over the Scaper codebase.

justinsalamon · 2020-04-16T21:14:59Z

Cheers @pseeth !

One quick comment - I don't think we should implement anything as a class unless it explicitly requires to hold state. We can probably abstract away I/O logic with simple functional wrappers, right?

pseeth · 2020-04-16T21:34:30Z

I'm thinking what could require state is all of the augmentations required to do on the event. Currently we use the pysox Combiner/Transformer classes to keep track of all this and then feed the tempfile to those classes. I was thinking we would need to wrap those so that the sox operations all happen in one part of the code. I'll have to hack around a bit to see if it's necessary, though.

justinsalamon · 2020-04-24T19:37:10Z

Sounds like something that could be represented in a simple list, no? Or some simple data structure (e.g. OrderedDict). But yeah I guess it's worth looking into.

Perhaps it would be helpful to first flesh out the full list of audio operations that Scaper currently performs and the order in which they happen, to map everything out explicitly.

pseeth · 2020-09-24T01:45:45Z

Addressed by #116, #117, #118, #130, #128, #126.

justinsalamon mentioned this issue Feb 15, 2020

Sources are off by 1 sample. #73

Closed

pseeth mentioned this issue Feb 19, 2020

Sources don't always sum to mix, issue with clipping? #74

Open

justinsalamon mentioned this issue May 1, 2020

Better pitch shifting and time stretching #98

Closed

pseeth mentioned this issue May 28, 2020

Speeding up Scaper with in-memory operations #105

Closed

8 tasks

pseeth closed this as completed Sep 24, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Replace sox with in-memory processing #76

Replace sox with in-memory processing #76

justinsalamon commented Feb 15, 2020 •

edited

Loading

pseeth commented Feb 19, 2020 •

edited

Loading

pseeth commented Feb 19, 2020 •

edited

Loading

pseeth commented Apr 16, 2020 •

edited

Loading

justinsalamon commented Apr 16, 2020

pseeth commented Apr 16, 2020

justinsalamon commented Apr 24, 2020

pseeth commented Sep 24, 2020 •

edited

Loading

Replace sox with in-memory processing #76

Replace sox with in-memory processing #76

Comments

justinsalamon commented Feb 15, 2020 • edited Loading

pseeth commented Feb 19, 2020 • edited Loading

pseeth commented Feb 19, 2020 • edited Loading

pseeth commented Apr 16, 2020 • edited Loading

justinsalamon commented Apr 16, 2020

pseeth commented Apr 16, 2020

justinsalamon commented Apr 24, 2020

pseeth commented Sep 24, 2020 • edited Loading

justinsalamon commented Feb 15, 2020 •

edited

Loading

pseeth commented Feb 19, 2020 •

edited

Loading

pseeth commented Feb 19, 2020 •

edited

Loading

pseeth commented Apr 16, 2020 •

edited

Loading

pseeth commented Sep 24, 2020 •

edited

Loading