Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace sox with in-memory processing #76

Closed
justinsalamon opened this issue Feb 15, 2020 · 7 comments
Closed

Replace sox with in-memory processing #76

justinsalamon opened this issue Feb 15, 2020 · 7 comments

Comments

@justinsalamon
Copy link
Owner

justinsalamon commented Feb 15, 2020

There seem to be several disadvantages to using sox for audio processing:

The above seems like sufficient motivation to consider a redesign where all audio processing is happening in memory instead of via sox. This would be a major overhaul though, so would require significant cycles.

@pseeth
Copy link
Collaborator

pseeth commented Feb 19, 2020

We should start discussing how to design this a bit. One thought I had was to use the PySox API still as it's nice and modular but replace the operations within it with numpy, soundfile, and pyrubberband functionalities. This could be something we contribute to pysox, even, meaning Scaper wouldn't have to change at all?

Another option is just to write another top-level file (or expand what is already in audio.py) that would do these operations. I like the Sox API though, because the operations are tacked on to a list first and then processed all at once via build. I think staging the operations and then performing them in one swoop is a design decision that works well for Scaper. At least for development, it makes the actual changes to audio happen "all in one spot", which makes it easy to debug PRs like #55. So if we go this route, then I think we should discuss keeping a similar Transformer, Combiner type API.

One other thing that will need to change is the loudness calculation in scaper.audio.r128stats, which requires FFMPEG and a subprocess call. It requires the path to the audio file as input, not a numpy array. That's a bit tricky.

@pseeth
Copy link
Collaborator

pseeth commented Feb 19, 2020

We might want to keep #38 in mind when doing this. But that might also greatly hinder solving this issue as it would require a big API change. But just something to think about!

@pseeth
Copy link
Collaborator

pseeth commented Apr 16, 2020

I just found this project: https://github.com/kkroening/ffmpeg-python/, which could go a long way to expanding Scaper capability. However, the processing would still be in tempfiles but there are two advantages:

  1. We get a lot more augmentation capability by using ffmpeg's awesome suite of built-in tools. I don't know if anyone has the bandwidth to rewrite all of the DSP out there in Python efficiently.
  2. We get Custom audio filters / transformations (e.g. "plugins") #38 almost for free with the way that library is written. ffmpeg signal graphs are already a DAG which the library makes much more explicit.

We would have to do some work around that library to expose ffmpeg audio features a bit more easily, though. The library has some level of support for this: https://github.com/kkroening/ffmpeg-python#custom-filters. This would also require specialized installations of ffmpeg to work well, with all the extra bells and whistles (e.g. rubberband).

Finally, offline, we discussed doing most of the stuff in tempfiles but doing the combining and padding of the isolated sound events in memory instead. This would be much faster and would solve issues like #68.

I'll experiment a bit with that library to see how well it works and report back.

One more thing we need to do is isolate all of the tempfile logic in Scaper into one place, rather than having it in several places. This could be done by creating another class (EventAudio?, SingleEvent?) that wraps all of the disk I/O logic we need. Then Scaper could interact with this class instead of interacting with the tempfiles directly. This abstraction should pass all the existing test cases first to be sure it works. Then to implement the above ffmpeg changes (or other proposed changes), we could change the implementation of the class in one spot, rather than having to make changes all over the Scaper codebase.

@justinsalamon
Copy link
Owner Author

Cheers @pseeth !

One quick comment - I don't think we should implement anything as a class unless it explicitly requires to hold state. We can probably abstract away I/O logic with simple functional wrappers, right?

@pseeth
Copy link
Collaborator

pseeth commented Apr 16, 2020

I'm thinking what could require state is all of the augmentations required to do on the event. Currently we use the pysox Combiner/Transformer classes to keep track of all this and then feed the tempfile to those classes. I was thinking we would need to wrap those so that the sox operations all happen in one part of the code. I'll have to hack around a bit to see if it's necessary, though.

@justinsalamon
Copy link
Owner Author

Sounds like something that could be represented in a simple list, no? Or some simple data structure (e.g. OrderedDict). But yeah I guess it's worth looking into.

Perhaps it would be helpful to first flesh out the full list of audio operations that Scaper currently performs and the order in which they happen, to map everything out explicitly.

@pseeth
Copy link
Collaborator

pseeth commented Sep 24, 2020

Addressed by #116, #117, #118, #130, #128, #126.

@pseeth pseeth closed this as completed Sep 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants