-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace sox with in-memory processing #76
Comments
We should start discussing how to design this a bit. One thought I had was to use the PySox API still as it's nice and modular but replace the operations within it with numpy, soundfile, and pyrubberband functionalities. This could be something we contribute to pysox, even, meaning Scaper wouldn't have to change at all? Another option is just to write another top-level file (or expand what is already in audio.py) that would do these operations. I like the Sox API though, because the operations are tacked on to a list first and then processed all at once via One other thing that will need to change is the loudness calculation in |
We might want to keep #38 in mind when doing this. But that might also greatly hinder solving this issue as it would require a big API change. But just something to think about! |
I just found this project: https://github.com/kkroening/ffmpeg-python/, which could go a long way to expanding Scaper capability. However, the processing would still be in tempfiles but there are two advantages:
We would have to do some work around that library to expose ffmpeg audio features a bit more easily, though. The library has some level of support for this: https://github.com/kkroening/ffmpeg-python#custom-filters. This would also require specialized installations of ffmpeg to work well, with all the extra bells and whistles (e.g. rubberband). Finally, offline, we discussed doing most of the stuff in tempfiles but doing the combining and padding of the isolated sound events in memory instead. This would be much faster and would solve issues like #68. I'll experiment a bit with that library to see how well it works and report back. One more thing we need to do is isolate all of the tempfile logic in Scaper into one place, rather than having it in several places. This could be done by creating another class (EventAudio?, SingleEvent?) that wraps all of the disk I/O logic we need. Then Scaper could interact with this class instead of interacting with the tempfiles directly. This abstraction should pass all the existing test cases first to be sure it works. Then to implement the above ffmpeg changes (or other proposed changes), we could change the implementation of the class in one spot, rather than having to make changes all over the Scaper codebase. |
Cheers @pseeth ! One quick comment - I don't think we should implement anything as a class unless it explicitly requires to hold state. We can probably abstract away I/O logic with simple functional wrappers, right? |
I'm thinking what could require state is all of the augmentations required to do on the event. Currently we use the pysox Combiner/Transformer classes to keep track of all this and then feed the tempfile to those classes. I was thinking we would need to wrap those so that the sox operations all happen in one part of the code. I'll have to hack around a bit to see if it's necessary, though. |
Sounds like something that could be represented in a simple list, no? Or some simple data structure (e.g. OrderedDict). But yeah I guess it's worth looking into. Perhaps it would be helpful to first flesh out the full list of audio operations that Scaper currently performs and the order in which they happen, to map everything out explicitly. |
There seem to be several disadvantages to using sox for audio processing:
The above seems like sufficient motivation to consider a redesign where all audio processing is happening in memory instead of via sox. This would be a major overhaul though, so would require significant cycles.
The text was updated successfully, but these errors were encountered: