Support processing of text files #176
base: main
@ChristianGeng, @maxschmitt this pull request has the first ideas for adding support for JSON and TXT files to My general idea is:
I will only be able to continue working on this from 2024/08/15. If you are interested in this, feel free to create another pull request with a solution.
Looks great so far, and it all makes sense to me. (I won't have much time to work on it during August either.)
Afaics all test stubs are there, with all tests geared towards text processing failing. I would be motivated to get my hands on it in the calendar week starting Aug. 5,
Yes, no worries, just take a look if you have time. It might be that the tests are not complete yet, it's all work in progress. It also looks to me like some of the tests should be rewritten in a less imperative way, e.g. by using a class.
The structure of the code itself inside
Thanks for the hint! I had often tried to circumvent the pencil-and-paper bit by using software to create call graphs automatically, but I am seeing that the good old manual approach is probably still the best: my experience with the Python packages for creating call graphs is worse than mixed and has not improved over the years. None of pyan, pycallgraph, or pycallgraph2 was satisfactory. So I will go for the classical one too.
I have gone down the pathway that you recommended and created call graphs of the Process class. For example,
Signals:
Text data:
The crucial problem that causes the tests in
The most straightforward approach that I could come up with was to define a second functional called
Pointing to such a function can probably be deferred until
However, whether dealing with signals or non-signal data has to be known as early as
I have created a tentative merge request for now that does
While it should be possible to implement this similarly for e.g.
sampling_rate: int = None,
resample: bool = False,
channels: typing.Union[int, typing.Sequence[int]] = None,
mixdown: bool = False,
win_dur: Timestamp = None,
hop_dur: Timestamp = None,
min_signal_dur: Timestamp = None,
max_signal_dur: Timestamp = None,
So rather than determining whether we deal with text or signal "by hand", would it not be better to design the class so that the two strands are separated from the start, with separate processing modes a priori (possibly using simple inheritance)? To me it looks like you were already unhappy, and have created a stub of a public interface called see #179
Thanks for proposing a fix for the failing tests. It is indeed unfortunate that we need to track inside the class if we process data or signals.
There are indeed several differences between a sampled signal and other "data":
From a developer standpoint, all those points seem to indicate that we should indeed try to separate the implementation more and not add everything to the single
On the other hand, my main motivation so far came from a user perspective. For a user it is very convenient if the following code works independently of the underlying data/signals:
interface = audinterface.Process()
interface.process_index(index)
In summary, I still don't know what the best solution is for the desired data processing.
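There are probably several ways to get both under the same umbrella without breaking the API. One way would be to go in the metaprogramming direction and have the `Process` class delegate the object creation to signal- and text-specific classes via `__new__`. Untested, but would something like this do it?

import typing

class Process:
    def __new__(cls, *, processing_mode: str = "signal", **kwargs):
        # Dispatch only when Process itself is called,
        # so that the subclasses can still be created directly.
        # Defaulting to "signal" is an assumption to keep existing calls working.
        target = cls
        if cls is Process:
            if processing_mode == "signal":
                target = ProcessSignals
            elif processing_mode == "text":
                target = ProcessData
            else:
                raise ValueError(f"Unknown processing_mode: {processing_mode!r}")
        return super().__new__(target)

    def common_methods(self, **kwargs):
        print("define methods common to both text and signal processing")

class ProcessData(Process):
    def __init__(
        self,
        *,
        processing_mode: str = "text",  # Python passes it on to __init__ as well
        process_func: typing.Callable[..., typing.Any] = None,
        process_func_args: typing.Dict[str, typing.Any] = None,  # etc
    ):
        self.process_func = process_func
        self.process_func_args = process_func_args or {}

class ProcessSignals(Process):
    def __init__(
        self,
        *,
        processing_mode: str = "signal",  # Python passes it on to __init__ as well
        process_func: typing.Callable[..., typing.Any] = None,
        process_func_args: typing.Dict[str, typing.Any] = None,  # etc
        sampling_rate: int = None,
        resample: bool = False,
        channels: typing.Union[int, typing.Sequence[int]] = None,
        mixdown: bool = False,
    ):
        self.process_func = process_func
        self.process_func_args = process_func_args or {}
        self.sampling_rate = sampling_rate
        self.resample = resample
        self.channels = channels
        self.mixdown = mixdown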
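One detail of the `__new__` route that is easy to miss: when `__new__` returns an instance of a subclass, Python still calls that subclass's `__init__` with the original keyword arguments, so the mode keyword has to be accepted there too. A minimal, self-contained sketch to check this behaviour (the toy names `Base`, `SignalStrand`, and `TextStrand` are hypothetical and not audinterface classes):

```python
class Base:
    def __new__(cls, *, processing_mode="signal", **kwargs):
        # Dispatch only for Base itself; subclasses are created directly
        target = cls
        if cls is Base:
            target = {"signal": SignalStrand, "text": TextStrand}[processing_mode]
        return super().__new__(target)


class SignalStrand(Base):
    def __init__(self, *, processing_mode="signal", sampling_rate=None):
        # processing_mode arrives here again and has to be accepted
        self.sampling_rate = sampling_rate


class TextStrand(Base):
    def __init__(self, *, processing_mode="text"):
        # Store the value only to show that __init__ receives it
        self.seen_mode = processing_mode


text = Base(processing_mode="text")
signal = Base(sampling_rate=16000)
direct = TextStrand()
print(type(text).__name__, type(signal).__name__, type(direct).__name__)
# prints: TextStrand SignalStrand TextStrand
```

The `cls is Base` guard matters: without it, creating a subclass directly would run through the dispatch again.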
I have assembled an example call graph using pyan using
If one were to separate the text and signal strands more, then it could probably be made more balanced. Then the signal portion could possibly go down a path like this:
One could possibly rename _process_signal into _process and in that strand (hence the parentheses)
Suffixes (again parenthesized) could be stripped in order to "balance" the tree.
Sounds good to me. Feel free to branch off from here and implement your changes (or continue from #179 if this makes more sense).
Closes #173
...
TODO: implement a `read_func` argument, which can be used to provide a function to read in the file from disk. Maybe we don't need the extra `_call_data()` and `process_data()` methods then?
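A rough sketch of what the `read_func` idea could look like, independent of the audinterface internals. Only the argument name `read_func` comes from the TODO; the function `process_file` below, its signature, the helper readers, and the `DEFAULT_READERS` mapping are assumptions made for illustration.

```python
import json
import os
import tempfile
import typing


def read_text(path: str) -> str:
    """Hypothetical default reader for plain text files."""
    with open(path, encoding="utf-8") as fp:
        return fp.read()


def read_json(path: str) -> typing.Any:
    """Hypothetical default reader for JSON files."""
    with open(path, encoding="utf-8") as fp:
        return json.load(fp)


# Hypothetical mapping from file extension to default reader
DEFAULT_READERS = {".txt": read_text, ".json": read_json}


def process_file(path, process_func, *, read_func=None):
    """Read ``path`` with ``read_func`` and pass the result to ``process_func``."""
    if read_func is None:
        ext = os.path.splitext(path)[1].lower()
        if ext not in DEFAULT_READERS:
            raise ValueError(f"No default reader for '{ext}' files, pass read_func")
        read_func = DEFAULT_READERS[ext]
    return process_func(read_func(path))


# Usage: process a TXT and a JSON file created in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    txt_path = os.path.join(tmp, "a.txt")
    json_path = os.path.join(tmp, "b.json")
    with open(txt_path, "w", encoding="utf-8") as fp:
        fp.write("hello text world")
    with open(json_path, "w", encoding="utf-8") as fp:
        json.dump({"text": "hi"}, fp)
    n_words = process_file(txt_path, lambda data: len(data.split()))
    field = process_file(json_path, lambda data: data["text"])
    upper = process_file(txt_path, str.upper, read_func=read_text)
```

With the reading step injected like this, the processing code itself does not branch on the file type, which is what might make the extra data-specific methods unnecessary.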