Sketch of the library #1
A general workflow would look something like this:

That's the general idea; there are areas to improve and different things to try out.
Thanks @kujaku11. In terms of workflow: how iterative a process would this typically be? Do you often iterate on a single step, or do a few steps in time-series processing and then perhaps go back and remove a few more outliers?
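For discussion, here is a minimal sketch of the "clean in the time domain, then move to the frequency domain" flow described later in this thread. Both function names and the simple clipping/FFT choices are hypothetical placeholders, not a proposed API:

```python
import numpy as np

def remove_outliers(x, nstd=2.0):
    """Hypothetical cleaning step: clip samples more than nstd standard
    deviations from the mean."""
    mu, sigma = x.mean(), x.std()
    return np.clip(x, mu - nstd * sigma, mu + nstd * sigma)

def compute_spectrum(x):
    """Placeholder spectral step: amplitude spectrum via an FFT."""
    return np.abs(np.fft.rfft(x))

# workflow: time-domain cleaning first, then the frequency domain
ex = np.array([0.0, 1.0, 0.5, 100.0, 0.3, 0.7, 0.2, 0.4])  # 100.0 is an obvious outlier
cleaned = remove_outliers(ex)
spectrum = compute_spectrum(cleaned)
print(cleaned.max() < ex.max())  # True: the outlier has been clipped
```

Whether each step mutates the series in place or returns a new one is exactly the kind of design question the iteration discussion above raises.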
Here are a couple of thoughts based on the ppt and notes above. There are 3 main classes; a couple of key points about the sketch below:
```python
class TimeSeries(BaseTimeSeries):  # top level; this might be processing.NSEM.TimeSeries

    def __init__(self, hx=None, hy=None, hz=None, ex=None, ey=None, **kwargs):
        # if large, we can be smart about writing this to disk
        self._source_data = utils.create_data_frame(hx, hy, hz, ex, ey)

    def __getitem__(self, key):
        # calling self['ex'] looks at the current data
        return self.current_data[key]

    @property
    def source_data(self):
        """
        Original data set. Remains untouched.
        """
        return self._source_data

    @property
    def current_data(self):
        if getattr(self, '_current_data', None) is None:
            # if we haven't taken any processing steps yet, this is just the source data
            self._current_data = self._source_data
        return self._current_data

    @property
    def processing_log(self):
        """
        This returns an ordered dict (or similar - needs to be serializable)
        of the processing steps taken, including their args and kwargs.
        """
        if getattr(self, '_processing_log', None) is None:
            self._processing_log = {}
        return self._processing_log

    def register_processing_step(self, name, params):
        # record the step so the workflow can be serialized and replayed
        self.processing_log[len(self.processing_log)] = {'step': name, 'params': params}

    def remove_pipeline_noise(self, period):
        self._current_data = utils.remove_pipeline_noise(self.current_data, period)
        self.register_processing_step('remove_pipeline_noise', {'period': period})
        return self.current_data
```

A couple other comments:
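As a self-contained usage sketch of the class above (the `BaseTimeSeries` and `utils` stand-ins here are minimal placeholders written only so the example runs; the mean-subtraction "filter" is not a real pipeline-noise removal):

```python
import pandas as pd

class BaseTimeSeries:
    pass

class utils:  # stand-in namespace; hypothetical helpers
    @staticmethod
    def create_data_frame(hx, hy, hz, ex, ey):
        cols = {k: v for k, v in
                [('hx', hx), ('hy', hy), ('hz', hz), ('ex', ex), ('ey', ey)]
                if v is not None}
        return pd.DataFrame(cols)

    @staticmethod
    def remove_pipeline_noise(df, period):
        return df - df.mean()  # placeholder: a real filter would notch out the period

class TimeSeries(BaseTimeSeries):
    def __init__(self, hx=None, hy=None, hz=None, ex=None, ey=None, **kwargs):
        self._source_data = utils.create_data_frame(hx, hy, hz, ex, ey)
        self._current_data = None
        self._processing_log = {}

    def __getitem__(self, key):
        return self.current_data[key]

    @property
    def source_data(self):
        return self._source_data

    @property
    def current_data(self):
        if self._current_data is None:
            self._current_data = self._source_data
        return self._current_data

    def register_processing_step(self, name, params):
        self._processing_log[len(self._processing_log)] = {'step': name, 'params': params}

    def remove_pipeline_noise(self, period):
        self._current_data = utils.remove_pipeline_noise(self.current_data, period)
        self.register_processing_step('remove_pipeline_noise', {'period': period})
        return self.current_data

ts = TimeSeries(ex=[1.0, 2.0, 3.0], hy=[0.1, 0.2, 0.3])
ts.remove_pipeline_noise(period=10)
print(ts.source_data['ex'].tolist())  # [1.0, 2.0, 3.0] -- source untouched
print(ts['ex'].tolist())              # [-1.0, 0.0, 1.0] -- current data is filtered
```

The key behavior this exercises is the source/current split: processing never overwrites `source_data`, and every step lands in the log.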
Good outline of the time series data @lheagy. We should probably also add a save_data() function so you can reuse filtered data. The logging is a good idea; what about something where the logged processing step is the code snippet that produced that step? Not sure how you would do that, but it would be handy. You could then have a function that writes those steps out to a script file. The return dictionary could also be its own object with methods like write_script_file, get_step, etc.

As for iterative processing on the time series, this is usually a one-off, or trial and error. Most of the time, if you do a pretty good job of removing the obvious noise in the time domain, the frequency-domain processing will take care of the rest; that's where most of the hard work is done. Time-series processing is usually minimal, unless you have noise like pipeline noise with an obvious periodic structure.
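One hedged sketch of the "log as its own object" idea, with write_script_file / get_step-style methods. All names are suggestions for discussion, not an agreed API, and the generated script assumes a `ts` object exists when it is run:

```python
class ProcessingLog:
    """Ordered, serializable record of processing steps (hypothetical sketch)."""

    def __init__(self):
        self._steps = []  # list of (name, params) tuples, in order

    def register(self, name, params):
        # store a copy so later mutation of params can't rewrite history
        self._steps.append((name, dict(params)))

    def get_step(self, index):
        return self._steps[index]

    def write_script_file(self, filename):
        """Write the logged steps out as a runnable Python script."""
        lines = ["# auto-generated reprocessing script"]
        for name, params in self._steps:
            args = ", ".join(f"{k}={v!r}" for k, v in params.items())
            lines.append(f"ts.{name}({args})")
        with open(filename, "w") as f:
            f.write("\n".join(lines) + "\n")

log = ProcessingLog()
log.register("remove_pipeline_noise", {"period": 10})
print(log.get_step(0))  # ('remove_pipeline_noise', {'period': 10})
```

Logging the call name plus parameters (rather than a raw code snippet) keeps the log serializable while still letting a script be reconstructed from it.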
Also, for keeping track of units: https://pint.readthedocs.io/en/0.7.2/
pint and pandas do not mix well. If you want to use pandas and pint, then maybe the 'read-in' function should convert everything to SI units.
Ah, thanks @ahartikainen, I haven't played around with them together before. In this case, I don't think we will make heavy use of pint. In a lot of ways, I see it being used more as a property on a class so we can do validation, not necessarily deeply integrating it by attaching it to values or embedding it in a data frame. Agreed, though, that getting the data into SI on construction of a data class is a good approach.
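For illustration, here is a minimal version of the "units as a validated property" idea. To keep the sketch dependency-free it validates against a plain whitelist; pint's `UnitRegistry` could replace the whitelist with real dimensional analysis. The class and unit strings are hypothetical:

```python
class ElectricFieldChannel:
    """Illustrative channel class that validates units on assignment."""

    # pint's UnitRegistry could replace this whitelist with proper
    # dimensional analysis; a plain set keeps the sketch dependency-free
    ALLOWED_UNITS = {"mV/km", "V/m"}

    def __init__(self, values, units="mV/km"):
        self.values = values
        self.units = units  # routed through the validating setter below

    @property
    def units(self):
        return self._units

    @units.setter
    def units(self, value):
        if value not in self.ALLOWED_UNITS:
            raise ValueError(
                f"unsupported units {value!r}; expected one of {sorted(self.ALLOWED_UNITS)}"
            )
        self._units = value

ex = ElectricFieldChannel([0.1, 0.2], units="mV/km")
print(ex.units)  # mV/km
```

Because the unit lives on the class rather than on every value, the data itself can stay in a plain pandas DataFrame in SI units, sidestepping the pint/pandas friction mentioned above.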
Here are a couple of points that came up in the meeting today (see: https://www.youtube.com/watch?v=-X4HbcedBUY). @sgkang and @kujaku11, please correct me where I am off-base, and feel free to add!
Thanks, @lheagy, @kujaku11 and @ahartikainen for your comments. I want to add a couple of things to be considered for the design of the library. I like the direction of having things object oriented and leveraging existing tools. My suggestions on the classes in the module:
A bit of my reasoning: having a single time series as the base makes the library much more versatile. As I understand the procedure, up until the calculation of the transfer functions, most of the calculations/operations are done individually on each component. So for performance, designing the library that way should make parallelization easier, since a list of base objects can be sent out individually for the number crunching. Designing the filters to be ...

As has been discussed, bookkeeping of the procedure is important (both what is done, and where the transfer functions are calculated from with respect to the spectral and time series data). Being able to identify where in the time series an outlier is calculated from would be an effective tool to have. Using ...
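A small sketch of the per-component parallelization point: if each channel is its own base object (or here, simply its own entry), the cleaning step can be mapped over components independently. The `clean_component` function and the demeaning it does are hypothetical stand-ins; a `ProcessPoolExecutor` could be swapped in for CPU-bound filters:

```python
from concurrent.futures import ThreadPoolExecutor

def clean_component(item):
    """Hypothetical per-channel cleaning step: demean a single component."""
    name, samples = item
    mean = sum(samples) / len(samples)
    return name, [s - mean for s in samples]

station = {
    "ex": [1.0, 2.0, 3.0],
    "ey": [2.0, 2.0, 2.0],
    "hx": [0.0, 4.0, 2.0],
}

# each component is independent until transfer-function estimation,
# so channels can be farmed out to workers and gathered afterwards
with ThreadPoolExecutor(max_workers=2) as pool:
    cleaned = dict(pool.map(clean_component, station.items()))

print(cleaned["ex"])  # [-1.0, 0.0, 1.0]
```

The components only need to be brought back together at the transfer-function stage, which is what makes this map/gather structure natural.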
There are 3 main modules we are considering at the moment: Time Series, Spectral and Transfer Functions. We will focus on Natural Source EM data to start. Below, I have added notes from the conversation and google doc with @kujaku11, @grosenkj, @rowanc1, @sgkang, @dougoldenburg. In addition, it would be helpful to outline a sketch of pseudo-code: what should a workflow look like?
Time Series
Spectral Analysis
Transfer Functions
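To the "what should a workflow look like?" question, here is one hedged sketch spanning the three modules above. Every class, method, and the toy numerics are placeholders for discussion, not proposed API (a real implementation would, e.g., regress E on H per frequency rather than count windows):

```python
class Spectra:
    """Placeholder for the Spectral module's output."""
    def __init__(self, windows):
        self.windows = windows

class TimeSeriesSketch:
    """Placeholder for the Time Series module."""
    def __init__(self, data):
        self.data = data

    def remove_outliers(self):
        # toy cleaning step: drop anything with large amplitude
        self.data = [x for x in self.data if abs(x) < 10]
        return self

    def to_spectra(self, window_length):
        # chop the cleaned series into windows for spectral estimation
        windows = [self.data[i:i + window_length]
                   for i in range(0, len(self.data), window_length)]
        return Spectra(windows)

def estimate_transfer_functions(spectra):
    """Placeholder for the Transfer Functions module."""
    return {"n_windows": len(spectra.windows)}

ts = TimeSeriesSketch([1.0, 2.0, 50.0, 3.0])   # 50.0 is an obvious outlier
tf = estimate_transfer_functions(ts.remove_outliers().to_spectra(window_length=2))
print(tf)  # {'n_windows': 2}
```

The chained call shows one possible ergonomic: each module hands a dedicated object to the next, so the time series → spectra → transfer functions flow reads left to right.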