time_base='union' method #171
… numpy 2.0 NaN error
I think this still needs discussion. I think your version is pretty idiosyncratic, and if we are to go this route, I think it should be more general. Thanks!
pyglider/ncprocess.py
Outdated
@@ -67,10 +67,10 @@ def extract_timeseries_profiles(inname, outdir, deploymentyaml):
dss['v'] = dss.water_velocity_northward.mean()
dss['v'].attrs = profile_meta['v']
elif 'u' in profile_meta:
- dss['u'] = profile_meta['u'].get('_FillValue', np.NaN)
+ dss['u'] = profile_meta['u'].get('_FillValue', np.nan)
Can we revert all these - NaN is a perfectly acceptable alias for nan, and this clutters up this PR.
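For context on the "numpy 2.0 NaN error" named in the commit title: the capitalised alias np.NaN was removed in the NumPy 2.0 release, while np.nan exists in both 1.x and 2.x. A minimal sketch of the difference, where profile_meta is a hypothetical stand-in for the deployment metadata and not the real structure:

```python
import numpy as np

# Hypothetical metadata entry with no '_FillValue' key, so the default is used.
profile_meta = {"u": {"units": "m s-1"}}

# np.nan is available in both NumPy 1.x and 2.x.
fill = profile_meta["u"].get("_FillValue", np.nan)

# The capitalised alias only exists on NumPy 1.x; on 2.x the attribute
# lookup raises AttributeError, so hasattr() returns False there.
has_alias = hasattr(np, "NaN")
print(np.__version__, "np.NaN available:", has_alias)
```

On NumPy 1.x both spellings refer to the same float, so the change is behaviour-neutral there.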
pyglider/slocum.py
Outdated
@@ -807,9 +811,35 @@ def binary_to_timeseries(indir, cachedir, outdir, deploymentyaml, *,
outdir : string
Directory to put the merged timeseries files.

- deploymentyaml : str
+ deploymentyaml : string
- deploymentyaml : string
+ deploymentyaml : str
The python type is str
pyglider/slocum.py
Outdated
fnamesuffix : string
Suffix for the output timeseries file

time_base : string
- time_base : string
+ time_base : str, default 'sci_water_temp'
pyglider/slocum.py
Outdated
If this value is 'union', then the processing is handled differently,
to allow for 'unioning' the engineering and science timeseries. This
may be useful if for instance you want a full time series, and science
variables are only sampled on dives.

For a value of 'union', the dbdreader MultiDBD.get() method is used
rather than get_sync to read the parameters specified in
deploymentyaml. The argument return_nans (of MultiDBD.get()) is set to
True, so that there are two 'time bases' for the extracted data: one
for engineering variables (from m_present_time), and one for science
variables (from sci_m_present_time). These times are rounded to the
nearest second, and then merged. These values are the time index of
the output file. In this case, only the engineering variables (e.g.,
lat/lon, pitch, roll, m_depth) are interpolated.
This isn't super clear, and not what I understood was going to happen here. If there are two science sensors on different time bases, I thought both times were going to be logged and NaN inserted for the sensors on a different time base.
Not clear on the "rounding to nearest second" part. Surely there are plenty of sensors sampled faster than 1 Hz that this will be a bad assumption for. I'm not clear why you are doing this.
Why would the engineering sensors be treated differently than a mismatched science sensor? This seems a mishmash of the two approaches, and you could imagine someone wanting the raw engineering data.
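The behaviour described in this comment (keep every timestamp from every sensor, insert NaN where a sensor has no sample, with no rounding and no interpolation) could be sketched roughly as follows. This is a plain-Python illustration with made-up sample values; union_time_base is a hypothetical helper, not a pyglider or dbdreader function.

```python
import math

def union_time_base(series):
    """Place several (times, values) series on the union of their timestamps.

    `series` maps a sensor name to a (times, values) pair. Timestamps are
    used as-is (no rounding) and nothing is interpolated: a sensor with no
    sample at a given time gets NaN there.
    """
    # Sorted union of every timestamp seen by any sensor.
    all_times = sorted({t for times, _ in series.values() for t in times})
    merged = {}
    for name, (times, values) in series.items():
        lookup = dict(zip(times, values))
        merged[name] = [lookup.get(t, math.nan) for t in all_times]
    return all_times, merged

# Invented samples: one engineering and one science sensor on different clocks.
series = {
    "m_depth": ([0.0, 4.0, 8.0], [1.0, 2.0, 3.0]),
    "sci_water_temp": ([0.5, 4.0], [10.1, 10.3]),
}
times, merged = union_time_base(series)
```

With these inputs the merged index has four timestamps, and each sensor carries NaN at the times only the other sensor sampled. Engineering and science sensors are treated identically, which is the consistency the comment asks about.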
I think this needs to be updated now?
Thanks. Docs updated, and hopefully reasonably simplified
pyglider/slocum.py
Outdated
_log.debug(f'sensors: {[i for i in sensors]}')

time_base_union = time_base == 'union'
if time_base_union:
Please put the "normal" loop first, and the time_base_union loop second. It makes the code easier to read if the default precedes the non-default.
Co-authored-by: Jody Klymak <[email protected]>
Thank you for the comments! I added a commit reverting to NaN (although see comment above), putting the default method first, and starting doc updates. Other comments/questions follow
I expected that too. In testing though, the
This I think is the key question for this method being general enough. Yep, there are plenty of sensors that sample faster than 1 Hz. Because index of the
Very fair point, I'll update it. I did this based on our needs/interests, and agree it's inconsistent. Although, should the latitude and longitude values still be interpolated in your opinion?
Hi @jklymak, I added a commit such that no engineering values (including lat/lon) are interpolated. I also wanted to check in about your thoughts on the above?
OK, let's fix this step by step - I think times should definitely not be rounded, so that is a bug. I'll fix now.
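To illustrate why rounding is a problem for fast sensors, here is a small sketch with a hypothetical 4 Hz sensor (the timestamps are invented). Rounding to whole seconds before merging collapses five distinct samples onto two index values, so most samples can no longer be told apart.

```python
# Invented timestamps (seconds) from a hypothetical 4 Hz sensor.
times = [100.0, 100.25, 100.5, 100.75, 101.0]

# Round to the nearest second, then take the unique values, as a
# second-resolution merged time index would.
rounded = sorted({round(t) for t in times})

print(len(times), "samples ->", len(rounded), "unique rounded times")
```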
See #177 for better handling of the time.
merge pyglider updates into pull request
Hey @jklymak Sorry for the extra ping, but when you have a chance are there any other adjustments that I can make to this pr at the moment? Or is it close to something you're comfortable with?
OK, my apologies - I guess I still think that this is too orthogonal to the main timeseries method. If you really want to go this way, can we just make a new method I understand the potential appeal of such a data set, but actually strongly feel you will end up in the end with something like what
Thanks for your explanation and time with this! Totally respected wrt your feelings on this being too specific of a want for pyglider.
I don't follow how Also, a question regarding a wrapper that eg makes a few
Basically - if there is a lot of commonality, then we could think about refactoring. You have looked at it more carefully than I have though - if you really think it would be shorter to have the if/else loops as you've done here we should continue the discussion. I'm just looking at it from an end-user point of view that these are some different methods.
Yeah, we are discussing factoring out the profile creator. If needed we can quickly have a "skip profiles" flag that would work as well.
Ahh thank you - it just finally clicked for me what 'raw' and 'timeseries' mean in pyglider world, so I now appreciate the importance of specifying 'raw' for such a method.
There would be some duplication . However, after really looking at it I don't think it would be unreasonable as long as this chunk was factored out, ie was its own function that could be called by both If this all feels better to you, I'd be happy to take a stab at it? And if so, would you prefer that I update this pr or make a new one?
No need to change this quickly imo. To clarify, is part of the refactoring considering adding an argument to
Belatedly closing this pr, as we've moved forward with creating multiple (engineering and science) NetCDF files. Thanks for all of the thoughtful discussion.
first pass at a time_base='union' method for binary_to_timeseries
also changed several np.NaN to np.nan, per error from numpy 2.0