time_base='union' method #171

smwoodman · 2024-06-23T04:05:26Z

first pass at a time_base='union' method for binary_to_timeseries
also changed several np.NaN to np.nan, per error from numpy 2.0

jklymak

I think this still needs discussion. I think your version is pretty idiosyncratic, and if we are to go this route, I think it should be more general. Thanks!

jklymak · 2024-06-24T16:14:07Z

pyglider/ncprocess.py

@@ -67,10 +67,10 @@ def extract_timeseries_profiles(inname, outdir, deploymentyaml):
                    dss['v'] = dss.water_velocity_northward.mean()
                    dss['v'].attrs = profile_meta['v']
                elif 'u' in profile_meta:
-                    dss['u'] = profile_meta['u'].get('_FillValue', np.NaN)
+                    dss['u'] = profile_meta['u'].get('_FillValue', np.nan)


Can we revert all these - NaN is a perfectly acceptable alias for nan, and this clutters up this PR.
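For reference, the NumPy 2.0 error mentioned in the PR description comes from that release removing the `np.NaN` alias, while `np.nan` works on both 1.x and 2.x. A minimal check, not taken from the PR itself (the variable names are illustrative):

```python
import numpy as np

# np.nan is the spelling available in both NumPy 1.x and 2.x.
fill = np.nan

# The np.NaN alias exists on NumPy 1.x only; on 2.x the attribute
# lookup raises AttributeError, so hasattr() returns False.
has_legacy_alias = hasattr(np, "NaN")
numpy_major = int(np.__version__.split(".")[0])

print(f"NumPy {np.__version__}: np.NaN available = {has_legacy_alias}")
```

Code that has to run against both major versions can therefore use `np.nan` everywhere without a version check.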

jklymak · 2024-06-24T16:15:44Z

pyglider/slocum.py

@@ -807,9 +811,35 @@ def binary_to_timeseries(indir, cachedir, outdir, deploymentyaml, *,
    outdir : string
        Directory to put the merged timeseries files.

-    deploymentyaml : str
+    deploymentyaml : string


Suggested change

deploymentyaml : string

deploymentyaml : str

The Python type is str.

jklymak · 2024-06-24T16:16:43Z

pyglider/slocum.py

+    fnamesuffix : string
+        Suffix for the output timeseries file
+
+    time_base : string


Suggested change

time_base : string

time_base : str, default 'sci_water_temp'

jklymak · 2024-06-24T16:21:46Z

pyglider/slocum.py

+        If this value is 'union', then the processing is handled differently,
+        to allow for 'unioning' the engineering and science timeseries. This
+        may be useful if for instance you want a full time series, and science
+        variables are only sampled on dives.
+
+        For a value of 'union', the dbdreader MultiDBD.get() method is used
+        rather than get_sync to read the parameters specified in
+        deploymentyaml. The argument return_nans (of MultiDBD.get()) is set to
+        True, so that there are two 'time bases' for the extracted data: one
+        for engineering variables (from m_present_time), and one for science
+        variables (from sci_m_present_time). These times are rounded to the
+        nearest second, and then merged. These values are the time index of
+        the output file. In this case, only the engineering variables (e.g.,
+        lat/lon, pitch, roll, m_depth) are interpolated.


This isn't super clear, and not what I understood was going to happen here. If there are two science sensors on different time bases, I thought both times were going to be logged and NaN inserted for the sensors on a different time base.

Not clear on the "rounding to nearest second part". Surely there are plenty of sensors sampled faster than 1 Hz that this will be a bad assumption for. I'm not clear why you are doing this.

Why would the engineering sensors be treated differently than a mismatched science sensor? This seems a mishmash of the two approaches, and you could imagine someone wanting the raw engineering data.

jklymak · 2024-07-17

I think this needs to be updated now?

smwoodman · 2024-07-18

Thanks. Docs updated, and hopefully reasonably simplified.
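The "union" of time bases discussed in this thread can be sketched with plain NumPy. This is only an illustration of the idea (one shared time index, NaN where a variable has no sample), not the code from the PR; `union_time_base` and the sample arrays are hypothetical, and dbdreader is not used here.

```python
import numpy as np


def union_time_base(eng_time, eng_vals, sci_time, sci_vals):
    """Place two series on the sorted union of their timestamps.

    Assumes the timestamps within each input are unique.
    Positions where a series has no sample are filled with NaN.
    """
    time = np.union1d(eng_time, sci_time)  # sorted, de-duplicated
    eng = np.full(time.shape, np.nan)
    sci = np.full(time.shape, np.nan)
    # Every input timestamp is present in `time`, so searchsorted
    # returns the exact position of each sample.
    eng[np.searchsorted(time, eng_time)] = eng_vals
    sci[np.searchsorted(time, sci_time)] = sci_vals
    return time, eng, sci


# Hypothetical engineering (m_depth) and science (temperature) samples.
eng_time = np.array([0.0, 4.0, 8.0])
m_depth = np.array([1.0, 5.0, 9.0])
sci_time = np.array([2.0, 4.0, 6.0])
temp = np.array([10.0, 10.5, 11.0])

time, depth_u, temp_u = union_time_base(eng_time, m_depth, sci_time, temp)
```

Nothing is interpolated in this sketch: each variable keeps only its own samples, which is the behaviour the thread later converges on for engineering values.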

jklymak · 2024-06-24T16:22:34Z

pyglider/slocum.py

+    _log.debug(f'sensors: {[i for i in sensors]}')
+
+    time_base_union = time_base == 'union'
+    if time_base_union:  


Please put the "normal" loop first, and the time_base_union loop second. It makes the code easier to read if the default precedes the non-default.

smwoodman · 2024-06-25T15:46:44Z

Thank you for the comments!

I added a commit reverting to NaN (although see comment above), putting the default method first, and starting doc updates. Other comments/questions follow

smwoodman · 2024-06-25T15:58:55Z

> This isn't super clear, and not what I understood was going to happen here. If there are two science sensors on different time bases, I thought both times were going to be logged and NaN inserted for the sensors on a different time base.

I expected that too. In testing though, the dbdreader MultiDBD.get() behavior seemed to be to put all sci variables on sci_m_present_time. I.e., all science variables had the same number of values and returned the same timestamps. I think this is because of here?

> Not clear on the "rounding to nearest second part". Surely there are plenty of sensors sampled faster than 1 Hz that this will be a bad assumption for. I'm not clear why you are doing this.

This, I think, is the key question for whether this method is general enough. Yep, there are plenty of sensors that sample faster than 1 Hz. Because the index of the binary_to_timeseries() output is rounded to (and returned as) seconds rather than, e.g., ns, I felt that the merging should happen in the same time units. Otherwise, the merging would happen at the ns scale, but the final index would be at the second scale, so there would be lots of timestamps with the same value.

> Why would the engineering sensors be treated differently than a mismatched science sensor? This seems a mishmash of the two approaches, and you could imagine someone wanting the raw engineering data.

Very fair point, I'll update it. I did this based on our needs/interests, and agree it's inconsistent. Although, should the latitude and longitude values still be interpolated in your opinion?
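The concern about rounding can be made concrete with a small, self-contained example. It assumes a hypothetical sensor sampling at 4 Hz and is not taken from the PR code:

```python
import numpy as np

# Eight samples from a hypothetical 4 Hz sensor, in seconds.
t = np.arange(0.0, 2.0, 0.25)

# Rounding to the nearest second collapses several samples onto
# the same timestamp.
rounded = np.round(t)
n_unique = np.unique(rounded).size

print(f"{t.size} samples, {n_unique} distinct timestamps after rounding")
```

Any merge keyed on the rounded values would either drop samples or produce a time index with repeated entries.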

smwoodman · 2024-07-16T23:11:25Z

Hi @jklymak, I added a commit such that no engineering values (including lat/lon) are interpolated.

I also wanted to check in about your thoughts on the above?

jklymak · 2024-07-17T00:36:25Z

OK, let's fix this step by step - I think times should definitely not be rounded, so that is a bug. I'll fix now.

jklymak · 2024-07-17T01:49:28Z

See #177 for better handling of the time.

smwoodman · 2024-07-17T16:42:06Z

Changes from #177 (and #173, #175, and #176) merged into this pr, and union of sci/eng times now happens at the int level
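One way to read "at the int level" is that timestamps are converted to integers before the union, so that no sub-second information is lost to rounding. The sketch below assumes integer nanoseconds; the exact representation used in the PR is not shown in this thread, and `to_int_ns` is a hypothetical helper.

```python
import numpy as np


def to_int_ns(seconds):
    """Convert float seconds to integer nanoseconds (assumed representation)."""
    return np.round(np.asarray(seconds, dtype="float64") * 1e9).astype("int64")


# Hypothetical engineering and science timestamps, in seconds.
eng_s = np.array([0.0, 0.5, 1.0])
sci_s = np.array([0.25, 0.5, 0.75])

# Union on integers: sub-second samples stay distinct, shared ones merge.
time_ns = np.union1d(to_int_ns(eng_s), to_int_ns(sci_s))

# The integer index can then be interpreted as datetime64[ns].
time = time_ns.astype("datetime64[ns]")
```

Compared with rounding to whole seconds, the 0.25 s and 0.75 s samples survive as separate entries, and only the timestamp shared by both series (0.5 s) is merged.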

smwoodman · 2024-08-09T20:40:52Z

Hey @jklymak Sorry for the extra ping, but when you have a chance are there any other adjustments that I can make to this pr at the moment? Or is it close to something you're comfortable with?

jklymak · 2024-08-10T16:14:20Z

OK, my apologies - I guess I still think that this is too orthogonal to the main timeseries method. If you really want to go this way, can we just make a new method binary_to_raw_timeseries, name up to workshopping?

I understand the potential appeal of such a data set, but I actually strongly feel you will end up with something like what binary_to_timeseries gives you. If I was really concerned about lining everything up with the CTD, I'd make separate netcdf files for each sensor rather than a single file full of NaNs, and binary_to_timeseries can already do this.
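A rough sketch of the "separate files per sensor" idea: call the converter once per time base and tag each output with a suffix. The `time_base` and `fnamesuffix` keyword names come from the docstring in this PR; the wrapper `write_per_time_base`, the stand-in `fake_convert`, the paths, and the choice of `m_depth` as a second time base are all hypothetical.

```python
def write_per_time_base(convert, indir, cachedir, outdir, deploymentyaml, time_bases):
    """Call `convert` once per time base and collect whatever it returns."""
    results = {}
    for time_base in time_bases:
        results[time_base] = convert(
            indir, cachedir, outdir, deploymentyaml,
            time_base=time_base,
            fnamesuffix=f"_{time_base}",  # keeps the outputs from colliding
        )
    return results


# Stand-in for the real converter so the sketch runs without glider data.
def fake_convert(indir, cachedir, outdir, deploymentyaml, *, time_base, fnamesuffix):
    return f"{outdir}/deployment{fnamesuffix}.nc"


files = write_per_time_base(
    fake_convert, "binary/", "cache/", "L0-timeseries", "deployment.yml",
    time_bases=["sci_water_temp", "m_depth"],
)
```

In real use, `pyglider.slocum.binary_to_timeseries` would presumably be passed as `convert`, with any other keyword arguments it needs added to the call.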

smwoodman · 2024-08-13T01:43:36Z

Thanks for your explanation and time with this! I totally respect your feeling that this is too specific a want for pyglider.

> If you really want to go this way, can we just make a new method binary_to_raw_timeseries, name up to workshopping?

I don't follow how binary_to_raw_timeseries would be structured - it would be a new method with basically the same code as binary_to_timeseries, except for using dbd.get(*dbd.parameterNames, return_nans=True)?

Also, a question regarding a wrapper that eg makes a few binary_to_timeseries calls and merges relevant sensor values and metadata. Since utils.get_profile_new requires 'pressure' and can't identify profiles using 'm_depth', how do you recommend getting profile info for eg an engineering-only netcdf file, or a science netcdf file where science is only sampled on dives? Obvs let me know if this should be a separate issue/discussion.

jklymak · 2024-08-13T03:45:47Z

> I don't follow how binary_to_raw_timeseries would be structured - it would be a new method with basically the same code as binary_to_timeseries, except for using dbd.get(*dbd.parameterNames, return_nans=True)?

Basically - if there is a lot of commonality, then we could think about refactoring. You have looked at it more carefully than I have though - if you really think it would be shorter to have the if/else loops as you've done here we should continue the discussion. I'm just looking at it from an end-user point of view that these are some different methods.

> Since utils.get_profile_new requires 'pressure' and can't identify profiles using 'm_depth', how do you recommend getting profile info for eg an engineering-only netcdf file,

Yeah, we are discussing factoring out the profile creator. If needed we can quickly have a "skip profiles" flag that would work as well.

smwoodman · 2024-08-13T14:46:46Z

> I'm just looking at it from an end-user point of view that these are some different methods.

Ahh thank you - it just finally clicked for me what 'raw' and 'timeseries' mean in pyglider world, so I now appreciate the importance of specifying 'raw' for such a method.

> Basically - if there is a lot of commonality, then we could think about refactoring. You have looked at it more carefully than I have though - if you really think it would be shorter to have the if/else loops as you've done here we should continue the discussion.

There would be some duplication. However, after really looking at it I don't think it would be unreasonable as long as this chunk was factored out, ie was its own function that could be called by both binary_to_timeseries and binary_to_raw_timeseries (and maybe raw_to_timeseries as well). Name proposal: timeseries_meta?

If this all feels better to you, I'd be happy to take a stab at it? And if so, would you prefer that I update this pr or make a new one?

smwoodman · 2024-08-13T14:51:19Z

> Yeah, we are discussing factoring out the profile creator. If needed we can quickly have a "skip profiles" flag that would work as well.

No need to change this quickly imo. To clarify, is part of the refactoring considering adding an argument to get_profile_new that would let the user specify which data variable to use to identify profiles, eg 'pressure' or 'm_depth'? And if not would you be open to me opening a new issue for this?

smwoodman · 2024-11-06T20:56:31Z

Belatedly closing this pr, as we've moved forward with creating multiple (engineering and science) NetCDF files. Thanks for all of the thoughtful discussion.

add first pass at time_base='union' method. also change to np.nan per…

7ba2560

… numpy 2.0 NaN error

jklymak reviewed Jun 24, 2024

Sam Woodman - NOAA and others added 2 commits June 25, 2024 15:42

reverting nan to NaN for pr. default sensor method first. starting do…

5900afa

…c edits

Update time_base docs

0f87e16

Co-authored-by: Jody Klymak <[email protected]>

no interpolation of engineering values for consistency

f13d18f

smwoodman and others added 3 commits July 17, 2024 12:50

Merge branch 'return-nan' into main

5d6bb64

Merge pull request #1 from smwoodman/main

fdfba17

merge pyglider updates into pull request

merge times as integers, rather than seconds

f4cee71

smwoodman and others added 4 commits July 17, 2024 17:25

fix and simplify docs for union method

cf04a26

Merge branch 'c-proof:main' into return-nan

c0d9861

Update docs. Add error check for if no eng or sci variables.

45490b5

Merge branch 'c-proof:main' into return-nan

cc94319

smwoodman mentioned this pull request Aug 27, 2024

Base glider processing choices SWFSC/glider-utils#4

Open

smwoodman closed this Nov 6, 2024
