Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch feature detection to use xarray internally #354

Merged

Conversation

freemansw1
Copy link
Member

@freemansw1 freemansw1 commented Oct 11, 2023

This is a draft PR that contains my work switching feature detection to use xarray internally, using many of the concepts (and some of the code) from #275. This code uses entirely xarray and pandas for processing feature detection, and removes all iris calls; instead using our internal conversion utilities to convert input iris data to xarray (which should always work, unlike xarray->iris. Further, this change should allow users to provide their data as an xarray.DataArray.

This PR will remain in draft form until at least #293 is merged and intentionally includes those changes.

More tests (particularly around times on different OS versions) should be added, but I wanted to get this into a form that we could discuss.

  • Have you followed our guidelines in CONTRIBUTING.md?
  • Have you self-reviewed your code and corrected any misspellings?
  • Have you written documentation that is easy to understand?
  • Have you written descriptive commit messages?
  • Have you added NumPy docstrings for newly added functions?
  • Have you formatted your code using black?
  • If you have introduced a new functionality, have you added adequate unit tests?
  • Have all tests passed in your local clone?
  • If you have introduced a new functionality, have you added an example notebook?
  • Have you kept your pull request small and limited so that it is easy to review?
  • Have the newest changes from this branch been merged?

# Conflicts:
#	tobac/feature_detection.py
…_xarray_work_fd_julia_avgs

# Conflicts:
#	tobac/feature_detection.py
#	tobac/utils/__init__.py
#	tobac/utils/general.py
…_xarray_work_fd_julia_avgs

# Conflicts:
#	tobac/utils/__init__.py
#	tobac/utils/general.py
@freemansw1 freemansw1 added the xarray transition Part of the transition to xarray support label Oct 11, 2023
…y_work_fd_julia_avgs

# Conflicts:
#	tobac/utils/general.py
Copy link

codecov bot commented Nov 9, 2023

Codecov Report

Attention: Patch coverage is 78.20513% with 68 lines in your changes missing coverage. Please review.

Project coverage is 62.27%. Comparing base (5f82097) to head (bbb6b67).
Report is 46 commits behind head on RC_v1.6.0.

Files with missing lines Patch % Lines
tobac/utils/internal/iris_utils.py 63.63% 52 Missing ⚠️
tobac/utils/internal/xarray_utils.py 88.88% 14 Missing ⚠️
tobac/utils/general.py 85.71% 2 Missing ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##           RC_v1.6.0     #354      +/-   ##
=============================================
+ Coverage      60.91%   62.27%   +1.36%     
=============================================
  Files             23       23              
  Lines           3541     3690     +149     
=============================================
+ Hits            2157     2298     +141     
- Misses          1384     1392       +8     
Flag Coverage Δ
unittests 62.27% <78.20%> (+1.36%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@freemansw1 freemansw1 added this to the Version 1.6 milestone Nov 13, 2023
@JuliaKukulies
Copy link
Member

JuliaKukulies commented Mar 22, 2024

Great, thanks for making the revisions. I've had a look through the changes and tested the code with both xarray and iris locally and have no issues to issues. Should we start an RC_v1.6.0 branch to merge this into?

Yes, we will need to do that. I'll wait until we have to though to minimize the amount of time we have to maintain separate branches. We still need one more reviewer, @JuliaKukulies or @kelcyno are you up for it?

I can do the second review within the next couple of days!

I also started a RC_v1.6.0 branch to merge this PR into. We can wait until the release of v1.5.3 to update the branch before this PR is merged. But I will try to review asap :)

@JuliaKukulies JuliaKukulies changed the base branch from RC_v1.5.x to RC_v1.6.0 March 22, 2024 16:34
…y_work_fd_julia_avgs

# Conflicts:
#	tobac/analysis.py
#	tobac/feature_detection.py
Copy link
Member

@JuliaKukulies JuliaKukulies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for this pR @freemansw1! This is seems like a great step towards the xarray transition.

It all looks good to me besides two small comments. And I have tested this on two real data sets which works fine and without any major issues. However, the only thing that does not seem to work is is passing the argument preserve_iris_datetime_type to feature_detection_multithreshold(). The dtypes for time are different in the output feature dataframe no matter if I set preserve_iris_datetime_type to False or True. They are always objects/dtype('O') when I input an iris cube and datetime64/dtype('<M8[ns]') when I input an xarray.

tobac/analysis.py Outdated Show resolved Hide resolved
tobac/utils/bulk_statistics.py Outdated Show resolved Hide resolved
@freemansw1
Copy link
Member Author

Ok, @JuliaKukulies, I think I've addressed your concerns. Ready for your re-review. Hopefully we can get this merged!

@JuliaKukulies
Copy link
Member

Excellent, @freemansw1! This looks good except for that there is still the issue with preserve_iris_datetime_type which does not result in the same datetime format as when I provide iris input when this parameters is True (see my last comment).

Sorry I had not marked the bug at the actual code locations, so I get that you might have overseen this. I also cannot really figure out where this bug comes from, since everywhere in the code where preserve_iris_datetime_type is used looks like it should work. Could you reproduce this or is this an issue with my test data?

@freemansw1
Copy link
Member Author

Excellent, @freemansw1! This looks good except for that there is still the issue with preserve_iris_datetime_type which does not result in the same datetime format as when I provide iris input when this parameters is True (see my last comment).

Sorry I had not marked the bug at the actual code locations, so I get that you might have overseen this. I also cannot really figure out where this bug comes from, since everywhere in the code where preserve_iris_datetime_type is used looks like it should work. Could you reproduce this or is this an issue with my test data?

Apologies, yeah, I had missed this. I will work to reproduce and try this out this weekend/early next week.

@w-k-jones
Copy link
Member

w-k-jones commented May 13, 2024

I think I've noticed this before, and the issue is to do with calculating the timesteps from cftime format. I think that in the current versions of tobac the time is also returned with an object dtype, but I would be happier to change it to datetime64 in all cases as I currently have to do this for most use cases...

@freemansw1
Copy link
Member Author

Note to self: add_coords for xarray fails when having coordinates to interpolate along dimensions other than x/y.

@freemansw1
Copy link
Member Author

OK, I think I have resolved the datetime issues. It's somewhat of a hacky solution, but basically, xarray only converts to datetime64 if the time received is in Gregorian time. I now check for this and convert back to Gregorian time. For all other calendars (thanks @harrietgilmour for helping with this part), xarray leaves it as an object (which does still break our code, but alas).

@freemansw1
Copy link
Member Author

Note to self: add_coords for xarray fails when having coordinates to interpolate along dimensions other than x/y.

I have now also resolved this issue. The solution here results in some decisions that were contrary to the discussion we had at the last dev meeting (notably that len(1) coordinates are still appended to the dataframe), but I think we are otherwise okay. We can talk about this at the dev meeting tomorrow.

@freemansw1
Copy link
Member Author

@w-k-jones and @JuliaKukulies I am now ready for a re-review.

@JuliaKukulies
Copy link
Member

OK, I think I have resolved the datetime issues. It's somewhat of a hacky solution, but basically, xarray only converts to datetime64 if the time received is in Gregorian time. I now check for this and convert back to Gregorian time. For all other calendars (thanks @harrietgilmour for helping with this part), xarray leaves it as an object (which does still break our code, but alas).

OK, I have tested this again with my data and I thought I understood what you explained in the dev meeting today, but now i am a little confused again :) I basically get:

  • for xarray input and preserve_iris_datetime_type = True -> datetime64

  • for xarray input and preserve_iris_datetime_type = False -> datetime64

  • for iris input and preserve_iris_datetime_type = True -> object (cftime.DatetimeGregorian)

  • for iris input and preserve_iris_datetime_type = False -> object (cftime.DatetimeGregorian)

Is that what we want?

@freemansw1
Copy link
Member Author

OK, I think I have resolved the datetime issues. It's somewhat of a hacky solution, but basically, xarray only converts to datetime64 if the time received is in Gregorian time. I now check for this and convert back to Gregorian time. For all other calendars (thanks @harrietgilmour for helping with this part), xarray leaves it as an object (which does still break our code, but alas).

OK, I have tested this again with my data and I thought I understood what you explained in the dev meeting today, but now i am a little confused again :) I basically get:

  • for xarray input and preserve_iris_datetime_type = True -> datetime64
  • for xarray input and preserve_iris_datetime_type = False -> datetime64
  • for iris input and preserve_iris_datetime_type = True -> object (cftime.DatetimeGregorian)
  • for iris input and preserve_iris_datetime_type = False -> object (cftime.DatetimeGregorian)

Is that what we want?

Ok, the first of those three are all correct, but the last one isn't.

@JuliaKukulies
Copy link
Member

OK, I think I have resolved the datetime issues. It's somewhat of a hacky solution, but basically, xarray only converts to datetime64 if the time received is in Gregorian time. I now check for this and convert back to Gregorian time. For all other calendars (thanks @harrietgilmour for helping with this part), xarray leaves it as an object (which does still break our code, but alas).

OK, I have tested this again with my data and I thought I understood what you explained in the dev meeting today, but now i am a little confused again :) I basically get:

  • for xarray input and preserve_iris_datetime_type = True -> datetime64
  • for xarray input and preserve_iris_datetime_type = False -> datetime64
  • for iris input and preserve_iris_datetime_type = True -> object (cftime.DatetimeGregorian)
  • for iris input and preserve_iris_datetime_type = False -> object (cftime.DatetimeGregorian)

Is that what we want?

Ok, the first of those three are all correct, but the last one isn't.

Thats what I thought, and we would expect datetime64 for the last one, right? Have you tested this with the data I sent you or another dataset? Sorry to cause more struggle..

@freemansw1
Copy link
Member Author

OK, I think I have resolved the datetime issues. It's somewhat of a hacky solution, but basically, xarray only converts to datetime64 if the time received is in Gregorian time. I now check for this and convert back to Gregorian time. For all other calendars (thanks @harrietgilmour for helping with this part), xarray leaves it as an object (which does still break our code, but alas).

OK, I have tested this again with my data and I thought I understood what you explained in the dev meeting today, but now i am a little confused again :) I basically get:

  • for xarray input and preserve_iris_datetime_type = True -> datetime64
  • for xarray input and preserve_iris_datetime_type = False -> datetime64
  • for iris input and preserve_iris_datetime_type = True -> object (cftime.DatetimeGregorian)
  • for iris input and preserve_iris_datetime_type = False -> object (cftime.DatetimeGregorian)

Is that what we want?

Ok, the first of those three are all correct, but the last one isn't.

Thats what I thought, and we would expect datetime64 for the last one, right? Have you tested this with the data I sent you or another dataset? Sorry to cause more struggle..

Ok this one was easy- we didn't actually expect the kwarg in feature_detection_multithreshold. I've fixed that.

@JuliaKukulies
Copy link
Member

OK, I think I have resolved the datetime issues. It's somewhat of a hacky solution, but basically, xarray only converts to datetime64 if the time received is in Gregorian time. I now check for this and convert back to Gregorian time. For all other calendars (thanks @harrietgilmour for helping with this part), xarray leaves it as an object (which does still break our code, but alas).

OK, I have tested this again with my data and I thought I understood what you explained in the dev meeting today, but now i am a little confused again :) I basically get:

  • for xarray input and preserve_iris_datetime_type = True -> datetime64
  • for xarray input and preserve_iris_datetime_type = False -> datetime64
  • for iris input and preserve_iris_datetime_type = True -> object (cftime.DatetimeGregorian)
  • for iris input and preserve_iris_datetime_type = False -> object (cftime.DatetimeGregorian)

Is that what we want?

Ok, the first of those three are all correct, but the last one isn't.

Thats what I thought, and we would expect datetime64 for the last one, right? Have you tested this with the data I sent you or another dataset? Sorry to cause more struggle..

Ok this one was easy- we didn't actually expect the kwarg in feature_detection_multithreshold. I've fixed that.

Perfect, thanks for looking into this and fixing this. I am happy to approve this now once the latest changes from RC_v1.5.x are merged in

@freemansw1
Copy link
Member Author

OK, changes all merged in and everything. @JuliaKukulies and @w-k-jones are you happy for me to merge into RC_v1.6.0?

@JuliaKukulies
Copy link
Member

OK, changes all merged in and everything. @JuliaKukulies and @w-k-jones are you happy for me to merge into RC_v1.6.0?

Yes, please go ahead:) !

@w-k-jones
Copy link
Member

Also happy for this to be merged :)

@freemansw1 freemansw1 merged commit 0ebee82 into tobac-project:RC_v1.6.0 Jul 25, 2024
21 checks passed
@freemansw1 freemansw1 mentioned this pull request Jul 26, 2024
9 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
xarray transition Part of the transition to xarray support
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants