Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add rust geomedian impl #14

Draft
wants to merge 22 commits into
base: main
Choose a base branch
from
Draft

Add rust geomedian impl #14

wants to merge 22 commits into from

Conversation

omad
Copy link
Member

@omad omad commented Aug 14, 2024

This is very work in progress, and I actually wish I'd taken a different approach.

The intention was to take a NaN aware, Python/NumPy Geomedian implementation written in Rust several years ago, and merge it into odc-algo as a seamless replacement for the hdstats Cython implementation.

The hdstats implementation works well, but the published source code doesn't match the version that is being used within DEA. binaries source

Existing Implementation
The code in this repo handles taking a dask/xarray object, partitioning and reshaping it, and then calling hdstats. As far as I can tell, it's only used from odc-stats. This makes it all a bit annoying to test and work on, odc-stats -> odc-stats GM Plugin -> odc-algo -> hdstats

This Work
Instead of modifying what's here, I wish I'd added a new Python module for calling the new Rust code, along with a new plugin within odc-stats. This would have made testing and comparing the implementations much easier, and is what I'd recommend doing rather than merging this PR as it stands.

omad and others added 22 commits September 4, 2023 12:39
…oviding **kw, remove xr_geomedian and int_geomedian
Because this Rust implementation computes both geomedian
and MADs before returning to Python. The geomedian result
can be either floats or ints, and the MADs results are always
floats. Since you can't store both floats and ints in the same
array it returns two arrays.

This required a pretty major rejiggering of the geomedian_with_mads
function. Moving most of the logic for getting the data out of the
xr.Dataset in the correct shape and axis ordering, and then putting it
back into an xr.Dataset on completion.

I think this leaves a whole chunk of code which can be deleted, but
need to do more testing first.
From [1], maturin is a more modern and opinionated build and packaging
tool for building combined Rust + Python packages.

[1]: https://pyo3.rs/v0.13.2/building_and_distribution.html#distribution
@omad omad requested a review from JM-GA August 16, 2024 03:44
@omad omad assigned omad and JM-GA Aug 16, 2024
@robbibt
Copy link
Contributor

robbibt commented Oct 11, 2024

Hey @omad, just saw this bit:

As far as I can tell, it's only used from odc-stats.

We make use of xr_geomedian and int_geomedian from this repo throughout DEA Notebooks and in our upcoming DEA Intertidal Composites products - with this proposal ("I wish I'd added a new Python module for calling the new Rust code, along with a new plugin within odc-stats") would we still be able to do that?

@omad
Copy link
Member Author

omad commented Oct 11, 2024

Hi @robbibt

We make use of xr_geomedian and int_geomedian from this repo throughout DEA Notebooks and in our upcoming DEA Intertidal Composites products - with this proposal ("I wish I'd added a new Python module for calling the new Rust code, along with a new plugin within odc-stats") would we still be able to do that?

Absolutely! I made this PR for visibility, not to merge.

I'm going to create a new PR, that exposes the new code from a different Python module, and leave the current modules as they are.

@robbibt
Copy link
Contributor

robbibt commented Oct 11, 2024

Awesome - we'd love to switch to the new code though - perhaps we can create equivalent funcs in the new module instead

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants