-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add rust geomedian impl #14
base: main
Are you sure you want to change the base?
Conversation
Written in rust
…oviding **kw, remove xr_geomedian and int_geomedian
Because this Rust implementation computes both geomedian and MADs before returning to Python. The geomedian result can be either floats or ints, and the MADs results are always floats. Since you can't store both floats and ints in the same array it returns two arrays. This required a pretty major rejiggering of the geomedian_with_mads function. Moving most of the logic for getting the data out of the xr.Dataset in the correct shape and axis ordering, and then putting it back into an xr.Dataset on completion. I think this leaves a whole chunk of code which can be deleted, but need to do more testing first.
From [1], maturin is a more modern and opinionated build and packaging tool for building combined Rust + Python packages. [1]: https://pyo3.rs/v0.13.2/building_and_distribution.html#distribution
Hey @omad, just saw this bit:
We make use of |
Hi @robbibt
Absolutely! I made this PR for visibility, not to merge. I'm going to create a new PR, that exposes the new code from a different Python module, and leave the current modules as they are. |
Awesome - we'd love to switch to the new code though - perhaps we can create equivalent funcs in the new module instead |
This is very work in progress, and I actually wish I'd taken a different approach.
The intention was to take a NaN aware, Python/NumPy Geomedian implementation written in Rust several years ago, and merge it into odc-algo as a seamless replacement for the hdstats Cython implementation.
The
hdstats
implementation works well, but the published source code doesn't match the version that is being used within DEA. binaries sourceExisting Implementation
The code in this repo handles taking a dask/xarray object, partitioning and reshaping it, and then calling hdstats. As far as I can tell, it's only used from
odc-stats
. This makes it all a bit annoying to test and work on,odc-stats -> odc-stats GM Plugin -> odc-algo -> hdstats
This Work
Instead of modifying what's here, I wish I'd added a new Python module for calling the new Rust code, along with a new plugin within
odc-stats
. This would have made testing and comparing the implementations much easier, and is what I'd recommend doing rather than merging this PR as it stands.