-
Notifications
You must be signed in to change notification settings - Fork 316
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add Xarray zarr persistence support #3205
base: master
Are you sure you want to change the base?
Conversation
Code Review Agent Run #ea6571Actionable Suggestions - 2
Review Details
|
Signed-off-by: Len Strnad <[email protected]>
Signed-off-by: Len Strnad <[email protected]>
Signed-off-by: Len Strnad <[email protected]>
Signed-off-by: Len Strnad <[email protected]>
Signed-off-by: Len Strnad <[email protected]>
Signed-off-by: Len Strnad <[email protected]>
Signed-off-by: Len Strnad <[email protected]>
Signed-off-by: Len Strnad <[email protected]>
fbb6b88
to
95fefdd
Compare
@bstadlbauer figured I would ping you here since you developed/maintained the dask cluster! Anything else you would want to see here? |
Changelist by BitoThis pull request implements the following key changes.
|
import xarray as xr | ||
from dask.distributed import Client | ||
else: | ||
pandas = lazy_module("xarray") | ||
pyarrow = lazy_module("dask.distributed") | ||
|
||
|
||
class XarrayZarrTypeTransformer(TypeTransformer[xr.Dataset]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The imports for xarray
and dask.distributed.Client
are inside a type-checking block but are used at runtime. Consider moving these imports outside the type-checking block.
Code suggestion
Check the AI-generated fix before applying
import xarray as xr | |
from dask.distributed import Client | |
else: | |
pandas = lazy_module("xarray") | |
pyarrow = lazy_module("dask.distributed") | |
class XarrayZarrTypeTransformer(TypeTransformer[xr.Dataset]): | |
pass | |
else: | |
pass | |
import xarray as xr | |
from dask.distributed import Client | |
class XarrayZarrTypeTransformer(TypeTransformer[xr.Dataset]): |
Code Review Run #ea6571
Should Bito avoid suggestions like this for future reviews? (Manage Rules)
- Yes, avoid them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense! Will do.
pandas = lazy_module("xarray") | ||
pyarrow = lazy_module("dask.distributed") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There seems to be a mismatch in the lazy module imports. The variable names don't match the imported modules. pandas
is used for xarray
and pyarrow
for dask.distributed
. This could lead to confusion and potential issues when these modules are accessed.
Code suggestion
Check the AI-generated fix before applying
pandas = lazy_module("xarray") | |
pyarrow = lazy_module("dask.distributed") | |
xarray = lazy_module("xarray") | |
dask_distributed = lazy_module("dask.distributed") |
Code Review Run #ea6571
Should Bito avoid suggestions like this for future reviews? (Manage Rules)
- Yes, avoid them
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #3205 +/- ##
===========================================
- Coverage 94.35% 78.53% -15.82%
===========================================
Files 64 329 +265
Lines 2799 27168 +24369
Branches 0 2920 +2920
===========================================
+ Hits 2641 21337 +18696
- Misses 158 5028 +4870
- Partials 0 803 +803 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Code Review Agent Run #e040baActionable Suggestions - 1
Review Details
|
python_val: xr.DataArray, | ||
expected_python_type: LiteralType, | ||
) -> str: | ||
assert isinstance(python_val, (xr.DataArray, xr.DataArray)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There appears to be a typo in the assertion statement. The second xr.DataArray
in the tuple should likely be something else, possibly xr.Dataset
since you're checking if python_val
is an instance of either xr.DataArray
or xr.Dataset
. This would be consistent with the assertion on line 61.
Code suggestion
Check the AI-generated fix before applying
assert isinstance(python_val, (xr.DataArray, xr.DataArray)) | |
assert isinstance(python_val, (xr.DataArray, xr.Dataset)) |
Code Review Run #e040ba
Should Bito avoid suggestions like this for future reviews? (Manage Rules)
- Yes, avoid them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! Will fix.
Why are the changes needed?
There is currently no support for xarray object types.
What changes were proposed in this pull request?
This plugin allows us to persist the data to zarr and trigger computation (on the local cluster or dask cluster if used). This is powerful when combined with the dask cluster and takes away the need for a user to manually pass around a path to a zarr store.
How was this patch tested?
On a remote flyte cluster.
Setup process
Screenshots
Screenshots of inputs/outputs when deck is enabled:

Check all the applicable boxes
Summary by Bito
This PR adds support for persisting xarray objects using zarr format, extending Flytekit's capabilities for scientific data handling. It implements type transformers for xarray datasets and data arrays, configures the package via a dedicated setup file, and updates the global plugin registry. Comprehensive tests validate the functionality in workflows.Unit tests added: True
Estimated effort to review (1-5, lower is better): 2