feat: Add Xarray zarr persistence support #3205

Draft
wants to merge 8 commits into
base: master

ljstrnadiii
Contributor

@ljstrnadiii ljstrnadiii commented Mar 21, 2025

Why are the changes needed?

There is currently no support for xarray object types.

What changes were proposed in this pull request?

This plugin lets us persist xarray data to Zarr and trigger computation (on the local cluster, or on a Dask cluster if one is used). This is powerful in combination with a Dask cluster, and it removes the need for users to manually pass around the path to a Zarr store.

How was this patch tested?

On a remote Flyte cluster.

Setup process

Screenshots

Screenshots of inputs/outputs when deck is enabled:
[Screenshot 2025-03-20 at 7:22:31 PM]

[Screenshot 2025-03-20 at 7:22:25 PM]

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Summary by Bito

This PR adds support for persisting xarray objects using zarr format, extending Flytekit's capabilities for scientific data handling. It implements type transformers for xarray datasets and data arrays, configures the package via a dedicated setup file, and updates the global plugin registry. Comprehensive tests validate the functionality in workflows.

Unit tests added: True

Estimated effort to review (1-5, lower is better): 2

@flyte-bot
Contributor

flyte-bot commented Mar 21, 2025

Code Review Agent Run #ea6571

Actionable Suggestions - 2
  • plugins/flytekit-xarray-zarr/flytekitplugins/xarray_zarr/xarray_transformers.py - 2
    • Runtime imports in type-checking block · Line 16-23
    • Incorrect variable names in lazy imports · Line 19-20
Review Details
  • Files reviewed - 5 · Commit Range: c3ce0a5..fbb6b88
    • plugins/flytekit-xarray-zarr/flytekitplugins/xarray_zarr/__init__.py
    • plugins/flytekit-xarray-zarr/flytekitplugins/xarray_zarr/xarray_transformers.py
    • plugins/flytekit-xarray-zarr/setup.py
    • plugins/flytekit-xarray-zarr/tests/test_xarray_zarr_plugin.py
    • plugins/setup.py
  • Files skipped - 2
    • .github/workflows/pythonbuild.yml - Reason: Filter setting
    • plugins/flytekit-xarray-zarr/README.md - Reason: Filter setting
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

Refer to the documentation for additional commands.

Configuration

This repository uses code_review_bito. You can customize the agent settings here or contact your Bito workspace admin at [email protected].

Documentation & Help

@ljstrnadiii
Contributor Author

@bstadlbauer figured I would ping you here since you developed/maintained the dask cluster! Anything else you would want to see here?

@flyte-bot
Contributor

flyte-bot commented Mar 21, 2025

Changelist by Bito

This pull request implements the following key changes.

Key Change / Files Impacted
  • Documentation - Plugin Initialization and Documentation
    • __init__.py - Added module-level documentation and established core imports for the Xarray Zarr plugin.
  • New Feature - Xarray Zarr Plugin Implementation
    • xarray_transformers.py - Introduces new type transformers for handling xarray datasets and data arrays with zarr persistence.
    • setup.py - Sets up package configuration for the Xarray Zarr plugin.
  • Testing - Plugin Functionality Tests
    • test_xarray_zarr_plugin.py - Adds tests validating the persistency and transformation logic for xarray objects.
  • Other Improvements - Global Plugin Registry Update
    • setup.py - Updates global plugin mappings to include the Xarray Zarr plugin.

Comment on lines +16 to +23
    import xarray as xr
    from dask.distributed import Client
else:
    pandas = lazy_module("xarray")
    pyarrow = lazy_module("dask.distributed")


class XarrayZarrTypeTransformer(TypeTransformer[xr.Dataset]):
Contributor

Runtime imports in type-checking block

The imports for xarray and dask.distributed.Client are inside a type-checking block but are used at runtime. Consider moving these imports outside the type-checking block.

Code suggestion
Check the AI-generated fix before applying
Suggested change
-     import xarray as xr
-     from dask.distributed import Client
- else:
-     pandas = lazy_module("xarray")
-     pyarrow = lazy_module("dask.distributed")
- class XarrayZarrTypeTransformer(TypeTransformer[xr.Dataset]):
+     pass
+ else:
+     pass
+ import xarray as xr
+ from dask.distributed import Client
+ class XarrayZarrTypeTransformer(TypeTransformer[xr.Dataset]):

Code Review Run #ea6571
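
The concern raised in the review comment above can be reproduced in isolation. The following is a minimal sketch, not the plugin's code: it uses the standard-library `fractions.Fraction` as a stand-in for `xarray`, and the names `half` and `make_subclass` are hypothetical. It shows that a name imported only under `if TYPE_CHECKING:` is fine inside a quoted annotation but raises `NameError` when it is evaluated at runtime, for example as a base class.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by static type checkers; this branch never executes at runtime.
    from fractions import Fraction  # stand-in for `import xarray as xr`


def half(value: "Fraction") -> "Fraction":
    # A quoted annotation is never evaluated, so the type-only import is enough here.
    return value / 2


def make_subclass() -> type:
    # A base class is evaluated when the class statement runs, so the name must
    # exist at runtime. With a type-only import this lookup fails.
    class Scaled(Fraction):  # noqa: F821
        pass

    return Scaled


try:
    make_subclass()
    runtime_error = None
except NameError as exc:
    runtime_error = exc

print(type(runtime_error).__name__)  # prints: NameError
```

In the quoted snippet the class statement uses `xr.Dataset` as a type argument of its base class, which is evaluated at runtime in the same way, so `xr` needs a runtime binding in the `else` branch or an unconditional import.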

Contributor Author

Makes sense! Will do.

Comment on lines +19 to +20
pandas = lazy_module("xarray")
pyarrow = lazy_module("dask.distributed")
Contributor

Incorrect variable names in lazy imports

There seems to be a mismatch in the lazy module imports. The variable names don't match the imported modules. pandas is used for xarray and pyarrow for dask.distributed. This could lead to confusion and potential issues when these modules are accessed.

Code suggestion
Check the AI-generated fix before applying
Suggested change
- pandas = lazy_module("xarray")
- pyarrow = lazy_module("dask.distributed")
+ xarray = lazy_module("xarray")
+ dask_distributed = lazy_module("dask.distributed")

Code Review Run #ea6571
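
To illustrate why the bound name matters, here is a small sketch under stated assumptions. `lazy_module_sketch` is a hypothetical stand-in built on the standard library's `importlib.util.LazyLoader`; it is not Flytekit's `lazy_module`. The standard-library `json` module stands in for `xarray`, and `json_mod` plays the role of the name the rest of the module expects (such as `xr`).

```python
import importlib.util
import sys
from types import ModuleType


def lazy_module_sketch(fullname: str) -> ModuleType:
    """Hypothetical lazy-import helper: defer loading until first attribute access."""
    if fullname in sys.modules:
        return sys.modules[fullname]
    spec = importlib.util.find_spec(fullname)
    if spec is None or spec.loader is None:
        raise ModuleNotFoundError(fullname)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[fullname] = module
    loader.exec_module(module)
    return module


def use_json() -> str:
    # The rest of the module refers to `json_mod`, mirroring code that uses `xr`.
    return json_mod.dumps({"ok": True})


# Mismatched binding: the module is loaded, but under a name nobody uses.
misnamed = lazy_module_sketch("json")
try:
    use_json()
    first_error = None
except NameError as exc:
    first_error = exc

# Matching binding: the name used by the code now exists.
json_mod = lazy_module_sketch("json")
result = use_json()

print(type(first_error).__name__, result)
```

The mismatched binding is not harmless naming noise: the lazily loaded module exists, but any code that refers to the expected name still fails until that exact name is bound.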


codecov bot commented Mar 21, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 78.53%. Comparing base (45d5531) to head (95fefdd).
Report is 4 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff             @@
##           master    #3205       +/-   ##
===========================================
- Coverage   94.35%   78.53%   -15.82%     
===========================================
  Files          64      329      +265     
  Lines        2799    27168    +24369     
  Branches        0     2920     +2920     
===========================================
+ Hits         2641    21337    +18696     
- Misses        158     5028     +4870     
- Partials        0      803      +803     
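
The figures in the table above can be checked with a few lines of arithmetic. This is a sketch that assumes coverage is computed as hits divided by lines and that the displayed percentages are truncated (not rounded) to two decimals; the truncation is an inference from the numbers shown here, not documented behavior. The helper names are hypothetical.

```python
def coverage_hundredths(hits: int, lines: int) -> int:
    # Coverage in hundredths of a percent, truncated, using integer arithmetic only.
    return hits * 10_000 // lines


def fmt(hundredths: int) -> str:
    # Render hundredths of a percent as a signed-aware percentage string.
    sign = "-" if hundredths < 0 else ""
    value = abs(hundredths)
    return f"{sign}{value // 100}.{value % 100:02d}%"


# (hits, misses, partials, lines) copied from the report
base = (2641, 158, 0, 2799)
head = (21337, 5028, 803, 27168)

# Every tracked line is either a hit, a miss, or a partial.
rows_consistent = all(h + m + p == total for h, m, p, total in (base, head))

base_cov = coverage_hundredths(base[0], base[3])
head_cov = coverage_hundredths(head[0], head[3])
delta = head_cov - base_cov

print(rows_consistent, fmt(base_cov), fmt(head_cov), fmt(delta))
```

Under these assumptions the script reproduces 94.35%, 78.53%, and -15.82%. Note that the base report covers 64 files and the head report 329, so the two percentages are computed over very different sets of lines; the drop may reflect that difference in scope as much as any change in test quality.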

@flyte-bot
Contributor

flyte-bot commented Mar 21, 2025

Code Review Agent Run #e040ba

Actionable Suggestions - 1
  • plugins/flytekit-xarray-zarr/flytekitplugins/xarray_zarr/xarray_transformers.py - 1
Review Details
  • Files reviewed - 5 · Commit Range: ccc8c39..95fefdd
    • plugins/flytekit-xarray-zarr/flytekitplugins/xarray_zarr/__init__.py
    • plugins/flytekit-xarray-zarr/flytekitplugins/xarray_zarr/xarray_transformers.py
    • plugins/flytekit-xarray-zarr/setup.py
    • plugins/flytekit-xarray-zarr/tests/test_xarray_zarr_plugin.py
    • plugins/setup.py
  • Files skipped - 2
    • .github/workflows/pythonbuild.yml - Reason: Filter setting
    • plugins/flytekit-xarray-zarr/README.md - Reason: Filter setting
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

        python_val: xr.DataArray,
        expected_python_type: LiteralType,
    ) -> str:
        assert isinstance(python_val, (xr.DataArray, xr.DataArray))
Contributor

Possible typo in assertion statement

There appears to be a typo in the assertion statement. The second xr.DataArray in the tuple should likely be something else, possibly xr.Dataset since you're checking if python_val is an instance of either xr.DataArray or xr.Dataset. This would be consistent with the assertion on line 61.

Code suggestion
Check the AI-generated fix before applying
Suggested change
- assert isinstance(python_val, (xr.DataArray, xr.DataArray))
+ assert isinstance(python_val, (xr.DataArray, xr.Dataset))

Code Review Run #e040ba
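
The effect of the duplicated tuple entry can be shown with stand-in classes. This is a sketch only: `DataArrayLike` and `DatasetLike` are hypothetical placeholders for `xr.DataArray` and `xr.Dataset`, and the function names are invented for illustration.

```python
class DataArrayLike:
    """Placeholder for xr.DataArray."""


class DatasetLike:
    """Placeholder for xr.Dataset."""


def accepts_duplicated(value: object) -> bool:
    # Listing the same type twice is equivalent to listing it once.
    return isinstance(value, (DataArrayLike, DataArrayLike))


def accepts_both(value: object) -> bool:
    # Two distinct types: either one passes the check.
    return isinstance(value, (DataArrayLike, DatasetLike))


print(accepts_duplicated(DataArrayLike()), accepts_duplicated(DatasetLike()))  # True False
print(accepts_both(DataArrayLike()), accepts_both(DatasetLike()))              # True True
```

The duplicated form is not wrong for a function that should only ever receive a data array, since it behaves like `isinstance(value, DataArrayLike)`. Whether the second entry should become the dataset type, as the bot suggests, or simply be dropped depends on which inputs that transformer is meant to accept.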

Contributor Author

Good catch! Will fix.

@ljstrnadiii ljstrnadiii marked this pull request as draft March 21, 2025 03:54
2 participants