Try running the GEAR pipeline with dask/spark directly #5

Closed
mattjbr123 opened this issue Oct 1, 2024 · 3 comments

Comments

@mattjbr123
Collaborator

On JASMIN to start with

@mattjbr123
Collaborator Author

The Direct Runner seems to work fine on JASMIN LOTUS.

@mattjbr123
Collaborator Author

@iwalmsley has a script for converting multi-TB datasets to zarr and rechunking them with dask on JASMIN LOTUS. It's probably worth a look:
https://gitlab.ceh.ac.uk/zarr-data-access/zarr-conversion/-/blob/main/wrf/dask_slurm_1960.py?ref_type=heads

mattjbr123 transferred this issue from NERC-CEH/object_store_tutorial on Oct 7, 2024
@mattjbr123
Collaborator Author

mattjbr123 commented Nov 7, 2024

We've decided not to do this, and to use Beam via pangeo-forge-recipes instead. The remaining question is which Beam runner to use (Dask, Spark or Flink). The Dask runner is the newest and is therefore likely to be quite buggy, but we'll try it first and explore the other options if need be.

#31
