How to increase the speed of saving ERA5 chunks data? #65
Hi, everyone. It is really convenient to access ERA5 data from cloud storage. However, saving the processed data in NetCDF format is very slow: it has been running for 40 minutes so far and still has not finished. How can I increase the speed of saving chunked ERA5 data? This is my code.

Comments
```python
import os
import os.path as osp

# Slice the region of interest, then append to (or create) the per-region store.
subset = ds[var].sel(latitude=latitude_slice, longitude=longitude_slices)
path = osp.join(out_dir, f"{var}_{region}.zarr")
mode = 'a' if osp.exists(path) and os.listdir(path) else 'w'
subset.to_zarr(path, mode=mode, compute=True)
```

I'm trying something similar, but I'm noticing a very slow increase in my output zarr footprint, 0.5 MB/s at best.
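One quick way to see where the time goes is to inspect how the source arrays are chunked before slicing. A small sketch, reusing the `ds` and `var` names from the snippet above:

```python
# Dask chunk layout of the lazy array: one tuple of chunk sizes per dimension.
print(ds[var].chunks)

# On-disk zarr chunking, where xarray has recorded it in the variable's encoding.
print(ds[var].encoding.get("chunks"))
```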
The problem is that the data is stored in a way that only makes it efficient to access all locations at once. If you slice out a small area and load all times, you are effectively loading data for the entire globe. To fix this, you could use a tool like rechunker to convert these arrays into a format that allows for efficient queries across time: https://medium.com/pangeo/rechunker-the-missing-link-for-chunked-array-analytics-5b2359e9dc11
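For concreteness, here is a minimal sketch of that approach. The store paths (`source.zarr`, `rechunked.zarr`, `rechunk-temp.zarr`), the variable name `temperature`, the chunk sizes, and the memory budget are placeholders, not taken from this thread; check the rechunker docs for the exact calling conventions for xarray inputs. The idea is to make each target chunk span the full time axis but only a small spatial tile.

```python
import xarray as xr
from rechunker import rechunk

# Open the source store lazily ("source.zarr" is a placeholder path).
ds = xr.open_zarr("source.zarr")

# Target layout: the whole time record in one chunk, small spatial tiles.
# Sizes are illustrative; aim for chunks on the order of tens of MB.
target_chunks = {
    "temperature": {"time": ds.sizes["time"], "latitude": 10, "longitude": 10},
}

# rechunker streams the data through a temporary store, keeping memory
# usage under max_mem rather than loading everything at once.
plan = rechunk(
    ds[["temperature"]],
    target_chunks,
    max_mem="2GB",
    target_store="rechunked.zarr",
    temp_store="rechunk-temp.zarr",
)
plan.execute()  # runs the copy (uses dask under the hood)
```

Once the copy finishes, a point or small-region query over the full record only has to read the handful of chunks covering that spatial tile, instead of the whole globe at every time step.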
Could you provide an example of the optimal way to do this? Let's say I just need data at one latitude/longitude/level, but for the entire record.
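Not an answer given in the thread, but a sketch of what that query could look like against a store that has been rechunked to be contiguous in time (the store path, variable name, and coordinates are again placeholders):

```python
import xarray as xr

# Open the time-contiguous (rechunked) store lazily.
ds = xr.open_zarr("rechunked.zarr")

# One point at one pressure level, full record. With time-contiguous chunks
# this touches only the chunks covering that spatial tile, not the whole globe.
series = ds["temperature"].sel(
    latitude=52.5, longitude=13.4, level=500, method="nearest"
).load()
```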