Speeding up loading of 3D data #10
There is a lot of room for optimisation in terms of reusing the data. During the design phase I had to choose between simple atomisation of the analysis (so each figure in the notebook can be created without additional data dependencies) and speed. Currently I would prefer to have additional versions of the diagnostics (say, with an HR index or something similar) that load/compute the data once and then perform all the data analysis (you don't want to compute the data mean 30 times for high-resolution data). So I would not overcomplicate the logic of fdiag itself, but rather play around with template notebooks. I don't know of a good way to speed up the NetCDF loading except converting the files to zarr :)
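For reference, a minimal sketch of such a one-time NetCDF-to-zarr conversion with xarray; the paths and file pattern are placeholders, not actual fdiag paths:

```python
import xarray as xr

# Hypothetical file pattern for yearly FESOM output; adjust to the real run directory.
ds = xr.open_mfdataset("/path/to/run/temp.fesom.*.nc", combine="by_coords")

# One-time conversion; later reads from the zarr store are usually much faster,
# especially with consolidated metadata and sensible chunking.
ds.to_zarr("/path/to/run/temp.fesom.zarr", mode="w", consolidated=True)
```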
Update. I was writing on the paper, and waiting times of 15+ minutes for Hovmöller diagrams were too long for me. Here are some data loading experiments. I'm reading in years 1850 to 2529, so 680 years in total, on the CORE2 mesh. Volume of data on disk: 21.7 GB in total. First, the way it's done so far:
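A minimal sketch of what such an xarray multifile load looks like; the paths and variable name are placeholders and not necessarily the exact fdiag code:

```python
import xarray as xr

# Hypothetical file pattern for the 680 yearly FESOM output files.
paths = "/path/to/run/temp.fesom.*.nc"

# Lazy multifile open on the native grid; the data is actually read once the
# reductions for the Hovmöller diagram are computed.
ds = xr.open_mfdataset(paths, combine="by_coords")
temp = ds["temp"].compute()
```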
Result: 1066.9 s = 1.56 s/year. I then tried what happens if I use CDO and remap the data to 1° on the fly while loading. I also do the fldmean and yearmean right away, as needed for the Hovmöller diagrams. One could select a region here as well if needed:
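A sketch of this serial CDO variant via the python-cdo bindings; the target grid global_1, the weight file weights.nc and the paths are assumptions, not the exact command used:

```python
import glob

import xarray as xr
from cdo import Cdo  # python-cdo bindings around the cdo binary

cdo = Cdo()

# Per yearly file: remap the native FESOM field to a regular 1 degree grid with
# precomputed weights, take the field mean per level, then the yearly mean.
# A region could be selected with an additional -sellonlatbox step if needed.
outfiles = []
for path in sorted(glob.glob("/path/to/run/temp.fesom.*.nc")):
    outfiles.append(
        cdo.yearmean(input=f"-fldmean -remap,global_1,weights.nc {path}")
    )

hovm = xr.open_mfdataset(outfiles, combine="by_coords")
```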
Result: 1974.6 s = 2.90 s/year -> Serial CDO, including the remapping, is only about twice as slow as the xarray multifile load! If we remapped our data once externally with a script, we might already be competitive here. I then used dask:
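A sketch of how the per-file CDO calls can be parallelised with dask.delayed; the one-task-per-year layout is an assumption about the setup, not the exact script:

```python
import glob

import dask
import xarray as xr
from cdo import Cdo

cdo = Cdo()

@dask.delayed
def load_year(path):
    # One cdo process per yearly file: remap + fldmean + yearmean as above.
    out = cdo.yearmean(input=f"-fldmean -remap,global_1,weights.nc {path}")
    return xr.open_dataset(out).load()

paths = sorted(glob.glob("/path/to/run/temp.fesom.*.nc"))
yearly = dask.compute(*[load_year(p) for p in paths])
hovm = xr.concat(yearly, dim="time")
```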
Result: 316.6 s = 0.46 s/year. With dask + CDO we are three times faster than with the xarray multifile load. For things like Hovmöller diagrams, where the 1° remapping done in the process is good enough, we should probably use this method for now. For other plots where we need the native grid, we need to look into why loading data with the xarray multifile approach is so slow that a crude dask + CDO + remap can be faster.
I also think this might be better placed on the pyfesom2 issue tracker. If you agree, here is how to move it (in theory, I've never tried this): https://hub.packtpub.com/github-now-allows-issue-transfer-between-repositories-a-public-beta-version/
So in essence, if you feel that having a different version of the Hovmöller diagrams notebook would be beneficial for you, just go ahead and try to implement it as a separate notebook, with a separate call in the configuration. There are not that many of us users right now, so we are free to experiment and change things, and in the end the best solution should stick.
Hey Nikolay, thanks for the feedback. I will try to make the CDO-based Hovmöller diagnostic suitable for general use then. I agree that pyfesom2 should specialize in working with the native grid, so perhaps this is actually well placed here in fdiag. Cheers, Jan
Could weighting be a problem?
Maybe, but I use the normal method, after generating the weights with genycon.
To be honest I like your figure better, as it looks more like the classical warming pattern :) Consider the chance that it's me doing something very stupid with pyfesom2 :)
I will say that the plot on the left looks unrealistic in the sense that the apparent error below 5000 meters is quite large compared to the absolute temperatures we have down there. I just did something very simple: I compare the weighted means obtained by doing the remapping on the command line:
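The commands themselves are not quoted above; a sketch of what such a comparison could look like with the python-cdo bindings, where the file names, target grid and weight file are placeholders:

```python
import xarray as xr
from cdo import Cdo

cdo = Cdo()

# A command-line form of the same operation might be:
#   cdo fldmean -remap,global_1,weights.nc temp.fesom.1850.nc means.nc
# i.e. remap with the genycon weights, then take the field mean per depth level.
out = cdo.fldmean(input="-remap,global_1,weights.nc temp.fesom.1850.nc")
print(xr.open_dataset(out)["temp"].squeeze().values)  # one value per level
```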
I can then use aux3d.out to get the depth of each of these layers. I'll be looking at the layer where the two plots differ most, which is the 5400 m depth level: the left plot shows a warm bias of ~1° there, the right plot ~0.1°. 5400 m is level number 4 from the bottom:
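A sketch of reading the level depths from aux3d.out, assuming the usual FESOM 1.4 layout (number of levels on the first line, then one depth value per line); treat the layout and the path as assumptions:

```python
import numpy as np

# aux3d.out: first value = number of vertical levels, followed by the depth of
# each level; the remaining 3D node index table is not needed here.
with open("/path/to/mesh/aux3d.out") as f:
    nlev = int(f.readline())
    depths = np.array([float(f.readline()) for _ in range(nlev)])

print(depths)       # all model levels
print(depths[-4])   # fourth level from the bottom, ~5400 m in this mesh
```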
This at least confirms that the Python cdo interface behaves like the command-line cdo does, and it's not some kind of error I made in my Python script. The only question left is whether the weights are correct. Maybe you can quickly check my genycon command, but I don't see how that could be wrong:
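The actual genycon call is not quoted here; a plausible form, with the target grid and file names as assumptions, would be:

```python
from cdo import Cdo

cdo = Cdo()

# Command-line form:
#   cdo genycon,global_1 temp.fesom.1850.nc weights.nc
# i.e. generate first-order conservative remapping weights from the native grid
# (assumed to be described in the input file) to a regular 1 degree grid.
cdo.genycon("global_1", input="temp.fesom.1850.nc", output="weights.nc")
```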
Well, the easy way to check the bias at 5000 m (or rather 4000 m) is to plot spatial maps of the biases, and on my plots it's pretty large. I would be happy if you found a mistake though :)
Maybe all the empty cells that are cut off by the topography at depth get factored in as zeros in the fldmean? In that case it might be a question of not having set the missing values correctly.
I solved the remaining issue by adding: The CDO-based plots now look nearly identical. The time to load increased from 316.6 s to 338.1 s, still much faster than the 1066.9 s we got with xarray. I will work on making this template merge-ready tomorrow.
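The added line itself is not quoted above; one plausible version, assuming the problem was that dry cells entered the mean as zeros, is to mark them as missing before the fldmean:

```python
from cdo import Cdo

cdo = Cdo()

path = "temp.fesom.1850.nc"  # placeholder for one yearly file

# Hypothetical fix: turn the zeros contributed by dry (topography) cells into
# missing values before averaging, so that fldmean ignores them.
out = cdo.yearmean(
    input=f"-fldmean -setctomiss,0 -remap,global_1,weights.nc {path}"
)
```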
Thanks for sending me the link to this, Jan. I will hopefully look tomorrow. Dask-friendliness is on my list anyway.
I recently ran a fairly comprehensive set of diagnostics for a 700-year-long CORE2 mesh run. It took quite a few hours to complete:
Most of the time is spent loading data. When we create the global diagnostics and then the local ones afterwards, we read the temp and salt data in multiple times. We also read in the same data for hovm_difference_clim as we do for ocean_integrals_difference. Reading 700 years of a 3D CORE2 field once takes about 20 minutes. That's 22.4 GB loaded at 18.6 MB/s.
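A sketch of the kind of reuse that would avoid the duplicated reads: open the data once, lazily, and derive both diagnostics from the same dataset (the dimension names are assumptions, not the real FESOM output names):

```python
import xarray as xr

# Hypothetical dimension names: "nod2" for the horizontal nodes, "nz1" for depth.
# The point is to traverse the files once and reuse the lazily opened dataset.
ds = xr.open_mfdataset("/path/to/run/temp.fesom.*.nc", combine="by_coords")
temp = ds["temp"]

hovm_input = temp.mean(dim="nod2")                # per-level time series (Hovmöller)
integral_input = temp.mean(dim=("nod2", "nz1"))   # scalar time series (integrals)
```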