Xarray combine_by_coords return the monotonic global index error #4213
Comments
Hi @hamiddashti, based on your description this isn't a bug: it's throwing the error it should be throwing given your input. However, I can now see that the documentation doesn't make it very clear why this is happening!
That error is an explicit rejection of the input you gave it: "you gave me overlapping coordinates, I don't know how to concatenate those, so I'll reject them." Normally this makes sense - when the overlapping coordinates aren't NaNs then it's ambiguous as to which values to choose.
In your case you are asking it to perform a well-defined operation, and the discussion in the docs about merging overlapping coordinates here implies that
It might be possible to generalise the
(Issue #3150 was about something else, an actual bug in the handling of "coordinate dimensions which do not vary between each dataset".)
Instead, what you need to do is trim off the overlap first. That shouldn't be hard - presumably you know (or can determine) how big your overlap is, and all your NaNs are on one dataset. You just need to use the
Would it be clearer if I added an example to the documentation where some overlapping data had to be trimmed first?
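As an illustration of the "trim off the overlap first" advice, here is a minimal sketch. It assumes two synthetic 1D tiles (not the tiles from the issue), a single data variable named `v`, and an overlap size `n_overlap` that is already known; all of these names are made up for the example.

```python
import numpy as np
import xarray as xr

# Synthetic stand-ins for two neighbouring tiles (not the tiles from the issue).
# They share y = 3 and y = 4, and the right tile holds only NaN there.
left = xr.Dataset(
    {"v": ("y", [0.0, 1.0, 2.0, 3.0, 4.0])},
    coords={"y": [0, 1, 2, 3, 4]},
)
right = xr.Dataset(
    {"v": ("y", [np.nan, np.nan, 5.0, 6.0, 7.0])},
    coords={"y": [3, 4, 5, 6, 7]},
)

# Assumption: the size of the overlap is known in advance.
n_overlap = 2

# Drop the overlapping (all-NaN) rows from the right tile by position.
right_trimmed = right.isel(y=slice(n_overlap, None))

# The coordinates no longer overlap, so the tiles can be combined.
combined = xr.combine_by_coords([left, right_trimmed])
print(combined["y"].values)
print(combined["v"].values)
```

After trimming, the `y` coordinates of the two tiles are disjoint and increasing, which is the situation `combine_by_coords` is designed for.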
Thanks @TomNicholas for the thorough explanation. Now it makes sense. I thought this was a process like mosaicing rasters using rasterio.merge. Yes, it would be great if you could add an example of how to find and trim overlapping coordinates. I don't really know how to find common coordinates in spatial datasets using isel. Thanks
No problem, thanks for flagging it.
Do you already know which data points overlap? You know they are NaNs, so do you know how many NaNs there are at the edge of your tiles? If you do then it's just like
What I meant is that
Thanks. The problem is that I don't really know where the common coordinates are. However, since I know the extent of each tile, with some preprocessing I should be able to find and trim them. On a side note, it would be great if mosaicing datasets with common coordinates were added to xarray (like mosaicing rasters). What really happened was that I had a large area, then I used xarray to clip it into smaller tiles to make the calculations more feasible. After I did the calculations on the tiles and decided to stitch them back together, this problem arose. Your approach would solve this issue, but having it as part of xarray would definitely help many geospatial applications where, for many reasons, we deal with tiles of data with common coordinates.
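When the size of the overlap is not known up front, one possible way to find it is to compare the coordinate values of the two tiles directly. The sketch below again uses synthetic tiles and made-up names (`v`, `in_left`, `keep`), and it assumes the shared coordinates match exactly; floating-point coordinates that differ by rounding would need a tolerance-based comparison instead.

```python
import numpy as np
import xarray as xr

# Synthetic stand-ins for two neighbouring tiles (not the tiles from the issue).
left = xr.Dataset(
    {"v": ("y", [0.0, 1.0, 2.0, 3.0, 4.0])},
    coords={"y": [0, 1, 2, 3, 4]},
)
right = xr.Dataset(
    {"v": ("y", [np.nan, np.nan, 5.0, 6.0, 7.0])},
    coords={"y": [3, 4, 5, 6, 7]},
)

# Boolean mask: which y values of the right tile also appear in the left tile?
# Assumption: shared coordinates are exactly equal.
in_left = np.isin(right["y"].values, left["y"].values)

# Keep only the positions of the right tile that are NOT shared.
keep = np.flatnonzero(~in_left)
right_trimmed = right.isel(y=keep)

combined = xr.combine_by_coords([left, right_trimmed])
print(int(in_left.sum()), "shared coordinate values were trimmed")
```

This discards the right tile's values in the overlap without looking at them, so it only makes sense when, as in this thread, those values are known to be NaN or otherwise redundant.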
Great. Let me know if you still have problems (on here, SO - I just answered your original question there, or on the xarray mailing list).
I wonder if you could have avoided having to do this by applying your analysis in chunks using dask? That might be complicated if your analysis is a complicated algorithm though.
This sounds like something that might be useful for lots of geoscientists, so it would be good to discuss this further. However, I don't really know exactly what you mean by "mosaicing rasters" (I don't work in geoscience). Briefly reading about it here, it seems that there isn't one universal way to do it... What would be really great is if you could give me a more precise specification of the behaviour you're imagining, and how it would be used in practice (either here or in a new issue). Then we can see if it's (a) feasible, (b) commonly useful, and (c) something that should live in xarray or in another package. Another good place to ask about the best way to approach this problem in general would be the pangeo discourse.
@TomNicholas I landed on this discussion while looking for a solution to what I consider the exact same problem. You are right in pointing out that there are multiple ways to treat the overlapping values, but I would stick with the most common ones, which are also described in the link you mentioned. In other words, (min, max, average, first, last) would already be a huge plus. About dask: it does indeed help a lot to create a delayed object of the tiles (consider that, at least for S2, the data are in jp2 and we are forced to use open_rasterio instead of open_mfdataset), so the solution should be compatible with this kind of approach. About Pangeo: a topic should indeed be opened there and eventually we can move the discussion, but, at least in my opinion, for the moment the right place to discuss this is within xarray. It seems that Sinergise has used the average algorithm to solve the same issue for the AWS service, so users of the AWS S2 products will not need to care about the overlap issue. Edit: update on AWS service
Hi @pl-marasco, thanks for your comment. So what you're suggesting is to alter
I think that this could be done within the
Internally the combine functions currently work by creating an intermediate representation of the arrangement of tiles, before combining that along 1D repeatedly until done. What I'm wondering is whether any treatment of overlapping values would need to happen before this 1D combining step? If I have 4 tiles which all overlap at a corner, and you want me to take the (max/min/average) value of all 4 in that quadruple overlap region, I could either do this by identifying that region and taking the max (complicated) or by simply updating the max value every time I combine along 1D (simple, but more wasteful).
Separately, a treatment based on the order of the input passed (your first/last) would I think need to store extra information about that order, which would be more complicated.
Do these raster problems always use the same sized tiles?
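To make the min/max/average/first options concrete, here is a rough sketch of how such reductions can be done today with existing xarray calls. It is not how the combine functions work internally and not a proposed API: it stacks the tiles along a new, hypothetical `tile` dimension with an outer join and reduces over it, which is simple but allocates a full-size array per tile. The tiles and the variable name `v` are synthetic.

```python
import xarray as xr

# Two synthetic overlapping tiles with real (non-NaN) values in the overlap.
left = xr.Dataset(
    {"v": ("y", [1.0, 1.0, 1.0, 1.0])},
    coords={"y": [0, 1, 2, 3]},
)
right = xr.Dataset(
    {"v": ("y", [3.0, 3.0, 3.0, 3.0])},
    coords={"y": [2, 3, 4, 5]},
)

# Stack the tiles along a new "tile" dimension. join="outer" aligns both
# tiles to the union of their y coordinates and pads the gaps with NaN.
stacked = xr.concat([left, right], dim="tile", join="outer")

# Reduce over the tile dimension, skipping the NaN padding.
mosaic_mean = stacked.mean("tile", skipna=True)
mosaic_max = stacked.max("tile", skipna=True)
mosaic_min = stacked.min("tile", skipna=True)

# "First wins": keep the left tile's values and fill the rest from the right.
mosaic_first = left.combine_first(right)

print(mosaic_mean["v"].values)
print(mosaic_max["v"].values)
print(mosaic_first["v"].values)
```

Swapping the operands of `combine_first` gives the "last wins" behaviour for two tiles. For many large tiles this stacking approach is wasteful in memory, which is the trade-off raised in the comment above.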
@pl-marasco I've just been pointed to this issue on pangeo-data, which looks like a better place to discuss this.
Closing this as having answered the original question. If anyone wants to discuss mosaicing rasters in more detail we can raise another issue for that.
I asked this question on Stack Overflow and someone mentioned it might be a bug.
I am trying to combine two spatial xarray datasets using combine_by_coords. These two datasets are two tiles next to each other, so there are overlapping coordinates. In the overlapping regions, the variable values of one of the datasets are NaN.
I used combine_by_coords with the compat='no_conflicts' option. However, it returns the "monotonic global indexes along dimension y" error. It looks like this was an issue before but it was fixed (issue 3150), so I don't really know why I get this error. Here is an example (the netcdf tiles are here):
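The original snippet and the netCDF tiles are not reproduced in this copy of the thread. As a stand-in, the following sketch uses two small synthetic tiles (with a made-up variable `v`) to show the kind of input that triggers the error: the tiles share more than one `y` value, so concatenating them along `y` does not give a monotonic index.

```python
import numpy as np
import xarray as xr

# Synthetic stand-ins for the two tiles; they share y = 3 and y = 4,
# and the right tile is NaN in the shared region.
left = xr.Dataset(
    {"v": ("y", [0.0, 1.0, 2.0, 3.0, 4.0])},
    coords={"y": [0, 1, 2, 3, 4]},
)
right = xr.Dataset(
    {"v": ("y", [np.nan, np.nan, 5.0, 6.0, 7.0])},
    coords={"y": [3, 4, 5, 6, 7]},
)

# Combining the untrimmed tiles is rejected, even with compat="no_conflicts".
try:
    xr.combine_by_coords([left, right], compat="no_conflicts")
    message = None
except ValueError as err:
    message = str(err)

print(message)
```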