Feature/region support #326
base: develop
This commit has advantages and disadvantages:

Advantages
----------
- It does not require pooch, and only uses the standard library.
- It can deal with URLs not ending with the filename.

Disadvantages
-------------
- It downloads the files locally instead of into the cache (which may actually be good).
- It does not add any checksum to the name, risking overwriting files.
It hangs if latitude and longitude are not coordinates
- The code was originally looking for data["domain_type"].cf == "domain_name" instead of data["domain_type"] == "domain_name".
- auto-region:custom_box was written as auto-region:custom.
Better tests are provided in the PR.
The test that is failing is the Copyright notice, which I wrote according to #321.
Hi all, I would like to build on this PR for xarray_plots, so the sooner we can get this merged, the better. Let me know of any edits that should be made.
This looks great! I think all the original capabilities are still possible and the new options are useful too. The naming works for me. I just had one technical question and then I'm happy to approve if Zach @zmoon you also agree?
Just a couple of small updates, I think, and this is ready.
Co-authored-by: Becky Schwantes <[email protected]>
Looks great, thanks for making these updates!
correct spelling and typos Co-authored-by: Zachary Moon <[email protected]>
Add raise_for_status() Co-authored-by: Zachary Moon <[email protected]>
Make the content_disposition more robust if missing Co-authored-by: Zachary Moon <[email protected]>
boolean type hint to bool Co-authored-by: Zachary Moon <[email protected]>
Add space in docs Co-authored-by: Zachary Moon <[email protected]>
@blychs did you have any thoughts about these pooch comments from @DWesl?
I just answered under that topic. I think the idea is quite nice, but I don't find an easy way to make it work, because I believe that we need to have the extension of the file. Might be wrong though, or there might be a way to infer it. @DWesl do you have any input on this? If not, I would be happy to see this PR merged, but if there is an easy fix, then I'm also happy to move forward with it.
@blychs If you can merge upstream develop into your branch the file header check should pass, if you add an additional
Done
That's a pooch error then? In that case, parsing through

You could create and update a mapping of URL to filename if you wanted to get back some of the caching pooch does, if you take steps to avoid filename collisions. (If you're not able to persist the mapping between script invocations, you could set up an

If you are already using this as a fallback filename:

    if fname is None:
        fname = url.rsplit("/", 1)[-1]

changing to have

You could also do this assignment unconditionally before the Content-Disposition check and overwrite it if the response indicates a preferred filename so
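For reference, the filename logic discussed in this thread (prefer the name given in Content-Disposition, otherwise fall back to the last URL path segment) could look roughly like the sketch below. It uses only the standard library. The function name `guess_filename` and the example URLs are hypothetical and are not taken from the PR; the actual helper in util/region_select.py may differ.

```python
import posixpath
from email.message import Message
from urllib.parse import unquote, urlsplit


def guess_filename(url, content_disposition=None):
    """Pick a local filename for a download (hypothetical helper).

    Prefers the ``filename`` parameter of a Content-Disposition header
    value and falls back to the last segment of the URL path.
    """
    fname = None
    if content_disposition:
        # Let the stdlib email parser handle quoting and parameters.
        msg = Message()
        msg["Content-Disposition"] = content_disposition
        fname = msg.get_filename()

    if not fname:
        # Fallback: last path segment, ignoring the query string.
        fname = unquote(urlsplit(url).path).rsplit("/", 1)[-1]

    # Drop any directory components so the file cannot land elsewhere.
    fname = posixpath.basename(fname.replace("\\", "/"))
    if not fname:
        raise ValueError(f"cannot infer a filename for {url!r}")
    return fname
```

With a `requests` response `r`, this might be called as `guess_filename(r.url, r.headers.get("Content-Disposition"))` after `r.raise_for_status()`. When neither source yields a name, the sketch raises instead of guessing an extension.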
Adds new region support in the form of a specific utility (util/region_select.py). The docs are updated accordingly.
This includes:
- Old capabilities, but using xarray instead of pandas (and the .where method instead of a query). The query is still an option in the utility, but due to how I changed the driver, I don't see how it would really be required by the code. xarray has a "query" method, but I couldn't make it work and I believe it requires the use of a dimension to query along, which I don't think we want.
- New, advanced region support. auto-region now includes auto-region:custom, allowing the user to provide a lonlat box. It is currently somewhat limited, though, and it cannot cross the antimeridian. The box has to be provided in the new keyword domain_info.
- New, advanced region support with regionmask. These require the use of the new keyword domain_info. The new capabilities include:
  - custom:auto_polygon in domain_type. Holes in the polygon are permitted by regionmask, but I have not added that capability yet (I am not sure about the best way to do it, since it would require turning things into another dictionary, I believe).
  - regionmask's defined_regions method with custom:defined_regions. These need to be defined, once again, in domain_info.
  - custom:custom_file. The path or URL of the file needs to be defined in domain_info. There are a few undesirable things about how I did the automatic download, though, and if you have better suggestions I'd be happy to change the code:
    - I am not using pooch, but downloading the files manually. The reason for this is that I was not able to find a way to tell pooch to use the content-disposition keyword of a URL when downloading, leading to errors when the URL did not end with the name of the file. I'd rather not make the user provide that, since it's not always certain. I'm sure that there must be a way, but I could not find it.
    - The code does not check if the file exists, and just overwrites it. If you ask for multiple tasks using the same domain, it downloads it again every time. This is silly, and could be avoided by using pooch, which would also test the checksum. Once again, I could not find an easy way to avoid this when the URL does not end with the appropriate file name, which happens quite often (for example, in my tests).

If you have any solutions for this, I would appreciate them. Otherwise, I'd suggest moving forward with this.
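One possible way to avoid the repeated downloads and overwrites described above, without pooch, is to keep a small URL-to-filename index next to the downloaded files and prefix each stored name with a short hash of the URL. The sketch below only illustrates that idea and is not what the PR implements. `cached_path`, the `index.json` name, and the `fetch` callable (assumed to return a `(filename, bytes)` pair) are all hypothetical. It also does not verify content checksums the way pooch does.

```python
import hashlib
import json
from pathlib import Path


def cached_path(url, cache_dir, fetch):
    """Return a local path for ``url``, downloading only on a cache miss.

    ``fetch`` is a caller-supplied function (hypothetical) that takes the
    URL and returns ``(filename, data_bytes)``.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    index_file = cache_dir / "index.json"
    index = json.loads(index_file.read_text()) if index_file.exists() else {}

    # Cache hit: the URL is known and its file is still on disk.
    known = index.get(url)
    if known and (cache_dir / known).exists():
        return cache_dir / known

    fname, data = fetch(url)
    # A short hash of the URL keeps equal filenames from different URLs
    # from overwriting each other.
    prefix = hashlib.sha256(url.encode("utf-8")).hexdigest()[:12]
    target = cache_dir / f"{prefix}-{fname}"
    target.write_bytes(data)

    # Persist the mapping so later script invocations can reuse the file.
    index[url] = target.name
    index_file.write_text(json.dumps(index, indent=2))
    return target
```

Because the index is stored on disk, several tasks using the same domain file would trigger a single download, and the original file extension is preserved at the end of the stored name.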
Please also check the changes in the docs. My English is far from perfect, and some proofreading (and corrections!) by people with better English than mine would be great.
I am uploading here the yaml file I used to test all of this, so that it can get tested again. I tested it against surface data. Although I didn't do a complete test against TEMPO (I didn't want to add changes to the driver until I fix the merge conflicts, hopefully tomorrow) I did test the individual functions and plotted the results. Testing against AEROMMA data for aircraft would be great. Otherwise, I can try it with ASIA-AQ. Since GitHub does not allow me
I also provide here my functions for testing individual options. I did not build proper unit tests, since just asserting whether the types of data are correct or not completely NaN does not seem as useful as looking at the plots.
You will see that I did not plot the multiboxplot nor the scorecards with the regions defined with regionmask. There is probably a way to do it (by adding the mask to the dataset and not only selecting the region, possibly in a copy to avoid changing it), but it seemed a little bit confusing.
test_regionmask_tool.zip
Cheers,
Pablo
Edit: I requested some reviewers that seemed reasonable, but feel free to change that.