Feature/region support #326

blychs · 2024-12-31T05:59:14Z

Adds new region support in the form of a specific utility (util/region_select.py). The docs are updated accordingly.
This includes:
Old capabilities, but using xarray instead of pandas (and the .where method instead of a query). The query is still and option in the utility, but due to how I change the driver, I don't see how it would really be required by the code. xarray has a "query" method, but I couldn't make it work and I believe it requires the use of a dimension to query along, which I don't think we want.

New, advanced region support.
auto-region now includes auto-region:custom, allowing the user to provide a lonlat box. It is currently somewhat limited, though, and it cannot cross the antimeridian. The box has to be provided in the new keyword domain_info.

New, advanced region support with regionmask. These require the use of the new keyword domain_info. The new capabilities include:

Defining one or more polygons with custom:auto_polygon in domain_type. Holes in the polygon are permitted by regionmask, but I have not added that capability yet (I am not sure about the best way to do it, since it would require turning things into another dictionary, I believe).
Using regionmask's defined_regions method with custom:defined_regions. These need to be defined, once again, in domain_info.
Using a shapefile/geojson with custom:custom_file. The path or URL of the file need to be defined in domain_info. There are a few undesirable thing about how I did the automatic download, though, and if you have better suggestions I'd be happy to change the code:
- I am not using pooch, but downloading the files manually. The reason for this is that I was not able to find a way to tell pooch to use the content-disposition keyword of a URL when downloading, leading to errors when the URL did not end with the name of the file. I'd rather not make the user provide that, since it's not always certain. I'm sure that there must be a way, but I could not find it.
- The code does not check if the file exists, and just overwrites it. If you ask for multiple tasks using the same domain, it downloads it again every time. This is silly, and could be avoided by using pooch, which would also test the checksum. Once again, I could not find an easy way to avoid this when the URL does not end with the appropriate file name, which happens quite often (for example, in my tests).

If you have any solutions for this, I would appreciate them. Otherwise, I'd suggest moving forward with this.
Please check also the changes in the docs. My English is far from perfect and some proofreading (and corrections!) by people with a better English than me would be great.

I am uploading here the yaml file I used to test all of this, so that it can get tested again. I tested it against surface data. Although I didn't do a complete test against TEMPO (I didn't want to add changes to the driver until I fix the merge conflicts, hopefully tomorrow) I did test the individual functions and plotted the results. Testing against AEROMMA data for aircraft would be great. Otherwise, I can try it with ASIA-AQ. Since GitHub does not allow me
I also provide here my functions for testing individual options. I did not build proper unit tests, since just asserting whether the types of data are correct or not completely NaN does not seem as useful as looking at the plots.

You will see that I did not plot the multiboxplot nor the scorecards with the regions defined with regionmask. There is probably a way to do it (by adding the mask to the dataset and not only selecting the region, possibly in a copy to avoid changing it), but it seemed a little bit confusing.
test_regionmask_tool.zip

Cheers,
Pablo

Edit: I requested some reviewers that seemed reasonable, but feel free to change that.

This commit has advantages and disadvantages: Advantages ---------- - It does not require pooch, and only uses the standard library. - It can deal with URL's not ending with the filename. Disadvantages ------------- - It downloads the files locally instead of into the cache (may be actually good). - It does not add any checksum to the name, risking overwriting files.

It hangs if latitude and longitude are not coordinates

-The code was originally looking for data["domain_type"].cf == "domain_name" instead of data["domain_type"] == "domain_name". -auto-region:custom_box was wirtten as auto-region:custom.

Better tests are provided in the PR.

blychs · 2024-12-31T06:16:45Z

The test that is failing is the Copyright notice, which I wrote according to #321

docs/appendix/yaml.rst

blychs · 2025-02-04T00:25:39Z

Hi all, I would like to leverage on this PR for xarray_plots. Therefore, the sooner we can get this merged, the better. Let me know any edits that should be done.
I would like to finalize xarray_plots as soon as possible. I've been checking and think that I might be able to use this to get rid of pandas for surface and satplots. I haven't looked at aircraftplots yet.

rschwant

This looks great! I think all the original capabilities are still possible and the new options are useful too. The naming works for me. I just had one technical question and then I'm happy to approve if Zach @zmoon you also agree?

melodies_monet/driver.py

rschwant

Just a couple small updates, I think and this is ready.

docs/users_guide/region_selection.rst

melodies_monet/util/region_select.py

Co-authored-by: Becky Schwantes <[email protected]>

rschwant

Looks great, thanks for making these updates!

melodies_monet/util/region_select.py

correct spelling and typos Co-authored-by: Zachary Moon <[email protected]>

Add raise_for_status() Co-authored-by: Zachary Moon <[email protected]>

melodies_monet/util/region_select.py

Make the content_disposition more robust if missing Co-authored-by: Zachary Moon <[email protected]>

boolean type hint to bool Co-authored-by: Zachary Moon <[email protected]>

Add space in docs Co-authored-by: Zachary Moon <[email protected]>

zmoon · 2025-02-13T18:17:08Z

@blychs did you have any thoughts about these pooch comments from @DWesl?

It doesn't look like pooch has a way to set the file name after the download has started, but you might be able to guess from the URL. Unfortunately, there's enough variation in how the URL encodes information about the resource it points to that a universal method of composing a simple filename obviously related to the URL is probably not possible.

Pooch's default involves a hash of the URL, would using just the hash without the guess at the filename work? It's not as obviously related to the URL if you want to find the downloaded file in pooch's cache, but the code will be able to reconstruct it easily. It might be worth adding some part of the domain/hostname in the hash to get a little association between URL and cached file name and to reduce the chance of filename collisions.

blychs · 2025-02-13T18:26:14Z

@blychs did you have any thoughts about these pooch comments from @DWesl?

It doesn't look like pooch has a way to set the file name after the download has started, but you might be able to guess from the URL. Unfortunately, there's enough variation in how the URL encodes information about the resource it points to that a universal method of composing a simple filename obviously related to the URL is probably not possible.
Pooch's default involves a hash of the URL, would using just the hash without the guess at the filename work? It's not as obviously related to the URL if you want to find the downloaded file in pooch's cache, but the code will be able to reconstruct it easily. It might be worth adding some part of the domain/hostname in the hash to get a little association between URL and cached file name and to reduce the chance of filename collisions.

I just answered under that topic. I think the idea is quite nice, but I don't find an easy way to make it work, because I believe that we need to have the extension of the file. Might be wrong though, or there might be a way to infer it.

@DWesl do you have any input on this? If not, I would be happy to see this PR merged, but if there is an easy fix, then I'm also happy to move forward with it

zmoon · 2025-02-13T18:28:48Z

@blychs If you can merge upstream develop into your branch the file header check should pass, if you add an additional # line (see the recent change).

blychs · 2025-02-13T18:35:10Z

Done

DWesl · 2025-02-18T11:56:01Z

I just answered under that topic. I think the idea is quite nice, but I don't find an easy way to make it work, because I believe that we need to have the extension of the file. Might be wrong though, or there might be a way to infer it.

That's a pooch error then? In that case, parsing through fmt=shp vs. format=shapefile vs. name=file.shp in the parameters vs an actual filename before the parameters would be a never-ending pain.

You could create and update a mapping of URL to filename if you wanted to get back some of the caching pooch does, if you take steps to avoid filename collisions. (If you're not able to persist the mapping between script invocations, you could set up an atexit handler to delete all the files when Python exits).

If you are already using this as a fallback filename:

if fname is None:
    fname = url.rsplit("/", 1)[-1]

changing to have fname = os.path.basename(urllib.parse.urlparse(url).path) might produce a normal filename in more cases, but it might also get rid of information you want to keep . I don't know how much of a concern weird non-descriptive filenames would be, but os.path.basename(url) might be a bit more clear about what you're doing if you want to keep the current approach. (Pooch's default would then be something like "-".join([hashlib.sha1(url.encode('utf-8')).hexdigest(), os.path.basename(urllib.parse.urlparse(url).path)]) or "-".join([hashlib.sha1(url.encode('utf-8')).hexdigest(), os.path.basename(url)]), which are less likely to overwrite other files from the same source, but if you always re-download the file that's less relevant)

You could also do this assignment unconditionally before the Content-Disposition check and overwrite it if the response indicates a preferred filename so fname always has a string value.

blychs added 17 commits November 20, 2024 18:19

region select for custom regions

ef2428b

Regionmask tool

da33919

Delete old unused code

7c53f29

Return masked_dataset directly

0432ace

test region support

85f9122

Add explicit parameter domain_info

629c700

Use domain_info for extra data

9a3a07c

deal with pandas, stil in testing

99acbb3

Control for region selection

330bc9d

Start adding test for regionsupport

3733a17

Eliminate drop=True in auto-region

725f967

It hangs if latitude and longitude are not coordinates

Fix typo in import region_select for stats

895c468

Fix bug in query-like search and typo in auto-region:custom_box

eb2f820

-The code was originally looking for data["domain_type"].cf == "domain_name" instead of data["domain_type"] == "domain_name". -auto-region:custom_box was wirtten as auto-region:custom.

Update the docs to new region support

15a680a

remove changes from surfplots.py

76d86df

Remove test_regionmask.py, which was incomplete.

f467367

Better tests are provided in the PR.

blychs requested review from rrbuchholz, zmoon and rschwant December 31, 2024 06:01

blychs added 2 commits December 30, 2024 23:03

Add region_selection.rst to docs

0a7bd17

Add Copyright notice

3de40f2

blychs mentioned this pull request Jan 2, 2025

Add (optional) arbitrary region support with regionmask #272

Open

blychs requested review from dwfncar and removed request for rrbuchholz January 13, 2025 18:33

zmoon reviewed Jan 13, 2025

View reviewed changes

docs/appendix/yaml.rst Outdated Show resolved Hide resolved

zmoon reviewed Jan 13, 2025

View reviewed changes

docs/appendix/yaml.rst Outdated Show resolved Hide resolved

Change names of region types

c99929c

rschwant reviewed Feb 5, 2025

View reviewed changes

melodies_monet/driver.py Outdated Show resolved Hide resolved

blychs added 2 commits February 5, 2025 11:42

Fix bug using p.obj instead of p_region

ae78b1b

Rename auto-region:box as custom:box

36ad242

rschwant reviewed Feb 6, 2025

View reviewed changes

docs/users_guide/region_selection.rst Outdated Show resolved Hide resolved

docs/users_guide/region_selection.rst Outdated Show resolved Hide resolved

melodies_monet/util/region_select.py Outdated Show resolved Hide resolved

blychs and others added 3 commits February 6, 2025 11:18

Update docs/users_guide/region_selection.rst

df584e9

Co-authored-by: Becky Schwantes <[email protected]>

Update docs/users_guide/region_selection.rst

e9fb351

Co-authored-by: Becky Schwantes <[email protected]>

Fix typo en custom:box

215f5e5

rschwant approved these changes Feb 6, 2025

View reviewed changes

zmoon reviewed Feb 10, 2025

View reviewed changes

melodies_monet/util/region_select.py Outdated Show resolved Hide resolved

zmoon reviewed Feb 10, 2025

View reviewed changes

melodies_monet/util/region_select.py Outdated Show resolved Hide resolved

zmoon reviewed Feb 10, 2025

View reviewed changes

melodies_monet/util/region_select.py Show resolved Hide resolved

blychs and others added 2 commits February 10, 2025 09:42

Update melodies_monet/util/region_select.py

0327987

correct spelling and typos Co-authored-by: Zachary Moon <[email protected]>

Update melodies_monet/util/region_select.py

863bea7

Add raise_for_status() Co-authored-by: Zachary Moon <[email protected]>

zmoon reviewed Feb 10, 2025

View reviewed changes

melodies_monet/util/region_select.py Outdated Show resolved Hide resolved

zmoon reviewed Feb 10, 2025

View reviewed changes

melodies_monet/util/region_select.py Outdated Show resolved Hide resolved

zmoon reviewed Feb 13, 2025

View reviewed changes

melodies_monet/util/region_select.py Outdated Show resolved Hide resolved

zmoon reviewed Feb 13, 2025

View reviewed changes

melodies_monet/util/region_select.py Outdated Show resolved Hide resolved

blychs and others added 4 commits February 13, 2025 10:40

Turn methods private

8900536

Update melodies_monet/util/region_select.py

eeb7f37

Make the content_disposition more robust if missing Co-authored-by: Zachary Moon <[email protected]>

Update melodies_monet/util/region_select.py

8c9a90a

boolean type hint to bool Co-authored-by: Zachary Moon <[email protected]>

Update melodies_monet/util/region_select.py

c8d69bd

Add space in docs Co-authored-by: Zachary Moon <[email protected]>

Change location of error message and make function public

2062485

Update license header

8a2c91c

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature/region support #326

Feature/region support #326

blychs commented Dec 31, 2024 •

edited

Loading

blychs commented Dec 31, 2024

blychs commented Feb 4, 2025

rschwant left a comment

rschwant left a comment

rschwant left a comment

zmoon commented Feb 13, 2025 •

edited

Loading

blychs commented Feb 13, 2025 •

edited

Loading

zmoon commented Feb 13, 2025

blychs commented Feb 13, 2025

DWesl commented Feb 18, 2025

Feature/region support #326

Are you sure you want to change the base?

Feature/region support #326

Conversation

blychs commented Dec 31, 2024 • edited Loading

blychs commented Dec 31, 2024

blychs commented Feb 4, 2025

rschwant left a comment

Choose a reason for hiding this comment

rschwant left a comment

Choose a reason for hiding this comment

rschwant left a comment

Choose a reason for hiding this comment

zmoon commented Feb 13, 2025 • edited Loading

blychs commented Feb 13, 2025 • edited Loading

zmoon commented Feb 13, 2025

blychs commented Feb 13, 2025

DWesl commented Feb 18, 2025

blychs commented Dec 31, 2024 •

edited

Loading

zmoon commented Feb 13, 2025 •

edited

Loading

blychs commented Feb 13, 2025 •

edited

Loading