Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/region support #326

Open
wants to merge 33 commits into
base: develop
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 19 commits
Commits
Show all changes
33 commits
Select commit Hold shift + click to select a range
ef2428b
region select for custom regions
blychs Nov 21, 2024
da33919
Regionmask tool
blychs Nov 22, 2024
7c53f29
Delete old unused code
blychs Nov 22, 2024
0432ace
Return masked_dataset directly
blychs Nov 23, 2024
85f9122
test region support
blychs Dec 3, 2024
629c700
Add explicit parameter domain_info
blychs Dec 4, 2024
9a3a07c
Use domain_info for extra data
blychs Dec 4, 2024
99acbb3
deal with pandas, stil in testing
blychs Dec 5, 2024
330bc9d
Control for region selection
blychs Dec 7, 2024
3733a17
Start adding test for regionsupport
blychs Dec 16, 2024
72f7b9c
Change download scheme avoiding pooch.
blychs Dec 27, 2024
725f967
Eliminate drop=True in auto-region
blychs Dec 30, 2024
895c468
Fix typo in import region_select for stats
blychs Dec 31, 2024
eb2f820
Fix bug in query-like search and typo in auto-region:custom_box
blychs Dec 31, 2024
15a680a
Update the docs to new region support
blychs Dec 31, 2024
76d86df
remove changes from surfplots.py
blychs Dec 31, 2024
f467367
Remove test_regionmask.py, which was incomplete.
blychs Dec 31, 2024
0a7bd17
Add region_selection.rst to docs
blychs Dec 31, 2024
3de40f2
Add Copyright notice
blychs Dec 31, 2024
c99929c
Change names of region types
blychs Jan 13, 2025
ae78b1b
Fix bug using p.obj instead of p_region
blychs Feb 5, 2025
36ad242
Rename auto-region:box as custom:box
blychs Feb 5, 2025
df584e9
Update docs/users_guide/region_selection.rst
blychs Feb 6, 2025
e9fb351
Update docs/users_guide/region_selection.rst
blychs Feb 6, 2025
215f5e5
Fix typo en custom:box
blychs Feb 6, 2025
0327987
Update melodies_monet/util/region_select.py
blychs Feb 10, 2025
863bea7
Update melodies_monet/util/region_select.py
blychs Feb 10, 2025
8900536
Turn methods private
blychs Feb 13, 2025
eeb7f37
Update melodies_monet/util/region_select.py
blychs Feb 13, 2025
8c9a90a
Update melodies_monet/util/region_select.py
blychs Feb 13, 2025
c8d69bd
Update melodies_monet/util/region_select.py
blychs Feb 13, 2025
2062485
Change location of error message and make function public
blychs Feb 13, 2025
8a2c91c
Update license header
blychs Feb 13, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 16 additions & 0 deletions docs/appendix/yaml.rst
Original file line number Diff line number Diff line change
Expand Up @@ -321,6 +321,22 @@ For automatic EPA or Giorgi region boxes (if they are not included
with the columns in the observation file), choose ``auto-region:epa`` or
``auto-region:giorgi``. Take into account that ``auto-region:epa`` is only a rough
approximation, since it assumes perfect, rectangular lonlat boxes.
If you only need a rectangular, lonlat box which does not cross the antimeridian, you can use
``auto-region:custom_box``, which needs to be combined with the ``domain_info`` parameter and
blychs marked this conversation as resolved.
Show resolved Hide resolved
a box of ``bounds: [minlat, minlon, maxlat, maxlon]``. See :doc:`/users_guide/region_selection` for examples.

If you have ``regionmask`` installed, you can also use it for advanced region support.
These regions can be arbitrary, and its use require providing ``domain_type`` parameters starting
with ``custom:``.
There are three ways to use ``regionmask``. ``custom:auto_polygon`` lets the user define their own
polygon in the section ``domain_info``, using the keyword ``mask_info``.
``custom:defined_region`` lets the user utilize any region predefined by
`regionmask <https://regionmask.readthedocs.io/en/stable/>`__, defined in ``domain_info`` using
the keywords ``name_regiontype`` and ``region``.
The third option is using the keyword `custom:custom_file`, which is defined in ``domain_info`` with
blychs marked this conversation as resolved.
Show resolved Hide resolved
either ``mask_path:path_shapefile_or_geojson`` or ``mask_url:url_of_shapefile_or_geojson``,
``abbrevs``, ``name`` and ``region_name``. See :doc:`/users_guide/region_selection` for examples and a more
detailed explanation.

**domain_name:** List of domain names to be plotted. If domain_type = all, all
data will be used and the domain_name is used only in the plot title. If
Expand Down
1 change: 1 addition & 0 deletions docs/getting_started/installation.rst
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@ Optional dependencies
- ``typer`` (to use the :doc:`/cli`;
add ``rich`` `for <https://typer.tiangolo.com/release-notes/#060>`__ fancy tracebacks and ``--help``)
- ``pooch`` (to enable automatic downloading of :doc:`tutorial datasets </examples/tutorial-data>`)
- ``regionmask`` (`for complex region masking support <https://regionmask.readthedocs.io/en/stable/>`__; can read shapefiles, geojson, arbitrary polygons and predefined regions.)

Incompatibilities
-----------------
Expand Down
1 change: 1 addition & 0 deletions docs/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -49,6 +49,7 @@ MONETIO please refer to:
users_guide/supported_stats
users_guide/time_chunking
users_guide/gridded_datasets
users_guide/region_selection

.. toctree::
:hidden:
Expand Down
53 changes: 53 additions & 0 deletions docs/users_guide/region_selection.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
Selecting regions
=================

MELODIES-MONET has different ways to mask/select a specific region.

The easiest, and often most precise, is to have that region defined in the observations/paired data.
In this case, the variable/column that defines the region of interest should be selected as
`domain_type` in the YAML file, and the name of the region to select should be provided as `domain_name`.

If that data are not available in the observations, another option is to provide a lonlat box, which is
defined by setting `domain_type: auto-region:xxxxx`, where `xxxxx` can be `epa`, `giorgi` or `custom_box`.
`Giorgi` regions and a rough, rectangular approximation to `EPA` regions have already been hardcoded into
MELODIES-MONET.
In the case of `EPA` regions, be aware that the approximation is quite rough to force it into a rectangular lonlat box, and although it is probably sufficient for plotting maps, it can lead to errors if used for anything else.
If `auto-region:custom_box` is selected a lonlat box in the form of `bounds: [minlat, minlon, maxlat, maxlon]` needs to be provided in `domain_info` (see example below).
`auto-region:custom_box` has, however, some limitations: `minlon` and `maxlon` need to be in the range of `[-180, 180]`, and the box cannot cross the antimeridian.

A third, and more sofisticated option, consists in utilizing the optional dependency `regionmask <https://regionmask.readthedocs.io/en/stable/>`__.
This is selected by defining `domain_type: custom:xxxxx`, where `xxxxx` can be `auto_polygon`, `defined_region` or `custom_file`.
All of these options require extra data provided in a `domain_info` keyword in the YAML file.
This option includes a multiplicity of capabilities:
* If `auto_polygon` is selected, the vertices of one or more arbitrary polygons need to be provided (anti clockwise).
Currently no holes inside the polygon are supported.
* If `defined_region` is selected, any defined region supported by `regionmask` can be provided in `domain_info`.
* If `custom_file` is provided, the path to a shapefile/geojson file has to be provided. There is no need to decompress `.zip` shapefiles. Alternatively, the download URL can be provided, and the code will download the file automatically. Be aware that if a file with the same name is already in the working directory, it will be silently overwritten.


An example of the plotting part of an arbitrary plot for eact type of region is shown below:

.. code-block:: yaml

domain_type: ["all", "all", "state_name", "epa_region", "auto-region:giorgi", "auto-region:custom_box", "custom:auto_polygon", "custom:auto_polygon", "custom:defined_region", "custom:custom_file", "custom:custom_file"]
domain_name: ["CONUS", "model", "CO", "R8", "CNA", "R8box", "onepoly", "twopolys", "colorado", "denverfile", "denverurl"]
domain_info:
R8box:
bounds: [39.8, -105.30, 40.2, -105.1]
onepoly:
mask_info: [[-104.968, 39.47], [-104.618, 39.75], [-104.968, 40.06], [-105.32, 39.75]]
twopolys:
mask_info: [[[-104.968, 39.47], [-104.618, 39.75], [-104.968, 40.06], [-105.32, 39.75]], [[-107.474, 37.693], [-108.037, 37.659], [-108.423, 36.97], [-106.444, 36.97], [-106.497, 37.473], [-107.4597, 37.4693]]]
colorado:
name_regiontype: natural_earth_v5_1_2.us_states_10
region: CO
denverfile:
mask_path: "Colorado_County_Boundaries.zip"
abbrevs: COUNTY
name: Counties
region_name: DENVER
denverurl:
mask_url: "https://hub.arcgis.com/api/v3/datasets/66c2642209684b90af84afcc559a5a02_5/downloads/data?format=shp&spatialRefId=4269&where=1%3D1"
abbrevs: "COUNTY"
name: "Counties"
region_name: "DENVER"
70 changes: 31 additions & 39 deletions melodies_monet/driver.py
Original file line number Diff line number Diff line change
Expand Up @@ -1444,7 +1444,8 @@ def plotting(self):
None
"""

from .util.tools import resample_stratify, get_epa_region_bounds, get_giorgi_region_bounds
from .util.tools import resample_stratify
from .util.region_select import select_region
import matplotlib.pyplot as plt
pair_keys = list(self.paired.keys())
if self.paired[pair_keys[0]].type.lower() in ['sat_grid_clm','sat_swath_clm']:
Expand Down Expand Up @@ -1512,9 +1513,11 @@ def plotting(self):
# Loop also over the domain types. So can easily create several overview and zoomed in plots.
domain_types = grp_dict['domain_type']
domain_names = grp_dict['domain_name']
domain_infos = grp_dict.get('domain_info', {})
for domain in range(len(domain_types)):
domain_type = domain_types[domain]
domain_name = domain_names[domain]
domain_info = domain_infos.get(domain_name, None)

# Then loop through each of the pairs to add to the plot.
for p_index, p_label in enumerate(pair_labels):
Expand All @@ -1533,9 +1536,15 @@ def plotting(self):
modvar = modvar + 'trpcol'

# for pt_sfc data, convert to pandas dataframe, format, and trim
if obs_type in ["sat_swath_sfc", "sat_swath_clm",
"sat_grid_sfc", "sat_grid_clm",
"sat_swath_prof"]:
# Query selected points if applicable
if domain_type != 'all':
p_region = select_region(p.obj, domain_type, domain_name, domain_info)
else:
p_region = p.obj


if obs_type in ["sat_swath_sfc", "sat_swath_clm", "sat_grid_sfc",
"sat_grid_clm", "sat_swath_prof"]:
# convert index to time; setup for sat_swath_clm

if 'time' not in p.obj.dims and obs_type == 'sat_swath_clm':
blychs marked this conversation as resolved.
Show resolved Hide resolved
Expand All @@ -1547,12 +1556,12 @@ def plotting(self):
# pairdf_all = p.obj.stack(ll=['x','y'])
# pairdf_all = pairdf_all.rename_dims({'ll':'y'})
else:
pairdf_all = p.obj
pairdf_all = p_region
# Select only the analysis time window.
pairdf_all = pairdf_all.sel(time=slice(self.start_time,self.end_time))
else:
# convert to dataframe
pairdf_all = p.obj.to_dataframe(dim_order=["time", "x"])
pairdf_all = p_region.to_dataframe(dim_order=["time", "x"])
# Select only the analysis time window.
pairdf_all = pairdf_all.loc[self.start_time : self.end_time]

Expand Down Expand Up @@ -1616,30 +1625,6 @@ def plotting(self):
# Determine outname
outname = "{}.{}.{}.{}.{}.{}.{}".format(grp, plot_type, obsvar, startdatename, enddatename, domain_type, domain_name)

# Query selected points if applicable
if domain_type != 'all':
if domain_type.startswith("auto-region"):
_, auto_region_id = domain_type.split(":")
if auto_region_id == 'epa':
bounds = get_epa_region_bounds(acronym=domain_name)
elif auto_region_id == 'giorgi':
bounds = get_giorgi_region_bounds(acronym=domain_name)
else:
raise ValueError(
"Currently, region selections whithout a domain query have only "
"been implemented for Giorgi and EPA regions. You asked for "
f"{domain_type!r}. Soon, arbitrary rectangular boxes, US states and "
"others will be included."
)
pairdf_all = pairdf_all.loc[
(pairdf_all["latitude"] > bounds[0])
& (pairdf_all["longitude"] > bounds[1])
& (pairdf_all["latitude"] < bounds[2])
& (pairdf_all["longitude"] < bounds[3])
]
else:
pairdf_all.query(domain_type + ' == ' + '"' + domain_name + '"', inplace=True)

# Query with filter options
if 'filter_dict' in grp_dict['data_proc'] and 'filter_string' in grp_dict['data_proc']:
raise Exception("""For plot group: {}, only one of filter_dict and filter_string can be specified.""".format(grp))
Expand Down Expand Up @@ -2562,6 +2547,8 @@ def plotting(self):
vmodel = self.models[p.model].obj.loc[dict(time=slice(self.start_time, self.end_time))]
except KeyError as e:
raise Exception("MONET requires an altitude dimension named 'z'") from e
if grp_dict.get('data_proc', {}).get('crop_model', False) and domain_name != all:
vmodel = select_region(vmodel, domain_type, domain_name, domain_info)

# Determine proj to use for spatial plots
proj = splots.map_projection(self.models[p.model])
Expand Down Expand Up @@ -2613,6 +2600,7 @@ def stats(self):
"""
from .stats import proc_stats as proc_stats
from .plots import surfplots as splots
from .util.region_select import select_region

# first get the stats dictionary from the yaml file
stat_dict = self.control_dict['stats']
Expand Down Expand Up @@ -2652,9 +2640,11 @@ def stats(self):
# Loop also over the domain types.
domain_types = stat_dict['domain_type']
domain_names = stat_dict['domain_name']
domain_infos = stat_dict.get('domain_info', {})
for domain in range(len(domain_types)):
domain_type = domain_types[domain]
domain_name = domain_names[domain]
domain_info = domain_infos.get(domain_name, None)

# The tables and text files will be output at this step in loop.
# Create an empty pandas dataarray.
Expand Down Expand Up @@ -2707,22 +2697,24 @@ def stats(self):
if obsvar == 'nitrogendioxide_tropospheric_column':
modvar = modvar + 'trpcol'

# Query selected points if applicable
if domain_type != 'all':
p_region = select_region(p.obj, domain_type, domain_name, domain_info)
else:
p_region = p.obj

# convert to dataframe
# handle different dimensios, M.Li
if ('y' in p.obj.dims) and ('x' in p.obj.dims):
pairdf_all = p.obj.to_dataframe(dim_order=["x", "y"])
elif ('y' in p.obj.dims) and ('time' in p.obj.dims):
pairdf_all = p.obj.to_dataframe(dim_order=["time", "y"])
if ('y' in p_region.dims) and ('x' in p_region.dims):
pairdf_all = p_region.to_dataframe(dim_order=["x", "y"])
elif ('y' in p_region.dims) and ('time' in p_region.dims):
pairdf_all = p_region.to_dataframe(dim_order=["time", "y"])
else:
pairdf_all = p.obj.to_dataframe(dim_order=["time", "x"])
pairdf_all = p_region.to_dataframe(dim_order=["time", "x"])

# Select only the analysis time window.
pairdf_all = pairdf_all.loc[self.start_time : self.end_time]

# Query selected points if applicable
if domain_type != 'all':
pairdf_all.query(domain_type + ' == ' + '"' + domain_name + '"', inplace=True)

# Query with filter options
if 'data_proc' in stat_dict:
if 'filter_dict' in stat_dict['data_proc'] and 'filter_string' in stat_dict['data_proc']:
Expand Down
Loading
Loading