Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow custom shapes #20

Open
wants to merge 6 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 0 additions & 1 deletion config/default.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -68,7 +68,6 @@ scope:
x_max: -10 # in degrees east
y_min: 30 # in degrees north
y_max: 41 # in degrees north

layers:
continental:
Austria: nuts0
Expand Down
18 changes: 3 additions & 15 deletions config/schema.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -111,19 +111,7 @@ properties:
patternProperties:
^.*$:
type: object
properties:
^.*$: # ideally this would be 'oneof' the list of countries above (can this be done in a schema??)
propertyNames: {"$ref": "#/properties/scope/properties/countries/items"}
patternProperties:
^.*$:
type: string
enum:
- gadm0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a pity. Not really bad though.

- gadm1
- gadm2
- gadm3
- gadm4
- gadm5
- lau2
- nuts0
- nuts1
- nuts2
- nuts3

3 changes: 3 additions & 0 deletions environment.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -6,3 +6,6 @@ dependencies:
- python=3.6
- snakemake-minimal=5.4
- pycountry=20.7.3
- pip
- pip:
- -e ./lib
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lib requires numpy and so you need list numpy here otherwise it's going to be pip-installed. Also, pinning the version number would be good.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not in favour of installing the lib into the base env, as this env should be as lean as possible.

2 changes: 1 addition & 1 deletion envs/default.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -19,4 +19,4 @@ dependencies:
- pycountry=20.7.3
- pip=19.1.1
- pip:
- -e ../../lib
- -e ../../lib[geo]
7 changes: 0 additions & 7 deletions lib/renewablepotentialslib/__init__.py
Original file line number Diff line number Diff line change
@@ -1,10 +1,3 @@
"""Top-level package of renewablepotentialslib."""

__version__ = "0.1.0" # Additionally defined in setup.py.

# from https://epsg.io/3035
EPSG_3035 = "EPSG:3035"
EPSG_3035_PROJ4 = "+proj=laea +lat_0=52 +lon_0=10 +x_0=4321000 +y_0=3210000 +ellps=GRS80 +units=m +no_defs "
# from https://epsg.io/4326
WGS84 = "EPSG:4326"
WGS84_PROJ4 = "+proj=longlat +datum=WGS84 +no_defs "
6 changes: 6 additions & 0 deletions lib/renewablepotentialslib/geo/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
# from https://epsg.io/3035
EPSG_3035 = "EPSG:3035"
EPSG_3035_PROJ4 = "+proj=laea +lat_0=52 +lon_0=10 +x_0=4321000 +y_0=3210000 +ellps=GRS80 +units=m +no_defs "
# from https://epsg.io/4326
WGS84 = "EPSG:4326"
WGS84_PROJ4 = "+proj=longlat +datum=WGS84 +no_defs "
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@ def _set_proj_lib():
import shapely.ops
import pyproj

from renewablepotentialslib import EPSG_3035_PROJ4
from renewablepotentialslib.geo import EPSG_3035_PROJ4


def watt_to_watthours(watt, duration):
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -7,8 +7,8 @@
import pycountry
import pyproj

from renewablepotentialslib.conversion import transform_bounds, eu_country_code_to_iso3
from renewablepotentialslib import EPSG_3035, EPSG_3035_PROJ4, WGS84, WGS84_PROJ4
from renewablepotentialslib.geo.conversion import transform_bounds, eu_country_code_to_iso3
from renewablepotentialslib.geo import EPSG_3035, EPSG_3035_PROJ4, WGS84, WGS84_PROJ4


def determine_pixel_areas(crs, bounds, resolution):
Expand Down Expand Up @@ -168,7 +168,8 @@ def update_features(gdf, src):
def drop_countries(gdf, scope_config):
countries = [pycountry.countries.lookup(i).alpha_3 for i in scope_config["countries"]]
_not_in = set(gdf.country_code).difference(countries)
print(f"Removing {_not_in} as they are outside of study area.")
if _not_in:
print(f"Removing {_not_in} as they are outside of study area.")

return gdf[gdf.country_code.isin(countries)]

Expand Down
10 changes: 10 additions & 0 deletions lib/renewablepotentialslib/rule_utils.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
def collect_shape_dirs(config, rules):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Being in the public API of the lib, this function should have a proper docstring.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may not like this comment: I don't think this function should be in the lib. This is for three reasons:

  1. It is strongly linked to the workflow. Not only does it use the Snakemake workflow object rules, but also it expects certain rules to exist in the workflow.
  2. Having it in the lib requires you do install the lib in the base env which I think we should keep as clean as possible because it's not controlled by Snakemake itself. Most of the changes in this PR are only necessary because of that.
  3. It's really small and does not depend on anything else in the lib.

For these reasons, I believe the function should live in data-preprocessing.smk rather than the lib.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's where I first put it, for all the reasons you give above. However, I can't test it when it sits in the middle of the Snakefile, I assume... Perhaps a 'rule_utils.py' file in the rules dir?

STRIP = "0123456789"
shapes = set(shape.rstrip(STRIP) for layer in config["layers"].values() for shape in layer.values())
inputs = {}
for shape in shapes:
if shape in ["nuts", "gadm", "lau"]:
inputs[shape] = getattr(rules, f"administrative_borders_{shape}").output[0]
else:
inputs[shape] = config["data-sources"][shape]
return inputs
19 changes: 11 additions & 8 deletions lib/setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,14 +10,17 @@
maintainer_email='[email protected]',
packages=find_packages(exclude=['tests*']),
include_package_data=True,
install_requires=[
"pycountry",
"geopandas",
"numpy",
"pandas",
"shapely",
"pyproj"
],
install_requires=["numpy"],
extras_require={
'geo': [
"pycountry",
"geopandas",
"numpy",
"pandas",
"shapely",
"pyproj"
],
},
classifiers=[
'Environment :: Console',
'Intended Audience :: Science/Research',
Expand Down
2 changes: 1 addition & 1 deletion lib/test-requirements.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -13,4 +13,4 @@ dependencies:
- pandas=0.23.2
- shapely=1.7.1
- pip:
-e .
- -e .[geo]
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
import pandas as pd
from pandas.testing import assert_frame_equal

from renewablepotentialslib.conversion import (
from renewablepotentialslib.geo.conversion import (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend to rename the parent folder of this test to geo -- the structure above lib/tests/ should resemble the structure within potentialslib in my opinion.

watt_to_watthours,
eu_country_code_to_iso3,
coordinate_string_to_decimal,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@
import shapely.geometry
import pycountry

from renewablepotentialslib.shape_utils import (
from renewablepotentialslib.geo.shape_utils import (
study_area,
to_multi_polygon,
drop_countries,
Expand Down
72 changes: 72 additions & 0 deletions lib/tests/test_rule_utils.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,72 @@
import pytest

from renewablepotentialslib.rule_utils import collect_shape_dirs


class TestCustomShapes:
@pytest.fixture
def rules(self):
class DotDict(dict):
"""dot.notation access to dictionary attributes"""
def __getattr__(*args):
val = dict.__getitem__(*args)
return DotDict(val) if type(val) is dict else val
__setattr__ = dict.__setitem__
__delattr__ = dict.__delitem__
return DotDict({
"administrative_borders_nuts": {"output": ["you_got_nuts"]},
"administrative_borders_gadm": {"output": ["you_got_gadm"]},
"administrative_borders_lau": {"output": ["you_got_lau"]}
})

@pytest.mark.parametrize("number", (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 100, 1000, 5555))
def test_strip(self, rules, number):
config = {"layers": {"national": {"foo": "nuts{}".format(number)}}}
collected = collect_shape_dirs(config, rules)
assert collected == {"nuts": "you_got_nuts"}

@pytest.mark.parametrize("number", (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 100, 1000, 5555))
def test_no_strip(self, rules, number):
config = {"layers": {"national": {"foo": "{0}bar{0}".format(number)}}, "data-sources": {f"{number}bar": "foobar"}}
collected = collect_shape_dirs(config, rules)
assert collected == {f"{number}bar": "foobar"}

def test_standard_shapes(self, rules):
config = {
"layers": {
"national": {"foo": "nuts2", "bar": "nuts3", "baz": "gadm0"},
"custom": {"foo": "nuts2", "bar": "lau2", "baz": "gadm0"}
},
"data-sources": {"my-custom-shape": "data/dir.geojson"}
}
collected = collect_shape_dirs(config, rules)
assert set(collected.keys()) == set(["lau", "nuts", "gadm"])
for key in ["lau", "nuts", "gadm"]:
assert collected[key] == f"you_got_{key}"

def test_new_shapes(self, rules):
config = {
"layers": {
"national": {"foo": "my-custom-shape1", "bar": "nuts3", "baz": "my-custom-shape2"},
"custom": {"foo": "your-custom-shape2", "bar": "my-custom-shape12", "baz": "gadm0"}
},
"data-sources": {"my-custom-shape": "mydata/dir.geojson", "your-custom-shape": "yourdata/dir.geojson"}
}
collected = collect_shape_dirs(config, rules)
assert set(collected.keys()) == set(["my-custom-shape", "your-custom-shape", "nuts", "gadm"])
for key in ["my", "your"]:
assert collected[f"{key}-custom-shape"] == f"{key}data/dir.geojson"
for key in ["nuts", "gadm"]:
assert collected[key] == f"you_got_{key}"

def test_will_fail(self, rules):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give this test a more self-explanatory name? I don't understand why this is failing. Is it because of the dashes?

config = {
"layers": {
"national": {"foo": "my-custom-shape", "bar": "nuts3"},
},
"data-sources": {"your-custom-shape": "yourdata/dir.geojson"}
}
with pytest.raises(KeyError) as excinfo:
collected = set(collect_shape_dirs(config, rules).keys())

assert "my-custom-shape" in str(excinfo.value)
9 changes: 4 additions & 5 deletions rules/data-preprocessing.smk
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
"""This is a Snakemake file defining rules to retrieve raw data from online sources."""
import pycountry
from renewablepotentialslib.rule_utils import collect_shape_dirs

RESOLUTION_STUDY = (1 / 3600) * 10 # 10 arcseconds
RESOLUTION_SLOPE = (1 / 3600) * 3 # 3 arcseconds
Expand Down Expand Up @@ -34,7 +35,7 @@ rule raw_gadm_administrative_borders:
shell: "unzip -o {input} -d build/raw-gadm"


rule all_gadm_administrative_borders:
rule administrative_borders_gadm:
message: "Merge gadm administrative borders of all countries."
input:
["build/raw-gadm/gadm36_{}.gpkg".format(country_code)
Expand All @@ -47,7 +48,7 @@ rule all_gadm_administrative_borders:
shell: "ogrmerge.py -o {output} -f gpkg -src_layer_field_content '{{LAYER_NAME}}' -t_srs {params.crs} -single {input}"


rule raw_nuts_units:
rule administrative_borders_nuts:
message: "Download NUTS units as GeoJSON."
output:
protected("data/automatic/raw-nuts{}-units.geojson".format(config["parameters"]["nuts-year"]))
Expand Down Expand Up @@ -93,9 +94,7 @@ rule administrative_borders:
message: "Normalise all administrative borders."
input:
src = script_dir + "administrative_borders.py",
nuts_geojson = rules.raw_nuts_units.output[0],
gadm_gpkg = rules.all_gadm_administrative_borders.output[0],
lau_gpkg = rules.administrative_borders_lau.output[0]
**collect_shape_dirs(config, rules)
params:
crs = config["crs"],
scope = config["scope"]
Expand Down
15 changes: 6 additions & 9 deletions scripts/administrative_borders.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
import geopandas as gpd

from renewablepotentialslib.shape_utils import (
from renewablepotentialslib.geo.shape_utils import (
buffer_if_necessary,
to_multi_polygon,
drop_countries,
Expand All @@ -22,12 +22,10 @@
}


def normalise_admin_borders(path_to_nuts, path_to_gadm, path_to_lau, crs, scope_config, path_to_output):
def normalise_admin_borders(crs, scope_config, path_to_output, **shape_dirs):
"""Normalises raw administrative boundary data and places it in one, layered geodatapackage."""

for _src, _path in {
'nuts': path_to_nuts, 'gadm': path_to_gadm, 'lau': path_to_lau
}.items():
for _src, _path in shape_dirs.items():
gdf = gpd.read_file(_path)
gdf = gdf.to_crs(crs)
gdf.geometry = gdf.geometry.map(buffer_if_necessary).map(to_multi_polygon)
Expand All @@ -47,11 +45,10 @@ def normalise_admin_borders(path_to_nuts, path_to_gadm, path_to_lau, crs, scope_


if __name__ == "__main__":
shape_dirs = {k: v for k, v in snakemake.input.items() if k != "src"}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems brittle. Isn't it possible to give the whole thing a name?

So instead of {"nuts": something_with_nuts} this whole thing would be {"shapes": {"nuts": something_with_nuts}}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that snakemake can track inputs that are defined inside a nested dictionary...

normalise_admin_borders(
path_to_nuts=snakemake.input.nuts_geojson,
path_to_gadm=snakemake.input.gadm_gpkg,
path_to_lau=snakemake.input.lau_gpkg,
crs=snakemake.params.crs,
scope_config=snakemake.params.scope,
path_to_output=snakemake.output[0]
path_to_output=snakemake.output[0],
**shape_dirs
)
2 changes: 1 addition & 1 deletion scripts/built_up_area.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@
from rasterstats import zonal_stats
import pandas as pd

from renewablepotentialslib.shape_utils import determine_pixel_areas
from renewablepotentialslib.geo.shape_utils import determine_pixel_areas


def built_up_areas(path_to_built_up_share, path_to_units, path_to_result):
Expand Down
2 changes: 1 addition & 1 deletion scripts/capacityfactors/id_map.py
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@
NO_DATA_VALUE = 64001
INDEX_EPSILON = 10e-3

from renewablepotentialslib import EPSG_3035, EPSG_3035_PROJ4, WGS84, WGS84_PROJ4
from renewablepotentialslib.geo import EPSG_3035, EPSG_3035_PROJ4, WGS84, WGS84_PROJ4


def id_map(path_to_timeseries, path_to_map, resolution_km):
Expand Down
4 changes: 2 additions & 2 deletions scripts/capacityfactors/ninja_input_pv.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,8 +2,8 @@
import pandas as pd
import geopandas as gpd

from renewablepotentialslib.shape_utils import point_raster_on_shapes
from renewablepotentialslib.conversion import area_to_capacity, orientation_to_azimuth
from renewablepotentialslib.geo.shape_utils import point_raster_on_shapes
from renewablepotentialslib.geo.conversion import area_to_capacity, orientation_to_azimuth


def pv_simulation_parameters(path_to_shapes_of_land_surface, path_to_roof_categories,
Expand Down
2 changes: 1 addition & 1 deletion scripts/capacityfactors/ninja_input_wind.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
import pandas as pd
import geopandas as gpd

from renewablepotentialslib.shape_utils import point_raster_on_shapes
from renewablepotentialslib.geo.shape_utils import point_raster_on_shapes


def wind(path_to_shapes_of_land_surface, path_to_shapes_of_water_surface, bounds, ninja,
Expand Down
2 changes: 1 addition & 1 deletion scripts/estimate_protected_shapes.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@
import geopandas as gpd
import pycountry

from renewablepotentialslib import EPSG_3035_PROJ4
from renewablepotentialslib.geo import EPSG_3035_PROJ4


def estimate_shapes(path_to_input, scope_config, path_to_output):
Expand Down
2 changes: 1 addition & 1 deletion scripts/lau.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
import geopandas as gpd
import pandas as pd

from renewablepotentialslib.shape_utils import to_multi_polygon
from renewablepotentialslib.geo.shape_utils import to_multi_polygon

OUTPUT_DRIVER = "GeoJSON"
KOSOVO_MUNICIPALITIES = [f"RS{x:02d}" for x in range(1, 38)]
Expand Down
2 changes: 1 addition & 1 deletion scripts/shared_coast.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@
import geopandas as gpd
from shapely.prepared import prep

from renewablepotentialslib.shape_utils import buffer_if_necessary
from renewablepotentialslib.geo.shape_utils import buffer_if_necessary

DRIVER = "GeoJSON"

Expand Down
2 changes: 1 addition & 1 deletion scripts/swiss_building_footprints.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
import geopandas as gpd

from renewablepotentialslib.eligibility import Eligibility
from renewablepotentialslib.conversion import area_in_squaremeters
from renewablepotentialslib.geo.conversion import area_in_squaremeters


def swiss_building_footprint(path_to_building_footprint, path_to_eligibility,
Expand Down
2 changes: 1 addition & 1 deletion scripts/technically_eligible_area.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
import numpy as np
import rasterio

from renewablepotentialslib.shape_utils import determine_pixel_areas
from renewablepotentialslib.geo.shape_utils import determine_pixel_areas
from renewablepotentialslib.eligibility import Eligibility

DATATYPE = np.float32
Expand Down
2 changes: 1 addition & 1 deletion scripts/technically_eligible_electricity_yield.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@
import numpy as np
import rasterio

from renewablepotentialslib.conversion import watt_to_watthours
from renewablepotentialslib.geo.conversion import watt_to_watthours


def determine_electricity_yield(path_to_eligibility_categories, path_to_capacities_pv_prio,
Expand Down
1 change: 1 addition & 0 deletions tests/test_administrative_borders.py
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@

PATH_TO_BORDERS = ROOT_DIR / "build" / "administrative-borders.gpkg"


@pytest.mark.skipif(not PATH_TO_BORDERS.exists(), reason="Consolidated administrative border shapefile not available.")
def test_administrative_border_layers():
layers = fiona.listlayers(str(PATH_TO_BORDERS))
Expand Down