-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow custom shapes #20
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the general approach.
I have mainly one consequential comment about your implementation. Have you considered keeping the new function in the Snakefiles? I would not have it live in the lib, but rather in the Snakefiles. Here's why:
- It is strongly linked to the workflow.
- Having it in the lib requires you to install the lib in the base env.
- It's really small and does not depend on anything else in the lib.
The only downside would be that it's more difficult (or impossible?) to test. But I think the pros outweigh this con.
type: string | ||
enum: | ||
- gadm0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a pity. Not really bad though.
environment.yaml
Outdated
@@ -6,3 +6,6 @@ dependencies: | |||
- python=3.6 | |||
- snakemake-minimal=5.4 | |||
- pycountry=20.7.3 | |||
- pip | |||
- pip: | |||
- -e ./lib |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lib requires numpy
and so you need list numpy here otherwise it's going to be pip-installed. Also, pinning the version number would be good.
@@ -6,7 +6,7 @@ | |||
import pandas as pd | |||
from pandas.testing import assert_frame_equal | |||
|
|||
from renewablepotentialslib.conversion import ( | |||
from renewablepotentialslib.geo.conversion import ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend to rename the parent folder of this test to geo
-- the structure above lib/tests/
should resemble the structure within potentialslib
in my opinion.
@@ -0,0 +1,10 @@ | |||
def collect_shape_dirs(config, rules): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Being in the public API of the lib
, this function should have a proper docstring.
@@ -0,0 +1,10 @@ | |||
def collect_shape_dirs(config, rules): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may not like this comment: I don't think this function should be in the lib. This is for three reasons:
- It is strongly linked to the workflow. Not only does it use the Snakemake workflow object
rules
, but also it expects certain rules to exist in the workflow. - Having it in the lib requires you do install the lib in the base env which I think we should keep as clean as possible because it's not controlled by Snakemake itself. Most of the changes in this PR are only necessary because of that.
- It's really small and does not depend on anything else in the lib.
For these reasons, I believe the function should live in data-preprocessing.smk
rather than the lib.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's where I first put it, for all the reasons you give above. However, I can't test it when it sits in the middle of the Snakefile, I assume... Perhaps a 'rule_utils.py' file in the rules
dir?
environment.yaml
Outdated
@@ -6,3 +6,6 @@ dependencies: | |||
- python=3.6 | |||
- snakemake-minimal=5.4 | |||
- pycountry=20.7.3 | |||
- pip | |||
- pip: | |||
- -e ./lib |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not in favour of installing the lib into the base env, as this env should be as lean as possible.
scripts/administrative_borders.py
Outdated
@@ -47,11 +45,10 @@ def normalise_admin_borders(path_to_nuts, path_to_gadm, path_to_lau, crs, scope_ | |||
|
|||
|
|||
if __name__ == "__main__": | |||
shape_dirs = {k: v for k, v in snakemake.input.items() if k != "src"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems brittle. Isn't it possible to give the whole thing a name?
So instead of {"nuts": something_with_nuts}
this whole thing would be {"shapes": {"nuts": something_with_nuts}}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that snakemake can track inputs that are defined inside a nested dictionary...
lib/tests/test_rule_utils.py
Outdated
for key in ["nuts", "gadm"]: | ||
assert collected[key] == f"you_got_{key}" | ||
|
||
def test_will_fail(self, rules): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you give this test a more self-explanatory name? I don't understand why this is failing. Is it because of the dashes?
fe2773e
to
f68b720
Compare
I have now moved this function out of the lib and it sits instead in with the snakemake rules in a specific util file. I have updated the testing mechanism to be able to cover this, at least within the snakemake test rule. This repo could probably restructure according to calliope-project/euro-calliope#120 in the longer run. |
This addresses an issue concerning the upstream dependency of euro-calliope on the output of this workflow. Namely, the introduction of custom shapes. I've gone for quite a simple approach:
my-custom-shapes
)data-sources
by using the same string name (my-custom-shapes: path_to_shapefile.geojson
)administrative_borders.gpkg
based on all data sources requested from config. This allows it to add in new layers for any custom shapes and to avoid building a layer entirely if not requested by a layer in config. For example, one could remove the need to considerlau
units by not having the municipal layer in their config.This has led to injecting a python script into the snakemake rules, which has then led me to restructure the
renewablepotentialslib
package to handle that in a more lightweight way.Two potentially brittle parts:
nuts2
->nuts
, but also it will think ofmy-custom-shapes999
as referring to data sourcemy-custom-shapes
). I think this is fine, as the process will fail if it removes trailing digits and then can't find a corresponding data source.my-custom-shapes
) or level number (e.g.2
formy-custom-shapes2
). The level part could be optional, to default to the source name if not in the geodataframe.