
Update WDPA processing #12

Open: wants to merge 14 commits into main

Conversation

brynpickering (Member)

  • Add WDPA-date config param
  • Add schema
  • Handle the latest batch of WDPA file names (incl. "Public" in the string) and structures (three separate shapefiles that need merging)

brynpickering (Member Author)

At this point we probably have to accept that, with the protected areas data, the process can never be reproducible (we can't store and re-distribute a static dataset, and they update their dataset every few months), so this PR is really just there to provide a smoother method for keeping up to date.

To be even more clever, we could scrape the XML of the online datastore: "http://d1gam3xoknrgr2.cloudfront.net/" to get the URL to the latest dataset. But this is a bit of a hassle...

brynpickering (Member Author)

Code snippet for how the XML scraping might work, listing the available date options:

import re

import requests
import xmltodict

# keys in the bucket listing look like "current/WDPA_<MonYYYY>_Public_shp.zip"
search_string = r"current/WDPA_(\w+)_Public_shp\.zip"
r = requests.get("http://d1gam3xoknrgr2.cloudfront.net/")
doc = xmltodict.parse(r.content)
contents = doc["ListBucketResult"]["Contents"]
date_options = set()
for content in contents:
    search_result = re.search(search_string, content["Key"])
    if search_result is not None:
        date_options.update(search_result.groups())

This would currently return {'Mar2021'}
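
If the newest version were ever needed automatically, the options could be compared by parsing the month-year strings. A minimal sketch, assuming the MonYYYY format holds:

import datetime

# pick the most recent option by parsing e.g. "Mar2021" with the "%b%Y" format
latest = max(date_options, key=lambda option: datetime.datetime.strptime(option, "%b%Y"))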

timtroendle (Member) left a comment

Looks good, but needs a few more changes. Unfortunately, this dataset is really annoying. Thanks for the update.

Review threads on rules/data-preprocessing.smk (8) and config/schema.yaml (1), all resolved.
brynpickering (Member Author)

Sorry, Friday afternoon fever: lots of errors left over in this implementation that I should have spotted sooner (you know, by doing some simple checks like running it locally...)!

I'll make sure it works locally and then let you know when it's ready to look at again :)

timtroendle (Member)

#11 anyone?

brynpickering (Member Author)

@timtroendle This is now fixed up and works locally.

The one thing that is a bit annoying is the error catching: merging the files currently produces thousands of warnings and the occasional error (none of which obviously affect the columns of interest to us). These then stop Snakemake in its tracks (it's in 'bash strict mode'). So, I switch off error catching for ogrmerge and rely on the fact that, if there is a problem, it will show up as the outputs not being successfully generated. Do you think this is the cleanest way of doing it?

timtroendle (Member)

Hmm, I am not sure I understand. Are you saying that ogrmerge throws errors and that this leads Snakemake to crash? That would not be good.

brynpickering (Member Author)

It does throw errors, but not ones that are easy to debug; the output looks like this:

Warning 1: Value 555510160 of field WDPAID of feature 115190 not successfully written. Possibly due to too larger number with respect to field width
Warning 1: Value 555510162 of field WDPAID of feature 115191 not successfully written. Possibly due to too larger number with respect to field width
[... 998 more lines of much the same ...]
More than 1000 errors or warnings have been reported. No more will be reported from now.

The workflow continues unhindered, so whatever issue ogrmerge is having doesn't seem to be a problem. That said, I could imagine updating this to run in Python, with geopandas handling the merging. Then it would perhaps be easier to keep only the relevant columns and to debug any issues.

timtroendle (Member)

However, I could imagine updating this to run in python, such that geopandas handles the merging.

I'd feel more comfortable that way. I do not really understand what's going on here. The merging alone, without anything else, is a three-liner or so. Plus, it would allow us to move from shapefiles to GeoPackage.
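
For illustration, a minimal sketch of what that merge could look like in geopandas, writing straight to GeoPackage. The file paths here are assumptions, not the actual layout of the WDPA archives:

import geopandas as gpd
import pandas as pd

# hypothetical paths to the shapefiles inside the three unzipped archives
paths = [f"build/wdpa/WDPA_to_merge_{i}/WDPA_Mar2021_Public_shp-polygons.shp" for i in range(3)]
gdfs = [gpd.read_file(path) for path in paths]
# concatenate into one frame and write it out as a single GeoPackage layer
merged = gpd.GeoDataFrame(pd.concat(gdfs, ignore_index=True), crs=gdfs[0].crs)
merged.to_file("build/wdpa/protected-areas.gpkg", driver="GPKG")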

brynpickering (Member Author)

@timtroendle I've updated this to process the shapefiles purely in Python. I've pythonified a bunch of other steps too, taking advantage of the fact that we need to open these shapefiles anyway and can reuse the methods in administrative_borders.py to clean the GeoDataFrame and limit its scope.

brynpickering added 4 commits April 8, 2021 09:45
brynpickering (Member Author)

@timtroendle is there anything holding this up, or shall I merge?

@@ -8,6 +8,9 @@ properties:
    type: number
    enum: [2006, 2010, 2013, 2016, 2021]
    description: Indicates the reference NUTS year
  wdpa-version:
    type: string

timtroendle (Member)

This could check for the form MMMYYYY, but maybe that's just too restrictive.

brynpickering (Member Author)

yeah, who knows how they'll choose to do versioning in future...
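
For reference, a sketch of what an MMMYYYY check could look like via a JSON-Schema pattern, validated here with the jsonschema library purely for illustration; the schema excerpt is an assumption, not the PR's actual schema:

import jsonschema

# hypothetical wdpa-version entry with an MMMYYYY pattern constraint
wdpa_version_schema = {
    "type": "string",
    "pattern": "^[A-Z][a-z]{2}\\d{4}$",  # e.g. "Mar2021"
}
jsonschema.validate("Mar2021", wdpa_version_schema)  # passes silently
# jsonschema.validate("March 2021", wdpa_version_schema) would raise a ValidationError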

original_crs = points.crs
# convert points to circles
points_in_metres = points.to_crs("epsg:3035")
points_in_metres.geometry = [

timtroendle (Member)

Why not call points_in_metres.buffer(...)?
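
For illustration, a sketch of the buffer-based alternative. The function and helper names mirror those used in the tests, but this is an assumption-laden illustration: REP_AREA is taken to be the reported area in km², and recent geopandas versions are assumed to accept an array of per-row buffer distances:

import geopandas as gpd
import numpy as np

def _radius_meter(area_sq_km):
    # radius of a circle with the given area, assuming the area is reported in km^2
    return np.sqrt(area_sq_km * 1e6 / np.pi)

def estimate_polygons_from_points(points: gpd.GeoDataFrame, area_column: str) -> gpd.GeoDataFrame:
    original_crs = points.crs
    points_in_metres = points.to_crs("epsg:3035")  # metre-based CRS, so buffer distances are in metres
    # buffer with one distance per row instead of building geometries in a list comprehension
    points_in_metres.geometry = points_in_metres.buffer(_radius_meter(points_in_metres[area_column]))
    return points_in_metres.to_crs(original_crs)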


update_features
update_features,
estimate_polygons_from_points,
_radius_meter

timtroendle (Member)

Is it necessary to test this private function?

brynpickering (Member Author)

I don't see why testing private functions is any less necessary than public ones. Maybe this one in particular doesn't need testing, but there's no harm in it, right?

}
point_gdf = gpd.GeoDataFrame(
    # x = longitude, y = latitude
    geometry=gpd.points_from_xy([i[1] for i in points.values()], [i[0] for i in points.values()]),

timtroendle (Member)

This is geometry=gpd.points_from_xy(zip(*points.values())), no?

brynpickering (Member Author)

It isn't quite, because coordinates are usually communicated as [lat, lon], whereas [x, y] is equivalent to [lon, lat]. Anyway, I'll update it to be less verbose.
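
A sketch of the less verbose form, assuming (as in the test fixture) that points maps names to (lat, lon) tuples; the sample values are purely illustrative:

import geopandas as gpd

# hypothetical input: name -> (lat, lon)
points = {"somewhere": (51.5, -0.1), "elsewhere": (48.9, 2.4)}

lats, lons = zip(*points.values())
point_gdf = gpd.GeoDataFrame(
    geometry=gpd.points_from_xy(x=lons, y=lats),  # x = longitude, y = latitude
    crs="epsg:4326",
)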

params:
    version = config["parameters"]["wdpa-version"]
output: temp("build/raw-wdpa")
conda: "../envs/default.yaml"

timtroendle (Member)

This conda env is useless here.

@@ -295,15 +299,8 @@ rule protected_areas_in_europe:
bounds = "{x_min},{y_min},{x_max},{y_max}".format(**config["scope"]["bounds"])

timtroendle (Member)

Bounds are unused now.

@@ -268,25 +286,11 @@ rule slope_in_europe:
"""


rule protected_areas_points_to_circles:
message: "Estimate shape of protected areas available as points only."
rule protected_areas_in_europe_rasterised:

timtroendle (Member)

The naming scheme within this repository was such that xx_in_europe indicated a rasterised dataset with the same bounds and resolution. I acknowledge this is not a great naming scheme, but I'd stick to it or update it everywhere.

unzip {input} *.zip -d {output}
unzip -o {output}/WDPA_{params.version}_Public_shp_0.zip -d {output}/WDPA_to_merge_0
unzip -o {output}/WDPA_{params.version}_Public_shp_1.zip -d {output}/WDPA_to_merge_1
unzip -o {output}/WDPA_{params.version}_Public_shp_2.zip -d {output}/WDPA_to_merge_2

timtroendle (Member)

This approach seems brittle: what if the dataset contains 4 zips in the future? Can we make this a loop instead?
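
One way to avoid hard-coding the count would be to loop over whatever inner archives are present. Sketched here in Python rather than in the rule's shell block; the paths mirror the rule above, but this is an illustration, not the PR's implementation:

import zipfile
from glob import glob

version = "Mar2021"  # would come from config["parameters"]["wdpa-version"]
inner_zips = sorted(glob(f"build/raw-wdpa/WDPA_{version}_Public_shp_*.zip"))
for i, inner_zip in enumerate(inner_zips):
    # extract each inner archive into its own directory, however many there are
    with zipfile.ZipFile(inner_zip) as archive:
        archive.extractall(f"build/raw-wdpa/WDPA_to_merge_{i}")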


@pytest.fixture
def estimated_polygons(self, points):
    return estimate_polygons_from_points(points, "REP_AREA")

brynpickering (Member Author)

Hmm, I would say this is part of the arranging process. In the pytest example of what fixtures are, the calls to the class being tested are all made in fixtures. That's the same process I'm following here; I then only assert in the test.

timtroendle (Member)

I don't see it that way. estimate_polygons_from_points is the behaviour/functionality you are testing, so it belongs in act rather than arrange.

brynpickering (Member Author)

Hmm, ok. I can remove it, but it makes sense as a fixture since three tests are associated with the same function call. Better to do that once than thrice...

timtroendle (Member)

Ok, I checked the pytest doc a little longer and it seems we are both partly right according to their definitions:

In pytest, “fixtures” are functions you define that serve this purpose. But they don’t have to be limited to just the arrange steps. They can provide the act step, as well, and this can be a powerful technique for designing more complex tests, especially given how pytest’s fixture system works. But we’ll get into that further down.

See this example: https://docs.pytest.org/en/stable/fixture.html#fixtures-can-be-requested-more-than-once-per-test-return-values-are-cached
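
For context, a minimal illustration of the behaviour described in that linked section, with purely illustrative names: within a single test, a fixture's return value is computed once and cached, even if it is requested again.

import pytest

@pytest.fixture
def expensive_setup():
    # evaluated at most once per test, even if requested more than once
    return object()

def test_fixture_value_is_cached(expensive_setup, request):
    requested_again = request.getfixturevalue("expensive_setup")
    assert requested_again is expensive_setup  # the same cached object within this test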

    assert np.isclose(calculated_radius, radius)

def test_estimate_polygons_area(self, points, estimated_polygons):
    assert estimated_polygons.crs == points.crs

timtroendle (Member)

Usually there's one assert per test. This assert here seems unnecessary: you are testing this in the test below!
