Create e-reporting (IPR) with RIVM data #7
Create (harmonised) data for: See the website: http://www.eionet.europa.eu/aqportal/datamodel |
OK, I'm getting a better grip on this after seeing the AM and EF XSDs. There is also an RIVM example: https://github.com/Geonovum/sospilot/blob/master/data/eionet/xsd/REP_D-NL_RIVM_20140805_B-002.xml The User Guide is also useful: |
The RIVM report for "(B) Information on zones and agglomerations (Article 6)" https://github.com/Geonovum/sospilot/blob/master/data/eionet/xsd/REP_D-NL_RIVM_20140805_B-002.xml is quite complete. With one small change it validates against the AQD AM schema: AirQualityReporting.xsd http://dd.eionet.europa.eu/schemas/id2011850eu-1.0/AirQualityReporting.xsd. The change only deals with a GML issue where deprecatedTypes.xsd is required in the schemaLocation:
Only the geometry (am:geometry element) refers to a Shapefile in EPSG:28992:
while Germany uses real GML geometries for the am:geometry field like:
The question is what we should do with this: replace the Shapefile xlink with an inline GML geometry? |
And also REP_D-NL_RIVM_20140805_D-002.xml (http://cdr.eionet.europa.eu/nl/eu/aqd/d/envu9_j7q/ Dataflow D validates OK, refers to B for zones). |
In summary, the question is: what is there specifically to do (for us) in this issue? Several Dataflows like B and D were uploaded to the Eionet AQ Portal on 5 Aug 2014, and also E (Measurements, 223MB). There is feedback, like for E: "NL has reported both hourly and daily values. This results in 'double' observation objects. It is not intended to report aggregated data so they should report in this case only hourly data." I don't know how these uploaded dataflow files were made (handmade/generated/ETL tools?) and who is now working on the feedback. How to proceed? I will ask Hans/RIVM. |
OK, I learned more from existing dataflow reports, EEA workshops, the IRCELINE approach and more:
All in all, I did a PoC using a proven technology, although one probably never used before in the domain of (INSPIRE) GML Application Schemas: Python Templating. Python web frameworks like Django have a long and proven history of using Templating to generate HTML, XML or any other document by applying input structures to a Template that renders a Document. There are numerous Templating Languages: Django, Mako, Jinja2, Genshi, Mustache, see https://wiki.python.org/moin/Templating. Some Templating Languages, like Mustache, are not even bound to Python.

Stetl http://stetl.org is a Python ETL framework based on Input, Filter (transform) and Output modules. Up to now XML transformation was mostly done with an XSLTFilter, also for harmonizing INSPIRE data from local sources. However, XSLT has the disadvantage of being verbose, passive/recursive/matching-driven and complex, with hardly any possibilities for structuring with variables, control flow or functions other than proprietary XSLT-processor bindings or built-in functions.

So I started trying an approach using Python Templating (https://wiki.python.org/moin/Templating), first within Stetl with some simple examples using the Python built-in String Templating: https://github.com/justb4/stetl/tree/master/examples/basics/9_string_templating. After that I chose, from the zillion Python Templating Languages, one that is among the most widely used and has a very active development community: Jinja2 http://jinja.pocoo.org. Jinja2 is modelled on the templating language of the widely known Django framework and is used, for example, in the Open Data world within CKAN: http://docs.ckan.org/en/latest/theming/templates.html. I had never used Python Templating, only Java JSP, which is a bit similar; learning Jinja2 took just a couple of hours of browsing examples on the Web.
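To make the built-in String Templating idea concrete, here is a minimal sketch using only the Python standard library. It is not taken from the Stetl examples: the element names, the `render_station` helper and the sample records are made up for illustration and do not follow the AQD schema.

```python
from string import Template
from xml.sax.saxutils import escape

# Hypothetical template: element names are illustrative, not the AQD schema.
STATION_TEMPLATE = Template(
    '<station id="$station_id">\n'
    '  <name>$name</name>\n'
    '  <municipality>$municipality</municipality>\n'
    '</station>'
)


def render_station(record):
    """Render one input record (a dict) to an XML fragment."""
    # Escape values so characters like & and < cannot break the XML.
    safe = {key: escape(str(value), {'"': "&quot;"}) for key, value in record.items()}
    return STATION_TEMPLATE.substitute(safe)


# Made-up sample records standing in for station features.
records = [
    {"station_id": "STA-1", "name": "Example & Co", "municipality": "Sampletown"},
    {"station_id": "STA-2", "name": "Second station", "municipality": "Otherville"},
]

document = "\n".join(render_station(record) for record in records)
print(document)
```

String Templating only does substitution; loops and conditions have to live in the surrounding Python code, which is one reason to move to a fuller templating language.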
The development of a Jinja2TemplatingFilter in Stetl was therefore quite trivial, about 30 lines of code. All TemplatingFilters are in: https://github.com/justb4/stetl/blob/master/stetl/filters/templatingfilter.py. Jinja2 allows standard Template control structures like loops, e.g. for looping over Features, but also has a concept of "globals": variables that apply globally. This proved to be very convenient for common "boilerplate" data like organisations, telephone numbers etc. A worked out example is at:

Applying the Jinja2TemplatingFilter to RIVM AQ Reporting proved to be almost trivial. A PoC was done for mapping the WFS RIVM Stations FeatureType to a Dataflow D report. The example can be found at https://github.com/Geonovum/sospilot/tree/master/src/aq-report. This Dataflow D report was created in just a few hours and contains no custom code, just some Templates with about 50 lines of Jinja2 code! This could be enhanced further with macros etc., but it shows a very promising approach for AQ Reporting and IMO even INSPIRE harmonization. Jinja2 is so common that IDEs like IDEA and Eclipse support its syntax. Plus it is blazingly fast. So far: Enter The Jinja2! I see many advantages in this approach:
|
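As an illustration of the loop-plus-globals idea described in the comment above, here is a small sketch that uses Jinja2 directly, without Stetl. The template, the `globs` content and the feature properties are assumptions made for the example (only the name `globs` is borrowed from this thread), and it needs the `jinja2` package installed.

```python
from jinja2 import Environment

# Hypothetical template: loops over features and reads boilerplate from globals.
TEMPLATE = """<report>
  <provider>{{ globs.organisation }}</provider>
{% for feature in features %}
  <station id="{{ feature.properties.local_id }}">{{ feature.properties.name }}</station>
{% endfor %}
</report>
"""

# autoescape keeps XML special characters in the data from breaking the output.
env = Environment(autoescape=True, trim_blocks=True, lstrip_blocks=True)

# "Globals" are available in every template rendered from this environment.
env.globals["globs"] = {"organisation": "Example Institute"}

# Made-up input data shaped like GeoJSON features.
features = [
    {"properties": {"local_id": "STA-1", "name": "Station A & B"}},
    {"properties": {"local_id": "STA-2", "name": "Station C"}},
]

output = env.from_string(TEMPLATE).render(features=features)
print(output)
```

The per-report data goes in through `render()`, while the shared boilerplate lives on the environment, so the same globals can serve several dataflow templates.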
This seems very promising and nice work. We could try this for the other dataflows with GML as well, so the whole process could be automated (instead of using the manual reporting tooling of EIONET as is done now). |
A short remark about these xlinks to the features in Dataflow D: is there an encoding rule for the values in xlink:href? Shouldn't the values match with gml:id values (in that GML document)? In that case, either the INSPIRE namespace should be in the gml:id or the xlink values should be changed. |
The xlinks are encoded according to the UserGuide, see the link above or the aqportal. Yes, normally I would expect a #gmlid and a gml:id in the target element. Well, why use xlinks at all? This is a FeatureCollection... Also strange is that the ReportingHeader is a feature and not in the general section of the FC...
|
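The xlink question above can be checked mechanically. The sketch below, using only the standard library, lists every `xlink:href` that does not point at a `gml:id` inside the same document via a `#id` fragment. The sample document, its element names and the `NL.EXAMPLE.AQ` namespace are invented for the example; it says nothing about what the User Guide prescribes.

```python
import xml.etree.ElementTree as ET

GML_ID = "{http://www.opengis.net/gml/3.2}id"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def unresolved_local_xlinks(xml_text):
    """Return the xlink:href values that are not '#<gml:id>' of an element in the document."""
    root = ET.fromstring(xml_text)
    ids = {el.get(GML_ID) for el in root.iter() if el.get(GML_ID)}
    hrefs = [el.get(XLINK_HREF) for el in root.iter() if el.get(XLINK_HREF)]
    return [href for href in hrefs if not (href.startswith("#") and href[1:] in ids)]


# Made-up document: one href resolves locally, the other uses a namespace/localId style.
SAMPLE = """<gml:FeatureCollection xmlns:gml="http://www.opengis.net/gml/3.2"
    xmlns:xlink="http://www.w3.org/1999/xlink" gml:id="fc1">
  <gml:featureMember>
    <station gml:id="STA-1"/>
  </gml:featureMember>
  <gml:featureMember>
    <sample gml:id="SAM-1">
      <belongsTo xlink:href="#STA-1"/>
      <zone xlink:href="NL.EXAMPLE.AQ/ZON-1"/>
    </sample>
  </gml:featureMember>
</gml:FeatureCollection>"""

unresolved = unresolved_local_xlinks(SAMPLE)
print(unresolved)
```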
Thanks! I think this can work. The main problem is getting the source data from RIVM. It would be best to build web services for this source data. Probably most can be achieved with data in PostGIS and WFS + SOS. Both can deliver data as (Geo)JSON like in my example. By using VIEWs we can join tables and make selections (or via a table join service ;-))...
|
About the xlinks: this could be a flaw in the UserGuide then. It doesn't make sense. But these xlinks are not very useful here at first sight indeed. About getting the source data: using web services that provide the source data (also useful as a "simple" version of the data) would be a nice approach. Not sure about the table join service here, but who knows :). If the database and some extra views in it are sufficient, then that would be good as well. |
But rendering an INSPIRE id struct could be our first Jinja2 macro :-). |
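A rough idea of what such a reusable INSPIRE id renderer could produce, written as a plain Python function and not as a Jinja2 macro. The function name and sample values are hypothetical; the `base:Identifier` structure with `localId`, `namespace` and optional `versionId` follows the INSPIRE base types, and the wrapping property element (for example an `inspireId` element) is left to the caller.

```python
from xml.sax.saxutils import escape


def inspire_id(local_id, namespace, version_id=None, indent="  "):
    """Render a base:Identifier fragment; versionId is only written when given."""
    lines = [
        "<base:Identifier>",
        f"{indent}<base:localId>{escape(local_id)}</base:localId>",
        f"{indent}<base:namespace>{escape(namespace)}</base:namespace>",
    ]
    if version_id is not None:
        lines.append(f"{indent}<base:versionId>{escape(version_id)}</base:versionId>")
    lines.append("</base:Identifier>")
    return "\n".join(lines)


# Made-up identifier values for illustration.
fragment = inspire_id("STA-1", "NL.EXAMPLE.AQ")
print(fragment)
```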
For Dataflow B we need some more input of RIVM: what data sources to use to create the dataflow? |
RIVM offers a WFS with aqd_zones (http://acceptatie.inspire.rivm.nl/geoserver/wfs?request=GetFeature&typeName=inspire:aqd_zone&outputformat=JSON). This is in the acceptance environment; we need to check whether we can use this data, since it is not in RIVM's production WFS. |
Alternatively, we could use the shapefile as offered in the AQPortal: http://cdr.eionet.europa.eu/nl/eu/aqd/b/envu9_csq |
For dataflow B, the pollutants need to be mapped to the vocabulary of AQ. The definitions can be found at: http://dd.eionet.europa.eu/vocabulary/aq/pollutant/view http://dd.eionet.europa.eu/vocabulary/aq/pollutant/rdf and: |
At sensors.geonovum.nl we already have a WFS (and WMS) based on the above Shapefiles, see http://sensors.geonovum.nl/gs/wfs?request=GetFeature&typeName=sensors:zones&outputformat=JSON . We can get quite far, but we are still lacking zone data attributes, like pollutants for the Zones. Also, the corresponding properties differ between the two WFSs and each is lacking data. For example, for Zone Heerlen/Kerkrade the RIVM WFS has these properties (e.g. population and area are null, zone_type should be 'agg' or 'nonagg' etc):
While the Geonovum "Sensors" WFS has (missing e.g. point of contact):
Both WFSs are, for example, missing the pollutants (while those may be tied to the Stations, which are tied to zones). An overall database schema within RIVM would help tremendously. Though the AQ Portal provides CSVs from the XMLs, this would be just temporary, e.g. for Zone B 3 CSVs: Dataflow B: Pollutant and protection targets as CSV Dataflow B: Competent Authorities for AQ zones |
Okay, I'll have a look at these new zones. We need information from the RIVM on their data sources for the missing properties I think. But for a first version, I'll give it a try with what we have. |
Yes, a good approach. I think a challenge is to convert GeoJSON to GML. Probably using Python OGR http://www.gdal.org/classOGRGeometry.html to
The filter expression for each zone object becomes something like
I will only do the Stetl-part within Stetl, you can go ahead with flow On 08-09-14 11:46, Thijs Brentjens wrote:
|
Okay, I'll leave the geom as it is for now. I'm using ogr2ogr's -sql option to join the information from the CSV to the GeoJSON file; that seems to work. Would that be easy / interesting to use with Stetl? Example command: ogr2ogr -sql "select * from OGRGeoJSON a left join 'zonesattr.csv'.zonesattr b on a.zone_code = b.zone_code" -f "GeoJSON" zones-joined.json zones.json Edit: ogr2ogr shortens the attribute names, as is done for Shapefile column names. Maybe this is not the best approach for joining the CSV file, but we could change this when it is clear how RIVM could / would deliver the data |
Note that for joining there are different codes used sometimes: e.g. in our WFS we have NL0201 for Midden, in the CSV file Zones_NL-001_upd.csv it is NL0200. |
Note that this also means that the joined data might not contain correct values. |
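For comparison, the same left join can be sketched in plain Python, which keeps the full attribute names and makes code mismatches visible. This is not the ogr2ogr approach itself. The `join_attributes` helper, the `ZONE-A` code and the population values are made up; NL0201 and NL0200 are the mismatching codes mentioned above.

```python
import csv
import io
import json


def join_attributes(geojson, csv_text, key="zone_code"):
    """Left-join CSV rows onto GeoJSON feature properties; return codes without a CSV row."""
    rows = {row[key]: row for row in csv.DictReader(io.StringIO(csv_text))}
    unmatched = []
    for feature in geojson["features"]:
        props = feature["properties"]
        row = rows.get(props.get(key))
        if row is None:
            # Keep the feature (left join) but remember that nothing matched.
            unmatched.append(props.get(key))
            continue
        for name, value in row.items():
            # Only fill properties that are missing or null in the source.
            if props.get(name) is None:
                props[name] = value
    return unmatched


# Made-up inputs: the second zone has no CSV row because the codes differ.
zones = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": None,
         "properties": {"zone_code": "ZONE-A", "population": None}},
        {"type": "Feature", "geometry": None,
         "properties": {"zone_code": "NL0201", "population": None}},
    ],
}
ZONES_CSV = "zone_code,population\nZONE-A,1000\nNL0200,2000\n"

unmatched = join_attributes(zones, ZONES_CSV)
print(json.dumps(zones["features"][0]["properties"]))
print("no CSV row for:", unmatched)
```

Reporting the unmatched codes separately makes it harder for a silent mismatch to end up as wrong values in the report.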
…utants from codelist values - PM2.5 added
Good to see your first version of the Zones to Dataflow B ETL! Pollutant codes: I checked in a sort of hack but we may get those
feature.properties.zone_pollutant.split(';') %} with globs defined as:
"http://dd.eionet.europa.eu/vocabulary/aq/pollutant/29", But possibly there is a better solution....I checked in a bit too much: On 08-09-14 17:13, Thijs Brentjens wrote:
kind regards / met vriendelijke groet, --Just Just van den Broecke [email protected] |
…ts and axis ordering, example in Dataflow-D ETL
Refined the GML macros, with the usual GML mess: GML3 vs GML2 encoding and Axis Ordering. Next is to migrate most/all GML macros to Jinja2 Filters in Stetl that use Python OGR. Also updated validate.sh to include dataflow-B. The dataflow-D output now validates against the AQD/INSPIRE/GML schemas. |
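The axis-ordering issue can be shown with a tiny stand-alone sketch. It is not the Stetl macro or filter: `point_to_gml` and its parameters are hypothetical. GeoJSON stores coordinates as longitude, latitude, while GML 3 with the URN form of EPSG:4326 expects latitude, longitude, so the two values are swapped on output unless told otherwise.

```python
def point_to_gml(geometry, gml_id, srs_name="urn:ogc:def:crs:EPSG::4326", lat_lon=True):
    """Encode a GeoJSON Point as a GML 3 gml:Point, optionally swapping to lat/lon order."""
    lon, lat = geometry["coordinates"][:2]  # GeoJSON order is always lon, lat
    first, second = (lat, lon) if lat_lon else (lon, lat)
    return (
        f'<gml:Point gml:id="{gml_id}" srsName="{srs_name}">'
        f"<gml:pos>{first} {second}</gml:pos>"
        "</gml:Point>"
    )


# Made-up coordinates (lon 5.85, lat 51.54).
point = {"type": "Point", "coordinates": [5.85, 51.54]}
print(point_to_gml(point, "pt1"))
```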
Tip: if you commit to GitHub and provide the issue number in the commit message, that message will appear here, for example:
|
…y adding srsName=EPSG:4326 to WFS req!
Thanks for the tip, I just forgot that with the previous commit. Regarding the pollutant codes: I'd think direct support for looking up the codes in the SKOS vocabularies would be elegant (I found some Python libs for that), but I can generate the "codelist" to use in the globs for now. |
SKOS-based data, where (URL) is this service? I was under the impression that the component-codes like 'NO2-H' were RIVM-specific. But eventually we should be able to generate reports from live services: WFS, SOS, REST, SPARQL, whatever. Right now data can be applied to a Jinja2 template either as standard input data (a JSON file) via a Jinja2 Context and/or as "globals" (also a JSON file) via the Jinja2 Environment. I think "globals" should be kept to a minimum. There are two limitations right now (in Stetl):
Within Jinja2 (and in Stetl) the input file is passed as a "Jinja2 Context", in Python a dict (hashmap). For example, "features" as used in our templates is in fact a key of a dict. The same goes for the globals ("globs", or whatever the top-level key is named). Useful Stetl extensions could thus be the following:
Another possibility, a bit more involved, is to develop "smart" Jinja2 Filters that actually invoke an external web service like a SPARQL endpoint.... |
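A generated codelist could be used roughly as follows. This is only a sketch: the `pollutant_uris` helper is hypothetical, the lookup table contains just the two URIs quoted elsewhere in this thread, and the assumption that RIVM delivers the pollutants of a zone as one semicolon-separated string comes from the `zone_pollutant.split(';')` expression used in the template.

```python
# Partial lookup table: only the two URIs mentioned in this thread.
POLLUTANT_URIS = {
    "C6H6": "http://dd.eionet.europa.eu/vocabulary/aq/pollutant/20",
    "BaP": "http://dd.eionet.europa.eu/vocabulary/aq/pollutant/5029",
}


def pollutant_uris(zone_pollutant, lookup=POLLUTANT_URIS):
    """Split a semicolon-separated code string; return (mapped URIs, unmapped codes)."""
    codes = [code.strip() for code in zone_pollutant.split(";") if code.strip()]
    mapped = [lookup[code] for code in codes if code in lookup]
    unmapped = [code for code in codes if code not in lookup]
    return mapped, unmapped


# "XYZ" is a made-up code that has no entry in the lookup table.
mapped, unmapped = pollutant_uris("C6H6; BaP;XYZ")
print(mapped)
print("needs a decision:", unmapped)
```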
… Geometry i.s.o. via macro for Dataflow-D
Another update: the Jinja2 Filter to generate GML from GeoJSON geometry has been improved in latest GitHub Stetl and is used in Dataflow-D jinja2 template, as follows:
The output then becomes like:
|
… Geometry i.s.o. via macro for Dataflow-D - fixed output
OK, in the latest Stetl GitHub version the Jinja2 filter makes it possible to configure:
It seems to make more sense to use the globals for "reference data" or data to be expanded/joined, while the input data is the core data. But experience will tell.... See the example (Example 3, bottom) at https://github.com/justb4/stetl/blob/master/examples/basics/10_jinja2_templating/etl.cfg A possible problem is that some services don't return JSON but XML... |
…t, to map the codes of RIVM to the harmonised codes
Yesterday I created an XSLT to extract the notations and URIs for use in the Jinja2 globs. It transforms the RDF from http://dd.eionet.europa.eu/vocabularies?expand=true&expanded=&folderId=1, e.g. for pollutants: http://dd.eionet.europa.eu/vocabulary/aq/pollutant/view and the RDF http://dd.eionet.europa.eu/vocabulary/aq/pollutant/rdf |
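The same extraction could also be sketched in Python with the standard library, as an alternative to the XSLT. The sketch assumes the RDF/XML export contains `skos:Concept` elements with an `rdf:about` URI and a `skos:notation` child; the sample below is hand-written in that assumed layout and is not copied from the Eionet vocabulary.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Hand-written sample in the assumed layout of the vocabulary export.
SAMPLE_RDF = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:skos="{SKOS}">
  <skos:Concept rdf:about="http://dd.eionet.europa.eu/vocabulary/aq/pollutant/20">
    <skos:notation>C6H6</skos:notation>
    <skos:prefLabel>Benzene (air)</skos:prefLabel>
  </skos:Concept>
</rdf:RDF>"""


def notation_to_uri(rdf_xml):
    """Build a {notation: concept URI} dict from SKOS RDF/XML."""
    root = ET.fromstring(rdf_xml)
    mapping = {}
    for concept in root.iter(f"{{{SKOS}}}Concept"):
        uri = concept.get(f"{{{RDF}}}about")
        notation = concept.findtext(f"{{{SKOS}}}notation")
        if uri and notation:
            mapping[notation.strip()] = uri
    return mapping


codelist = notation_to_uri(SAMPLE_RDF)
print(codelist)
```

The resulting dict can be dumped to JSON and loaded as globals, in the same way as the XSLT output.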
Nice work, more elegant with the parts-split and the lookup of the pollutant definition and protection target definition via the Jinja2 template. The geometry should now also be fillable via the new Filter (latest Stetl GH version). Exciting, I have not tried it for MultiPolygon yet; it will have to become something like
Then it is in INSPIRE ETRS89 right away... |
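To show the target structure for zone geometries, here is a stand-alone sketch that encodes a GeoJSON MultiPolygon as a GML 3.2 `gml:MultiSurface`. It is not the Stetl filter (which uses OGR). The function names and sample coordinates are made up, and coordinates are written in input order, without reprojection or axis swapping.

```python
def _ring(coords, tag):
    """Encode one linear ring as gml:exterior or gml:interior."""
    pos_list = " ".join(f"{x} {y}" for x, y, *_ in coords)
    return (
        f"<gml:{tag}><gml:LinearRing><gml:posList>{pos_list}"
        f"</gml:posList></gml:LinearRing></gml:{tag}>"
    )


def multipolygon_to_gml(geometry, gml_id, srs_name):
    """Encode a GeoJSON MultiPolygon as gml:MultiSurface with one gml:Polygon per member."""
    members = []
    for index, polygon in enumerate(geometry["coordinates"], start=1):
        # The first ring is the outer boundary, any further rings are holes.
        rings = [_ring(polygon[0], "exterior")]
        rings += [_ring(ring, "interior") for ring in polygon[1:]]
        members.append(
            f'<gml:surfaceMember><gml:Polygon gml:id="{gml_id}.{index}">'
            + "".join(rings)
            + "</gml:Polygon></gml:surfaceMember>"
        )
    return (
        f'<gml:MultiSurface gml:id="{gml_id}" srsName="{srs_name}">'
        + "".join(members)
        + "</gml:MultiSurface>"
    )


# Made-up geometry: a single triangle, coordinates in the source CRS.
zone_geom = {"type": "MultiPolygon", "coordinates": [[[[0, 0], [1, 0], [1, 1], [0, 0]]]]}
zone_gml = multipolygon_to_gml(zone_geom, "zone1", "urn:ogc:def:crs:EPSG::28992")
print(zone_gml)
```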
An exact match for the code "Benzene" (as used in RIVM values) seems to be missing in the vocabulary. We need to discuss with RIVM what to do here. |
Great! Dataflow-B now has MultiSurfaces. You can run ./validate.sh for schema validation. Apart from Benzene there is a validation issue with an empty am:beginLifespanVersion. Looking at the existing examples I placed under https://github.com/Geonovum/sospilot/tree/master/data/eionet/aq-report, I see that the date of report generation is used, e.g. for the 5 Aug 2014 Dataflow-B report:
Maybe there is a Jinja2 'current_date' template or we could add one, or via a macro. |
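One way to get such a value, sketched with the standard library only: compute the report-generation time once in Python and hand it to the template as an ordinary variable. The function name is hypothetical, and using UTC with second precision is an assumption, not something taken from the existing reports.

```python
from datetime import datetime, timezone


def report_timestamp(now=None):
    """Return an ISO 8601 timestamp (UTC, no microseconds) usable as an xs:dateTime value."""
    now = now or datetime.now(timezone.utc)
    return now.replace(microsecond=0).isoformat()


# Fixed input for a reproducible example; omit the argument to use the current time.
stamp = report_timestamp(datetime(2014, 8, 5, 12, 0, tzinfo=timezone.utc))
print(stamp)
```

Computing the value once per run keeps every am:beginLifespanVersion in a report identical.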
Isn't Benzene http://dd.eionet.europa.eu/vocabulary/aq/pollutant/20 (Benzene (air))? The pollutant code is admittedly C6H6, but that is benzene (a hexagon of 6 carbon atoms, each with 1 H atom). So my chemistry studies are still of some use after all :-). |
Correct, Benzene is C6H6. The thing is: how to map this automatically using the codes RIVM provides? I'd say let's create an exception for now and try to find out why RIVM uses their codes. |
…ginLifespanVersion and pollutants
On 15-09-14 11:18, Thijs Brentjens wrote:
|
Added the latest RIVM AQ XML files from RSpoor and converted them to CSV. Made a start with the Dataflow-C AQD_AssessmentRegime ETL. It is doable. The 2 main unclear points: #. the mapping of Pollutant to an Eionet Codelist URI, e.g. "BaP" should become http://dd.eionet.europa.eu/vocabulary/aq/pollutant/5029 ("BaP in PM10") but multiple URIs match
|