Home
You are reading this because you have some familiarity with SDMX-ML and the RDF Data Cube vocabulary. Some knowledge of Linked Data practices, XML, XSLT would be handy as well.
Given Generic SDMX-ML data or metadata as input, the XSLT 2.0 templates transform it to RDF/XML, using vocabularies such as QB, SDMX, SKOS, XKOS, and PROV-O.
The transformation follows some common Linked Data practices as well as other ones out of thin air :) If you disagree or would like to propose alternatives, please either contact me or better yet, create an issue. Relevant changes will then be reflected here.
Here is an outline for some of the noteworthy things in the templates.
Base URIs can be set for classes, codelists, concept schemes, datasets, slices, properties, provenance, as well as for the source SDMX data.
The value of uriThingSeparator (e.g., /) sets the delimiter that separates the "thing" from the rest of the URI. In the Linked Data community, this is typically either / or #. For example, if slash is used, a URI would end up like http://example.org/code/EUROSTAT/CL_GEO (note the last slash before CL_GEO). If hash is used, a URI would end up like http://example.org/code/EUROSTAT#CL_GEO.
Similarly, uriDimensionSeparator can be set to separate the dimension values that are used in RDF Data Cube observation URIs. As each observation should have its own unique URI, the URI is constructed by taking the dimension values as URI-safe terms and joining them with the value of uriDimensionSeparator. For example, here is a crazy looking observation URI where uriDimensionSeparator is set to /: http://example.org/dataset/REF_DEMO/DSD_T_PERSON_STATTAB-01-2A01/5938/1/15497/4/21/1/2011/2011-12-31. But with uriThingSeparator set to # and uriDimensionSeparator set to -, it could end up like http://example.org/dataset/REF_DEMO/DSD_T_PERSON_STATTAB-01-2A01#5938-1-15497-4-21-1-2011-2011-12-31. If you are wondering about DSD_T_PERSON_STATTAB-01-2A01, that's the KeyFamily (DSD) id; REF_DEMO is the agency id, which is picked up automatically from the source data; and http://example.org/dataset/ is the base URI for datasets, which can be set in the config.
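As a rough illustration of how the two separators combine, here is a minimal shell sketch that reassembles the example URIs above. The function name build_observation_uri and its argument order are made up for this page and are not part of the templates; the sketch also assumes the dimension values are already URI-safe.

```shell
# Hypothetical helper (not part of the templates): joins the pieces of an
# observation URI the way the examples above are laid out.
build_observation_uri() {
  base=$1        # base URI for datasets, e.g. http://example.org/dataset/
  agency=$2      # agency id, e.g. REF_DEMO
  dsd=$3         # KeyFamily (DSD) id
  thing_sep=$4   # value of uriThingSeparator
  dim_sep=$5     # value of uriDimensionSeparator
  shift 5        # the remaining arguments are the dimension values

  dims=""
  first=1
  for value in "$@"; do
    if [ "$first" -eq 1 ]; then
      dims=$value
      first=0
    else
      dims="$dims$dim_sep$value"
    fi
  done

  # Assumes the values are already URI-safe; no escaping is done here.
  printf '%s%s/%s%s%s\n' "$base" "$agency" "$dsd" "$thing_sep" "$dims"
}

build_observation_uri http://example.org/dataset/ REF_DEMO \
  DSD_T_PERSON_STATTAB-01-2A01 '#' '-' 5938 1 15497 4 21 1 2011 2011-12-31
```

Passing / for both separators instead gives the slash-only form of the same observation URI.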
A creator URI can also be set; it is also used for provenance data.
A default xml:lang can be forced for skos:prefLabel and skos:definition when the source data carries no language. If config.rdf contains a non-empty lang value, that value is used.
SDMX Annotations contain important information that can be put to use by the publisher. Data in AnnotationTypes, for instance, is standardized according to the publisher's needs, but there is no standardization on how they are used across all SDMX publishers. Therefore, in order not to leave this information behind in the final transformation, the configuration lets the publisher define how it will be put to use. This is done by setting interlinkAnnotationTypes: the AnnotationType to detect (in rdfs:label), the predicate to use, as an XML QName (in rdf:predicate), and the instances of Concepts or Codes to apply it to (in rdf:type). Currently this feature is only applied to Annotations in Concepts and Codes. For example, given the following SDMX snippet:
<structure:CodeList id="CL_HGDE_GDE" agencyID="CH1_RN">
  <structure:Code value="13256">
    <structure:Description>Aeugst am Albis</structure:Description>
    <structure:Annotations>
      <common:Annotation>
        <common:AnnotationType>CODE_OFS</common:AnnotationType>
        <common:AnnotationText>1</common:AnnotationText>
      </common:Annotation>
    </structure:Annotations>
  </structure:Code>
</structure:CodeList>
and the following configuration in config.rdf:
<rdf:value>
  <rdf:Description>
    <rdf:value>
      <rdf:Description>
        <rdf:predicate>xkos:hasPart</rdf:predicate>
        <rdf:type>http://example.org/code/CH1_RN/CL_HGDE_GDE</rdf:type>
        <rdfs:label>CODE_OFS</rdfs:label>
      </rdf:Description>
    </rdf:value>
    <rdfs:label>interlinkAnnotationTypes</rdfs:label>
  </rdf:Description>
</rdf:value>
would result in the final RDF/XML transformation:
<rdf:Description rdf:about="http://data.bfs.admin.ch/code/CH1_RN/CL_HGDE_GDE/13256">
  <xkos:hasPart rdf:resource="http://data.bfs.admin.ch/code/CH1_RN/CL_HGDE_GDE/1"/>
</rdf:Description>
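Judging only from the example above, the object of the generated triple appears to be the code in the same code list whose id equals the AnnotationText. The shell sketch below spells out that reading; interlink_code is a hypothetical name invented here, and the real templates may well handle more cases than this.

```shell
# Hypothetical helper: prints the RDF/XML that an interlinkAnnotationTypes
# entry seems to yield for one annotated code, based on the example above.
interlink_code() {
  codelist_uri=$1   # URI of the code list the code belongs to
  thing_sep=$2      # value of uriThingSeparator
  code=$3           # structure:Code/@value
  predicate=$4      # QName from rdf:predicate in the config entry
  target=$5         # common:AnnotationText of the matching annotation

  printf '<rdf:Description rdf:about="%s%s%s">\n' "$codelist_uri" "$thing_sep" "$code"
  printf '  <%s rdf:resource="%s%s%s"/>\n' "$predicate" "$codelist_uri" "$thing_sep" "$target"
  printf '</rdf:Description>\n'
}

interlink_code http://data.bfs.admin.ch/code/CH1_RN/CL_HGDE_GDE / 13256 xkos:hasPart 1
```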
There is preliminary provenance-level data: resources of type qb:DataStructureDefinition and qb:DataSet are also typed as prov:Entity and given prov:wasAttributedTo with the value of creator (which is typed as prov:Agent) from config.rdf. There is a unique prov:Activity for each transformation (typically one for the data and another for the DSD); it records prov:startedAtTime, prov:used (the files that were used), and prov:generated (what was produced, along with the source data URI that it prov:wasDerivedFrom).
- Edit scripts/config.rdf to configure things like base URIs, delimiters to use in URIs, or even how to put SDMX AnnotationTypes to good use.
- Either use the provided scripts/generic.sh to transform Generic SDMX-ML in samples/ to RDF/XML, or use the templates on your own data with an XSLT 2.0 processor, with a command along the following lines (the examples here use saxonb-xslt on Ubuntu):
The following takes the metadata from BIS.WEBSTATS_CIBL_UR_DATAFLOW-1351443106306.xml and uses the scripts/generic.xsl template to create the corresponding RDF/XML in BIS.WEBSTATS_CIBL_UR_DATAFLOW-1351443106306.rdf. The value of the xmlDocument parameter tells the processor which file is being transformed (it is also used for provenance data); just reuse the same value as the input XML passed to -s:
saxonb-xslt -s ../samples/BIS.WEBSTATS_CIBL_UR_DATAFLOW-1351443106306.xml -xsl generic.xsl xmlDocument=../samples/BIS.WEBSTATS_CIBL_UR_DATAFLOW-1351443106306.xml > ../samples/BIS.WEBSTATS_CIBL_UR_DATAFLOW-1351443106306.rdf
Similar to the above, the following generates the RDF/XML BIS.WEBSTATS_CIBL_UR_DATAFLOW-1351443131267.rdf from BIS.WEBSTATS_CIBL_UR_DATAFLOW-1351443131267.xml by making use of the knowledge from the metadata in BIS.WEBSTATS_CIBL_UR_DATAFLOW-1351443106306.xml, passed in the pathToGenericStructure parameter:
saxonb-xslt -t -tree:linked -s ../samples/BIS.WEBSTATS_CIBL_UR_DATAFLOW-1351443131267.xml -xsl generic.xsl xmlDocument=../samples/BIS.WEBSTATS_CIBL_UR_DATAFLOW-1351443131267.xml pathToGenericStructure=../samples/BIS.WEBSTATS_CIBL_UR_DATAFLOW-1351443106306.xml > ../samples/BIS.WEBSTATS_CIBL_UR_DATAFLOW-1351443131267.rdf
-tree:linked helps for large files.
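If you run these commands often, the two variants can be folded into one small wrapper. This is only a sketch to be run from scripts/: the function name transform_sdmx and the SAXON and DRY_RUN variables are invented for illustration, and the output name is simply the input name with .xml swapped for .rdf, as in the commands above.

```shell
#!/bin/sh
# Hypothetical wrapper around the two commands above; run it from scripts/.
# SAXON and DRY_RUN are names made up for this sketch.
SAXON=${SAXON:-saxonb-xslt}

transform_sdmx() {
  input=$1        # Generic SDMX-ML file to transform
  structure=$2    # optional: structure file for pathToGenericStructure
  output=${input%.xml}.rdf

  # Rebuild the argument list in the same order as the commands above.
  set -- -s "$input" -xsl generic.xsl "xmlDocument=$input"
  if [ -n "$structure" ]; then
    set -- -t -tree:linked "$@" "pathToGenericStructure=$structure"
  fi

  if [ -n "$DRY_RUN" ]; then
    # Only print the command that would be run.
    echo "$SAXON $* > $output"
  else
    "$SAXON" "$@" > "$output"
  fi
}
```

For example, with DRY_RUN=1 set, calling transform_sdmx with the data file and the structure file from samples/ prints the second command shown above instead of running it.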
Based on sample data..