csarven edited this page Jan 6, 2013 · 27 revisions

Before you start

You are reading this because you have some familiarity with SDMX-ML and the RDF Data Cube vocabulary. Some knowledge of Linked Data practices, XML, and XSLT would be handy as well.

What can it do?

Given Generic SDMX-ML data or metadata as input, the XSLT 2.0 templates transform it to RDF/XML, using vocabularies such as QB, SDMX, SKOS, XKOS, and PROV-O.

The transformation follows some common Linked Data practices as well as other ones out of thin air :) If you disagree or would like to propose alternatives, please either contact me or better yet, create an issue. Relevant changes will then be reflected here.

Configuration

Here is an outline for some of the noteworthy things in the templates.

URI configurations

Base URIs can be set for classes, codelists, concept schemes, datasets, slices, properties, provenance, as well as for the source SDMX data.

The value of uriThingSeparator (e.g., /) sets the delimiter that separates the "thing" from the rest of the URI. In the Linked Data community, this is typically either / or #. For example, if a slash is used, a URI would end up like http://example.org/code/EUROSTAT/CL_GEO (note the last slash before CL_GEO). If a hash is used, a URI would end up like http://example.org/code/EUROSTAT#CL_GEO.

Similarly, uriDimensionSeparator sets the delimiter between the dimension values that are used in RDF Data Cube observation URIs. Since each observation should have its own unique URI, the URI is constructed by taking the observation's dimension values as safe terms to be used in URIs and joining them with the value of uriDimensionSeparator. For example, here is a crazy looking observation URI where uriDimensionSeparator is set to /: http://example.org/dataset/REF_DEMO/DSD_T_PERSON_STATTAB-01-2A01/5938/1/15497/4/21/1/2011/2011-12-31. But with uriThingSeparator set to # and uriDimensionSeparator set to -, it could end up like http://example.org/dataset/REF_DEMO/DSD_T_PERSON_STATTAB-01-2A01#5938-1-15497-4-21-1-2011-2011-12-31. If you are wondering about DSD_T_PERSON_STATTAB-01-2A01, that's the KeyFamily (DSD) id; REF_DEMO is the agency id, which is picked up automatically from the source data; and http://example.org/dataset/ is the base URI for datasets, which can be set in the config.
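The URI layout described above can be sketched in shell as follows. This only illustrates how the strings are put together; it is not the XSLT that the templates actually use. The function name observation_uri and its argument order are made up for this example, the dimension values are assumed to be URI-safe already, and the / between the agency id and the DSD id is inferred from the two example URIs.

```shell
#!/bin/sh
# Hypothetical helper, not part of the templates: build an observation URI
# from a dataset base URI, agency id, DSD id, the two separators and the
# dimension values (assumed to be URI-safe already).
observation_uri() {
  base=$1; agency=$2; dsd=$3; thing_sep=$4; dim_sep=$5
  shift 5
  key=''
  # Join the remaining arguments (the dimension values) with dim_sep.
  for value in "$@"; do
    if [ -z "$key" ]; then
      key=$value
    else
      key="$key$dim_sep$value"
    fi
  done
  # Assumption: agency id and DSD id are always joined by a slash.
  printf '%s%s/%s%s%s\n' "$base" "$agency" "$dsd" "$thing_sep" "$key"
}

# Slash for both separators.
observation_uri 'http://example.org/dataset/' REF_DEMO DSD_T_PERSON_STATTAB-01-2A01 \
  '/' '/' 5938 1 15497 4 21 1 2011 2011-12-31

# Hash as uriThingSeparator, hyphen as uriDimensionSeparator.
observation_uri 'http://example.org/dataset/' REF_DEMO DSD_T_PERSON_STATTAB-01-2A01 \
  '#' '-' 5938 1 15497 4 21 1 2011 2011-12-31
```

The two calls reproduce the two example observation URIs given above, which makes it easy to try out other separator combinations before committing to them in the config.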

The creator's URI can also be set; it is used in the provenance data.

Default language

A default xml:lang can be forced for skos:prefLabel and skos:definition when the source data does not specify a language. If config.rdf contains a non-empty lang value, that value is used.

Interlinking SDMX Annotations

SDMX Annotations contain important information that can be put to use by the publisher. The data in AnnotationTypes, for instance, is standardized according to each publisher's needs, but there is no standard for how it is used across SDMX publishers. Therefore, in order not to leave this information behind in the final transformation, the configuration allows the publisher to define how it will be put to use. This is done by setting interlinkAnnotationTypes: the AnnotationType to detect (in rdfs:label), the predicate (as an XML QName) to use (in rdf:predicate), and the instances of Concepts or Codes to apply it to (in rdf:type). Currently this feature is only applied to Annotations in Concepts and Codes. For example, given the following SDMX snippet:

<structure:CodeList id="CL_HGDE_GDE" agencyID="CH1_RN">
  <structure:Code value="13256">
    <structure:Description>Aeugst am Albis</structure:Description>
    <structure:Annotations>
      <common:Annotation>
        <common:AnnotationType>CODE_OFS</common:AnnotationType>
        <common:AnnotationText>1</common:AnnotationText>
      </common:Annotation>
    </structure:Annotations>
  </structure:Code>
</structure:CodeList>

and the following configuration in config.rdf:

<rdf:value>
  <rdf:Description>
    <rdf:value>
      <rdf:Description>
        <rdf:predicate>xkos:hasPart</rdf:predicate>
        <rdf:type>http://example.org/code/CH1_RN/CL_HGDE_GDE</rdf:type>
        <rdfs:label>CODE_OFS</rdfs:label>
      </rdf:Description>
    </rdf:value>
    <rdfs:label>interlinkAnnotationTypes</rdfs:label>
  </rdf:Description>
</rdf:value>

would result in the final RDF/XML transformation:

<rdf:Description rdf:about="http://data.bfs.admin.ch/code/CH1_RN/CL_HGDE_GDE/13256">
  <xkos:hasPart rdf:resource="http://data.bfs.admin.ch/code/CH1_RN/CL_HGDE_GDE/1"/>
</rdf:Description>

Provenance

There is preliminary provenance level data:

Resources of type qb:DataStructureDefinition and qb:DataSet are also typed as prov:Entity and given prov:wasAttributedTo with the creator value (which is typed as prov:Agent) from config.rdf. There is a unique prov:Activity for each transformation (typically one for the data and another for the DSD). It contains values for prov:startedAtTime, prov:used (the files that were used), and what was prov:generated (along with the source data URI that it prov:wasDerivedFrom).

How to run:

  1. Edit scripts/config.rdf to configure things like base URIs, delimiters to use in URIs, or even how to put SDMX AnnotationTypes to good use.
  2. Either use the provided scripts/generic.sh to transform the generic SDMX-ML in samples/ to RDF/XML, or run an XSLT 2.0 processor on your own data with a command along the following lines (this example uses saxonb-xslt on Ubuntu):

The following takes the metadata from BIS.WEBSTATS_CIBL_UR_DATAFLOW-1351443106306.xml and uses the scripts/generic.xsl template to create the corresponding RDF/XML in BIS.WEBSTATS_CIBL_UR_DATAFLOW-1351443106306.rdf. The value of the xmlDocument parameter tells the processor which file is being transformed (it is also used for provenance data); just reuse the same value as the input XML given to -s:

saxonb-xslt -s ../samples/BIS.WEBSTATS_CIBL_UR_DATAFLOW-1351443106306.xml -xsl generic.xsl xmlDocument=../samples/BIS.WEBSTATS_CIBL_UR_DATAFLOW-1351443106306.xml > ../samples/BIS.WEBSTATS_CIBL_UR_DATAFLOW-1351443106306.rdf

Similar to above, the following generates the RDF/XML BIS.WEBSTATS_CIBL_UR_DATAFLOW-1351443131267.rdf from BIS.WEBSTATS_CIBL_UR_DATAFLOW-1351443131267.xml by making use of the knowledge from the metadata in BIS.WEBSTATS_CIBL_UR_DATAFLOW-1351443106306.xml with parameter pathToGenericStructure:

saxonb-xslt -t -tree:linked -s ../samples/BIS.WEBSTATS_CIBL_UR_DATAFLOW-1351443131267.xml -xsl generic.xsl xmlDocument=../samples/BIS.WEBSTATS_CIBL_UR_DATAFLOW-1351443131267.xml pathToGenericStructure=../samples/BIS.WEBSTATS_CIBL_UR_DATAFLOW-1351443106306.xml > ../samples/BIS.WEBSTATS_CIBL_UR_DATAFLOW-1351443131267.rdf

-tree:linked helps for large files.
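When several data files share the same structure file, the second command above can be generated for each of them. The following is a minimal sketch and not a replacement for scripts/generic.sh: the function name print_transform_commands is hypothetical, file names are assumed to contain no whitespace and to end in .xml, and the commands are meant to be run from scripts/ so that generic.xsl resolves. It only prints the commands, using the same options as shown above, so they can be reviewed before being run.

```shell
#!/bin/sh
# Hypothetical helper: print one saxonb-xslt command per data file.
# $1 is the structure (metadata) file; the remaining arguments are data files.
print_transform_commands() {
  structure=$1
  shift
  for data in "$@"; do
    # Derive the output name by swapping the .xml suffix for .rdf.
    out="${data%.xml}.rdf"
    printf 'saxonb-xslt -t -tree:linked -s %s -xsl generic.xsl xmlDocument=%s pathToGenericStructure=%s > %s\n' \
      "$data" "$data" "$structure" "$out"
  done
}

# Print the command for the sample data file used above.
print_transform_commands \
  ../samples/BIS.WEBSTATS_CIBL_UR_DATAFLOW-1351443106306.xml \
  ../samples/BIS.WEBSTATS_CIBL_UR_DATAFLOW-1351443131267.xml
```

Once the printed commands look right, piping the output to sh runs them one after another.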

Coverage

Based on sample data..
