Sarven Capadisli edited this page Jan 12, 2013 · 27 revisions

Before you start

You are reading this because you have some familiarity with SDMX-ML or the RDF Data Cube vocabulary. Some knowledge of Linked Data practices, XML, and XSLT would be handy as well.

What can it do?

Given Generic SDMX-ML data or metadata as input, the XSLT 2.0 templates transform it to RDF/XML. They use vocabularies such as RDF Data Cube, SDMX-RDF, SKOS, XKOS, and PROV-O.

The transformation follows some common Linked Data practices as well as other ones out of thin air :) If you disagree or would like to propose alternatives, please either contact me or better yet, create an issue. Relevant changes will then be reflected here.

Configuration

The scripts/config.rdf file is used to configure the transformations. Here is an outline of the noteworthy settings that the templates pick up.

URI configurations

Base URIs can be set for classes, codelists, concept schemes, datasets, slices, properties, provenance, as well as for the source SDMX data.

The value of uriThingSeparator (e.g., /) sets the delimiter that separates the "thing" from the rest of the URI. In the Linked Data community, this is typically either a / or a #. For example, if a slash is used, a URI would end up like http://example.org/code/EUROSTAT/CL_GEO (note the last slash before CL_GEO). If a hash is used, a URI would end up like http://example.org/code/EUROSTAT#CL_GEO.
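To make the effect of uriThingSeparator concrete, here is a minimal Python sketch. It is not how the XSLT templates are written; the function name thing_uri and its parameters are made up for illustration, and only the two resulting URIs come from the text above.

```python
def thing_uri(base: str, agency_id: str, thing_id: str, thing_separator: str = "/") -> str:
    """Join a base URI, an agency id, and a "thing" id.

    thing_separator plays the role of uriThingSeparator in config.rdf
    (hypothetical helper, not part of the templates).
    """
    return f"{base}{agency_id}{thing_separator}{thing_id}"


slash_uri = thing_uri("http://example.org/code/", "EUROSTAT", "CL_GEO", "/")
hash_uri = thing_uri("http://example.org/code/", "EUROSTAT", "CL_GEO", "#")
print(slash_uri)
print(hash_uri)
```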

Similarly, uriDimensionSeparator sets the delimiter between the dimension values that are used in RDF Data Cube observation URIs. As each observation should have its own unique URI, the URI is constructed by taking the dimension values as URI-safe terms and joining them with the value of uriDimensionSeparator. For example, here is a crazy looking observation URI where uriDimensionSeparator is set to /: http://example.org/dataset/REF_DEMO/DSD_T_PERSON_STATTAB-01-2A01/5938/1/15497/4/21/1/2011/2011-12-31. But with uriThingSeparator set to # and uriDimensionSeparator set to -, it could end up like http://example.org/dataset/REF_DEMO/DSD_T_PERSON_STATTAB-01-2A01#5938-1-15497-4-21-1-2011-2011-12-31. If you are wondering about DSD_T_PERSON_STATTAB-01-2A01, that's the KeyFamily (DSD) id; REF_DEMO is the agency id, which is picked up automatically from the source data; and http://example.org/dataset/ is the base URI for datasets, which can be set in the config.
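The observation URI construction can be sketched roughly as follows. This is an illustration in Python rather than the actual XSLT; the function name, its parameters, and the use of percent-encoding to make values "safe" are my assumptions. The ids and dimension values are the ones from the example above.

```python
from urllib.parse import quote


def observation_uri(base, agency_id, key_family_id, dimension_values,
                    thing_separator="/", dimension_separator="/"):
    """Build an observation URI from dimension values (hypothetical helper).

    Each value is percent-encoded so it is safe inside a URI; the real
    templates may sanitize values differently.
    """
    safe_values = [quote(str(value), safe="") for value in dimension_values]
    key = dimension_separator.join(safe_values)
    return f"{base}{agency_id}/{key_family_id}{thing_separator}{key}"


values = ["5938", "1", "15497", "4", "21", "1", "2011", "2011-12-31"]
print(observation_uri("http://example.org/dataset/", "REF_DEMO",
                      "DSD_T_PERSON_STATTAB-01-2A01", values))
print(observation_uri("http://example.org/dataset/", "REF_DEMO",
                      "DSD_T_PERSON_STATTAB-01-2A01", values,
                      thing_separator="#", dimension_separator="-"))
```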

The creator's URI can also be set; it is used in the provenance data as well.

Default language

A default xml:lang can be forced for skos:prefLabel and skos:definition when the source data does not provide a language. If config.rdf contains a non-empty lang value, that value is used.
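The fallback can be read as the small decision below. This is only a sketch of the rule as described here; effective_lang is a made-up name, and the templates implement this in XSLT.

```python
from typing import Optional


def effective_lang(source_lang: Optional[str], config_lang: Optional[str]) -> Optional[str]:
    """Pick the xml:lang for a label or definition (hypothetical helper).

    The language from the source data wins; otherwise a non-empty lang
    from config.rdf is used; otherwise no xml:lang is attached.
    """
    if source_lang:
        return source_lang
    if config_lang:
        return config_lang
    return None


print(effective_lang("fr", "en"))
print(effective_lang(None, "en"))
print(effective_lang(None, ""))
```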

Interlinking SDMX Annotations

SDMX Annotations contain important information that can be put to use by the publisher. AnnotationTypes, for instance, are standardized according to each publisher's needs, but there is no standard for how they are used across SDMX publishers. Therefore, in order not to leave this information behind in the final transformation, the configuration lets the publisher define how they are put to use. This is done by setting interlinkAnnotationTypes: the AnnotationType to detect (in rdfs:label), the predicate (as an XML QName) to use (in rdf:predicate), and the instances of Concepts or Codes to apply it to (in rdf:type). Currently this feature is only applied to Annotations in Concepts and Codes. For example, given the following SDMX snippet:

<structure:CodeList id="CL_HGDE_GDE" agencyID="CH1_RN">
  <structure:Code value="13256">
    <structure:Description>Aeugst am Albis</structure:Description>
    <structure:Annotations>
      <common:Annotation>
        <common:AnnotationType>CODE_OFS</common:AnnotationType>
        <common:AnnotationText>1</common:AnnotationText>
      </common:Annotation>
    </structure:Annotations>
  </structure:Code>
</structure:CodeList>

and the following configuration in config.rdf:

<rdf:value>
  <rdf:Description>
    <rdf:value>
      <rdf:Description>
        <rdf:predicate>xkos:hasPart</rdf:predicate>
        <rdf:type>http://example.org/code/CH1_RN/CL_HGDE_GDE</rdf:type>
        <rdfs:label>CODE_OFS</rdfs:label>
      </rdf:Description>
    </rdf:value>
    <rdfs:label>interlinkAnnotationTypes</rdfs:label>
  </rdf:Description>
</rdf:value>

would result in the final RDF/XML transformation like:

<rdf:Description rdf:about="http://example.org/code/CH1_RN/CL_HGDE_GDE/13256">
  <xkos:hasPart rdf:resource="http://example.org/code/CH1_RN/CL_HGDE_GDE/1"/>
</rdf:Description>
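The matching logic behind this example can be sketched as follows. This is a Python illustration under my own assumptions, not the XSLT itself: InterlinkRule and interlink are hypothetical names, and I assume the object URI is built inside the same code list as the subject, which is what the example output suggests.

```python
from typing import NamedTuple, Optional, Tuple


class InterlinkRule(NamedTuple):
    """One entry of interlinkAnnotationTypes (hypothetical representation)."""
    annotation_type: str  # rdfs:label in config.rdf
    predicate: str        # rdf:predicate in config.rdf, as a QName
    applies_to: str       # rdf:type in config.rdf, here a code list URI


def interlink(rule: InterlinkRule, codelist_uri: str, code_value: str,
              annotation_type: str, annotation_text: str,
              thing_separator: str = "/") -> Optional[Tuple[str, str, str]]:
    """Return a (subject, predicate, object) triple if the rule matches."""
    if annotation_type != rule.annotation_type or codelist_uri != rule.applies_to:
        return None
    subject = f"{codelist_uri}{thing_separator}{code_value}"
    # Assumption: AnnotationText identifies another code in the same list.
    obj = f"{codelist_uri}{thing_separator}{annotation_text}"
    return (subject, rule.predicate, obj)


rule = InterlinkRule("CODE_OFS", "xkos:hasPart",
                     "http://example.org/code/CH1_RN/CL_HGDE_GDE")
triple = interlink(rule, "http://example.org/code/CH1_RN/CL_HGDE_GDE",
                   "13256", "CODE_OFS", "1")
print(triple)
```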

Vocabularies

Besides the common vocabularies (RDF, RDFS, OWL, XSD), the RDF Data Cube vocabulary is used to describe multi-dimensional statistical data, and SDMX-RDF is used for the statistical information model. PROV-O is used for some basic provenance coverage (although I'm not fully sure if I want to leave it in there). And of course SKOS and XKOS are used to cover concepts, concept schemes, and their relationships to one another. XKOS is currently applied primarily for hierarchical lists here (I hope I understood the vocabulary correctly).

Reuse of SDMX CodeLists and Codes

SDMX metadata and data may use CodeLists that are published by other agencies. For the agency SDMX, this is typically indicated by the component agency being set to SDMX, e.g., codelistAgency="SDMX" on a structure:Component and/or agencyID="SDMX" on a CodeList with id="CL_FREQ". When this is detected, the corresponding URIs from the SDMX-RDF vocabulary are used, e.g., http://purl.org/linked-data/sdmx/2009/code#freq for metadata and http://purl.org/linked-data/sdmx/2009/code#freq-A for data.
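A rough Python sketch of this substitution follows. The naming rule (drop the CL_ prefix and lowercase the rest) is an assumption that merely reproduces the CL_FREQ example; other code lists in the SDMX-RDF vocabulary may not follow it, and sdmx_code_uri is a made-up helper name.

```python
from typing import Optional

SDMX_CODE_NS = "http://purl.org/linked-data/sdmx/2009/code#"


def sdmx_code_uri(agency_id: str, codelist_id: str,
                  code_value: Optional[str] = None) -> Optional[str]:
    """Map an SDMX-agency code list (or one of its codes) to SDMX-RDF.

    Returns None when the agency is not SDMX, in which case the
    configured base URIs would be used instead.
    """
    if agency_id != "SDMX":
        return None
    # Assumed rule: "CL_FREQ" becomes "freq" (only checked against that example).
    local = codelist_id[3:] if codelist_id.startswith("CL_") else codelist_id
    local = local.lower()
    if code_value is None:
        return SDMX_CODE_NS + local
    return f"{SDMX_CODE_NS}{local}-{code_value}"


print(sdmx_code_uri("SDMX", "CL_FREQ"))
print(sdmx_code_uri("SDMX", "CL_FREQ", "A"))
print(sdmx_code_uri("EUROSTAT", "CL_GEO"))
```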

There is an open issue #5 to reuse URIs from existing agencyIDs other than SDMX.

Provenance

Preliminary provenance data is generated:

Resources of type qb:DataStructureDefinition and qb:DataSet are also typed with prov:Entity and given prov:wasAttributedTo with the creator value from config.rdf (the creator is typed with prov:Agent). There is a unique prov:Activity for each transformation (typically one for the data and another for the DSD). It contains values for prov:startedAtTime, prov:used (the files that were used), and what was prov:generated (along with the source data URI that it prov:wasDerivedFrom).
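As a rough picture of the resulting statements, here is a Python sketch that lists them as plain triples. The helper name and all example URIs are hypothetical, and I am assuming that prov:wasDerivedFrom is attached to the generated resource; the templates may attach it differently.

```python
from datetime import datetime, timezone

PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def provenance_triples(entity_uri, activity_uri, creator_uri,
                       source_uri, used_files, started_at):
    """List provenance statements for one transformation (illustrative only)."""
    triples = [
        (entity_uri, RDF_TYPE, PROV + "Entity"),
        (entity_uri, PROV + "wasAttributedTo", creator_uri),
        # Assumption: the generated resource is derived from the source data URI.
        (entity_uri, PROV + "wasDerivedFrom", source_uri),
        (creator_uri, RDF_TYPE, PROV + "Agent"),
        (activity_uri, RDF_TYPE, PROV + "Activity"),
        (activity_uri, PROV + "startedAtTime", started_at.isoformat()),
        (activity_uri, PROV + "generated", entity_uri),
    ]
    triples += [(activity_uri, PROV + "used", path) for path in used_files]
    return triples


example = provenance_triples(
    "http://example.org/dataset/REF_DEMO/DSD_T_PERSON_STATTAB-01-2A01/structure",
    "http://example.org/prov/activity/1",      # hypothetical activity URI
    "http://example.org/creator",              # hypothetical creator URI
    "http://example.org/source/structure.xml",  # hypothetical source URI
    ["generic.structure.xml", "generic.xsl"],
    datetime(2013, 1, 12, tzinfo=timezone.utc),
)
for triple in example:
    print(triple)
```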

URI Patterns

Here is an outline of the URI patterns that are used. example.org is used as an example domain, followed by class, code, concept, dataset, property, prov, or slice as example path segments (i.e., they can be changed in the config). Slashes are used to separate the things and dimensions in URIs, which can also be changed in the config. Variable values are derived directly from the source SDMX. Some skos:ConceptSchemes have a uriValidFromToSeparator part, which is generated by combining the date validity information when both validFrom and validTo are provided.

qb:DataStructureDefinition

http://example.org/dataset/{$agencyID}/{$KeyFamilyRef}/structure

qb:Observation

http://example.org/dataset/{$agencyID}/{$KeyFamilyRef}/{dimension-1}/../{dimension-n}

qb:Slice

http://example.org/slice/{$agencyID}/{$KeyFamilyRef}/{dimension-1}/../{dimension-n-excluding-FREQ-concept}

skos:Collection

http://example.org/code/{$agencyID}/{$hierarchicalCodeListID}
http://example.org/code/{$agencyID}/{$hierarchyID}

sdmx:CodeList

http://example.org/code/{$agencyID}/{$codeListID}{$uriValidFromToSeparator}

skos:ConceptScheme

http://example.org/concept/{$agencyID}/{$conceptSchemeID}{$uriValidFromToSeparator}

skos:Concept, sdmx:Concept

http://example.org/code/{$agencyID}/{$codeListID}/{@codeID}
http://example.org/concept/{$agencyID}/{$conceptSchemeID}/{@conceptID}

owl:Class and rdfs:Class

http://example.org/class/{$agencyID}/{$codeListID}

rdf:Property, qb:DimensionProperty, qb:MeasureProperty, qb:AttributeProperty

http://example.org/property/{$agencyID}/{$conceptID}
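A few of the patterns above can be sketched in Python as below. The function names and the ESTAT / CS_DEMO ids are made up, and the exact shape of the validity part (here a slash followed by validFrom-validTo) is purely an assumption; the text only says it is added when both dates are present.

```python
def validity_suffix(valid_from, valid_to, separator="/"):
    """Return a validity suffix only when both dates are provided.

    The "{separator}{validFrom}-{validTo}" layout is an assumed format.
    """
    if valid_from and valid_to:
        return f"{separator}{valid_from}-{valid_to}"
    return ""


def concept_scheme_uri(base, agency_id, scheme_id, valid_from=None, valid_to=None):
    """skos:ConceptScheme pattern, with the optional validity suffix."""
    return f"{base}{agency_id}/{scheme_id}{validity_suffix(valid_from, valid_to)}"


def dsd_uri(base, agency_id, key_family_ref):
    """qb:DataStructureDefinition pattern."""
    return f"{base}{agency_id}/{key_family_ref}/structure"


def property_uri(base, agency_id, concept_id):
    """rdf:Property pattern."""
    return f"{base}{agency_id}/{concept_id}"


print(concept_scheme_uri("http://example.org/concept/", "ESTAT", "CS_DEMO"))
print(concept_scheme_uri("http://example.org/concept/", "ESTAT", "CS_DEMO",
                         "2010-01-01", "2012-12-31"))
print(dsd_uri("http://example.org/dataset/", "REF_DEMO", "DSD_T_PERSON_STATTAB-01-2A01"))
print(property_uri("http://example.org/property/", "SDMX", "FREQ"))
```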

Properties

Properties used in structure (DSD, codelists, ..) and data (observations) are listed below:

Structure

http://example.org/property/{$agencyID}/{$conceptID}
http://purl.org/dc/terms/identifier
http://purl.org/dc/terms/references
http://purl.org/linked-data/cube#attribute
http://purl.org/linked-data/cube#codeList
http://purl.org/linked-data/cube#component
http://purl.org/linked-data/cube#componentAttachment
http://purl.org/linked-data/cube#componentProperty
http://purl.org/linked-data/cube#concept
http://purl.org/linked-data/cube#dimension
http://purl.org/linked-data/cube#measure
http://purl.org/linked-data/cube#order
http://purl.org/linked-data/cube#sliceKey
http://purl.org/linked-data/sdmx/2009/concept#dataRev
http://purl.org/linked-data/sdmx/2009/concept#dsi
http://purl.org/linked-data/sdmx/2009/concept#mAgency
http://purl.org/linked-data/sdmx/2009/concept#validFrom
http://purl.org/linked-data/sdmx/2009/concept#validTo
http://purl.org/linked-data/xkos#hasPart
http://purl.org/linked-data/xkos#isPartOf
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://www.w3.org/2000/01/rdf-schema#comment
http://www.w3.org/2000/01/rdf-schema#range
http://www.w3.org/2000/01/rdf-schema#seeAlso
http://www.w3.org/2000/01/rdf-schema#subClassOf
http://www.w3.org/2004/02/skos/core#definition
http://www.w3.org/2004/02/skos/core#hasTopConcept
http://www.w3.org/2004/02/skos/core#inScheme
http://www.w3.org/2004/02/skos/core#notation
http://www.w3.org/2004/02/skos/core#prefLabel
http://www.w3.org/2004/02/skos/core#topConceptOf
http://www.w3.org/ns/prov#generated
http://www.w3.org/ns/prov#startedAtTime
http://www.w3.org/ns/prov#used
http://www.w3.org/ns/prov#wasAttributedTo
http://www.w3.org/ns/prov#wasDerivedFrom

Data

http://example.org/property/{$agencyID}/{$conceptID}
http://purl.org/linked-data/cube#dataSet
http://purl.org/linked-data/cube#observation
http://purl.org/linked-data/cube#slice
http://purl.org/linked-data/cube#sliceStructure
http://purl.org/linked-data/cube#structure
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://www.w3.org/ns/prov#generated
http://www.w3.org/ns/prov#startedAtTime
http://www.w3.org/ns/prov#used
http://www.w3.org/ns/prov#wasAttributedTo
http://www.w3.org/ns/prov#wasDerivedFrom

Types of resources

Types of resources in the structure (DSD, codelists, ..) and data (observations) are listed below:

Structure

http://example.org/class/{$agencyID}/{$codeListID}
http://purl.org/linked-data/cube#AttributeProperty
http://purl.org/linked-data/cube#ComponentSpecification
http://purl.org/linked-data/cube#DataStructureDefinition
http://purl.org/linked-data/cube#DimensionProperty
http://purl.org/linked-data/cube#MeasureProperty
http://purl.org/linked-data/sdmx#CodeList
http://purl.org/linked-data/sdmx#Concept
http://purl.org/linked-data/sdmx#DataStructureDefinition
http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
http://www.w3.org/2000/01/rdf-schema#Class
http://www.w3.org/2002/07/owl#Class
http://www.w3.org/2004/02/skos/core#Collection
http://www.w3.org/2004/02/skos/core#Concept
http://www.w3.org/2004/02/skos/core#ConceptScheme
http://www.w3.org/ns/prov#Activity
http://www.w3.org/ns/prov#Agent
http://www.w3.org/ns/prov#Entity

Data

http://purl.org/linked-data/cube#DataSet
http://purl.org/linked-data/cube#Observation
http://www.w3.org/ns/prov#Activity
http://www.w3.org/ns/prov#Agent
http://www.w3.org/ns/prov#Entity

Datatypes

Some of the XSD datatypes are applied to object resources based on SDMX structure:TextFormat/@textType. See also issues #3 and #9.
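The idea can be sketched as a lookup from textType to an XSD datatype URI. The table below is a small illustrative assumption, not the mapping the templates actually implement; datatype_for is a made-up name.

```python
from typing import Optional

XSD = "http://www.w3.org/2001/XMLSchema#"

# Illustrative subset only; the real templates may cover other textType values.
TEXT_TYPE_TO_XSD = {
    "String": XSD + "string",
    "Double": XSD + "double",
    "Decimal": XSD + "decimal",
    "Boolean": XSD + "boolean",
}


def datatype_for(text_type: Optional[str]) -> Optional[str]:
    """Return the XSD datatype URI for a textType, or None to leave the literal plain."""
    if text_type is None:
        return None
    return TEXT_TYPE_TO_XSD.get(text_type)


print(datatype_for("Double"))
print(datatype_for("SomethingElse"))
```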

How to run:

  1. Edit scripts/config.rdf to configure things like base URIs, delimiters to use in URIs, or even how to put SDMX AnnotationTypes into good use. If you don't edit, it will work with defaults (e.g., example.org, /).

  2. Either use the provided scripts/generic.sh to transform the generic SDMX-ML in data/ to RDF/XML, or use the templates on your own data with an XSLT 2.0 processor. The commands here use the Debian saxonb-xslt as an example.

The following takes the metadata from generic.structure.xml and uses the scripts/generic.xsl template to create the corresponding RDF/XML in generic.structure.rdf. The xmlDocument parameter lets the processor know which file is being transformed (it is also used for provenance data); just reuse the same value as the input XML given in -s. The pathToGenericStructure parameter has the same value as xmlDocument in this case, because we are transforming the SDMX KeyFamily / DSD itself:

saxonb-xslt -s generic.structure.xml -xsl generic.xsl xmlDocument=generic.structure.xml pathToGenericStructure=generic.structure.xml > generic.structure.rdf

Similar to above, but this time we are going to use the generic.structure.xml for the generic data. The following generates the RDF/XML generic.data.rdf from generic.data.xml by making use of the generic structure data in generic.structure.xml with parameter pathToGenericStructure:

saxonb-xslt -t -tree:linked -s generic.data.xml -xsl generic.xsl xmlDocument=generic.data.xml pathToGenericStructure=generic.structure.xml > generic.data.rdf

-tree:linked in saxonb-xslt helps with large files, as does giving more memory to the processor.
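If you would rather drive the processor from a script, the command line above can be assembled along these lines. This is a hedged Python sketch: saxon_command and transform are hypothetical helper names, it assumes saxonb-xslt is on the PATH, and it only uses the options and parameters that appear in the commands above.

```python
import subprocess


def saxon_command(source, stylesheet, structure_path, large_input=False):
    """Build the saxonb-xslt argument list for one transformation."""
    cmd = ["saxonb-xslt"]
    if large_input:
        # Same options as in the data example above.
        cmd += ["-t", "-tree:linked"]
    cmd += [
        "-s", source,
        "-xsl", stylesheet,
        f"xmlDocument={source}",  # same value as the -s input
        f"pathToGenericStructure={structure_path}",
    ]
    return cmd


def transform(source, stylesheet, structure_path, output, large_input=False):
    """Run the transformation and write the RDF/XML from stdout to `output`."""
    with open(output, "wb") as out:
        subprocess.run(
            saxon_command(source, stylesheet, structure_path, large_input),
            stdout=out,
            check=True,
        )


print(" ".join(saxon_command("generic.data.xml", "generic.xsl",
                             "generic.structure.xml", large_input=True)))
```

Calling transform("generic.data.xml", "generic.xsl", "generic.structure.xml", "generic.data.rdf", large_input=True) would then correspond to the data transformation.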

Coverage

The following is a coverage (in progress) based on sample data.

BIS OECD UN ECB WB IMF FAO EUROSTAT BFS
"External agencies" refers to agencies in which the SDMX publisher is using an external agency's concepts, codelists etc.
External Agencies SDMX EUROSTAT IAEG SDMX OECD
Annotation(Type) Y Y Y
Hierarchical CodeLists Y Y Y Y Y Y
Datatype (OBS_VALUE) string double double
Datatype (TIME_FORMAT) string string
Datatype (TIME_PERIOD) string
Datatype (OBS_STATUS) string string
Datatype (OBS_CONF) string