Home
You are reading this because you have some familiarity with SDMX-ML or the RDF Data Cube vocabulary. Some knowledge of Linked Data practices, XML, XSLT would be handy as well.
Given Generic SDMX-ML data or metadata as input, the XSLT 2.0 templates transform it to RDF/XML. They use vocabularies such as RDF Data Cube, SDMX-RDF, SKOS, XKOS, and PROV-O.
The transformation follows some common Linked Data practices as well as other ones out of thin air :) If you disagree or would like to propose alternatives, please either contact me or better yet, create an issue. Relevant changes will then be reflected here.
The `scripts/config.rdf` file is used to configure the transformations. Here is an outline of some of the noteworthy things in the templates.
Base URIs can be set for classes, codelists, concept schemes, datasets, slices, properties, provenance, as well as for the source SDMX data.
The value for `uriThingSeparator` (e.g., `/`) sets the delimiter that separates the "thing" from the rest of the URI. In the Linked Data community, this is typically either `/` or `#`. For example, if a slash is used, a URI would end up like `http://example.org/code/EUROSTAT/CL_GEO` (note the last slash before CL_GEO). If a hash is used, a URI would end up like `http://example.org/code/EUROSTAT#CL_GEO`.
Similarly, `uriDimensionSeparator` can be set to separate the dimension values that are used in RDF Data Cube observation URIs. As each observation should have its own unique URI, the URI is constructed by taking the dimension values as safe terms and joining them with the value of `uriDimensionSeparator`. For example, here is a crazy looking observation URI where `uriDimensionSeparator` is set to `/`: `http://example.org/dataset/REF_DEMO/DSD_T_PERSON_STATTAB-01-2A01/5938/1/15497/4/21/1/2011/2011-12-31`. But with `uriThingSeparator` set to `#` and `uriDimensionSeparator` set to `-`, it could end up like `http://example.org/dataset/REF_DEMO/DSD_T_PERSON_STATTAB-01-2A01#5938-1-15497-4-21-1-2011-2011-12-31`. If you are wondering about `DSD_T_PERSON_STATTAB-01-2A01`, that's the KeyFamily (DSD) id; `REF_DEMO` is the agency id, which is picked up automatically from the source data; and `http://example.org/dataset/` is the base URI for datasets, which can be set in config.
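The way the two separators combine can be sketched in a few lines of Python. This is only an illustration of the URI shape described above, not the XSLT itself: the function name `observation_uri` and the percent-encoding step used to make dimension values "safe" are assumptions.

```python
from urllib.parse import quote


def observation_uri(base, agency_id, dsd_id, dimension_values,
                    thing_separator="/", dimension_separator="/"):
    """Build an observation URI from its parts (illustrative only)."""
    # Assumption: percent-encoding stands in for whatever the templates
    # actually do to turn dimension values into URI-safe terms.
    safe_values = [quote(str(value), safe="") for value in dimension_values]
    dataset = f"{base}{agency_id}/{dsd_id}"
    return dataset + thing_separator + dimension_separator.join(safe_values)


dims = ["5938", "1", "15497", "4", "21", "1", "2011", "2011-12-31"]
base = "http://example.org/dataset/"
dsd = "DSD_T_PERSON_STATTAB-01-2A01"

# Both separators left at "/".
print(observation_uri(base, "REF_DEMO", dsd, dims))
# uriThingSeparator "#" and uriDimensionSeparator "-".
print(observation_uri(base, "REF_DEMO", dsd, dims,
                      thing_separator="#", dimension_separator="-"))
```

The two calls reproduce the slash-only and the hash-and-dash example URIs given above.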
A creator URI can also be set; it is also used for provenance data.
A default xml:lang can be forced on skos:prefLabel and skos:definition when the language is not given in the source data. If config.rdf contains a non-empty lang value, that value is used, e.g.:
<rdf:Description>
<rdf:value>en</rdf:value>
<rdfs:label>lang</rdfs:label>
</rdf:Description>
Default language may also be applied in the case of Annotations. See Interlinking SDMX Annotations for example.
SDMX Annotations contain important information that can be put to use by the publisher. Data in AnnotationTypes typically follow the publisher's internal conventions, so there is no standard for how they are used across SDMX publishers. In order not to leave this information behind in the final transformation, the configuration allows publishers to define how it should be transformed. This is done by setting `interlinkAnnotationTypes`: the AnnotationType to detect (in `rdfs:label`), the predicate (as an XML QName) to use (in `rdf:predicate`), and the instances of Concepts or Codes to apply to (in `rdf:type`). Currently this feature is only applied to Annotations in Concepts and Codes. For example, given the following SDMX snippet:
<structure:CodeList id="CL_HGDE_GDE" agencyID="CH1_RN">
  <structure:Code value="13256">
    <structure:Description>Aeugst am Albis</structure:Description>
    <structure:Annotations>
      <common:Annotation>
        <common:AnnotationType>CODE_OFS</common:AnnotationType>
        <common:AnnotationText>1</common:AnnotationText>
      </common:Annotation>
      <common:Annotation>
        <common:AnnotationType>ABBREV</common:AnnotationType>
        <common:AnnotationText>A.a.A.</common:AnnotationText>
      </common:Annotation>
    </structure:Annotations>
  </structure:Code>
</structure:CodeList>
and the following configuration in config.rdf:
<rdf:value>
<rdf:Description>
<rdf:value>
<rdf:Description>
<rdf:predicate>xkos:hasPart</rdf:predicate>
<rdf:type>http://example.org/code/CH1_RN/CL_HGDE_GDE</rdf:type>
<rdfs:label>CODE_OFS</rdfs:label>
</rdf:Description>
<rdf:Description>
<rdf:predicate>skos:altLabel</rdf:predicate>
<rdf:type>XMLLiteral</rdf:type>
<rdfs:label>ABBREV</rdfs:label>
</rdf:Description>
</rdf:value>
<rdfs:label>interlinkAnnotationTypes</rdfs:label>
</rdf:Description>
</rdf:value>
would result in the final RDF/XML transformation like:
<rdf:Description rdf:about="http://example.org/code/CH1_RN/CL_HGDE_GDE/13256">
<xkos:hasPart rdf:resource="http://example.org/code/CH1_RN/CL_HGDE_GDE/1"/>
<skos:altLabel>A.a.A.</skos:altLabel>
</rdf:Description>
Only the AnnotationTypes with a corresponding configuration will be applied; unspecified ones will be skipped.
If the default language had been set, the output would have contained `xml:lang="{$lang}"`.
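The selection logic described above (apply configured AnnotationTypes, skip the rest) can be sketched in Python. This is a hedged illustration, not the actual templates: the function name, the dictionary keys, and the `OTHER` annotation are hypothetical, and the rule for building the object URI (configured `rdf:type` value, separator, annotation text) is inferred from the single example in this section.

```python
def interlink_annotations(annotations, config, thing_separator="/"):
    """Return (predicate, kind, value) tuples for configured AnnotationTypes."""
    rules = {rule["label"]: rule for rule in config}
    statements = []
    for annotation_type, text in annotations:
        rule = rules.get(annotation_type)
        if rule is None:
            continue  # no configuration for this AnnotationType: skip it
        if rule["type"] == "XMLLiteral":
            statements.append((rule["predicate"], "literal", text))
        else:
            # Assumption: the object URI is the configured rdf:type value
            # followed by the separator and the annotation text.
            uri = rule["type"] + thing_separator + text
            statements.append((rule["predicate"], "resource", uri))
    return statements


config = [
    {"label": "CODE_OFS", "predicate": "xkos:hasPart",
     "type": "http://example.org/code/CH1_RN/CL_HGDE_GDE"},
    {"label": "ABBREV", "predicate": "skos:altLabel", "type": "XMLLiteral"},
]
# "OTHER" is a made-up AnnotationType with no configuration.
annotations = [("CODE_OFS", "1"), ("ABBREV", "A.a.A."), ("OTHER", "x")]

for statement in interlink_annotations(annotations, config):
    print(statement)
```

With this input, the two configured annotations produce statements and the unconfigured one is dropped.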
Besides the common vocabularies (RDF, RDFS, OWL, XSD), the RDF Data Cube vocabulary is used to describe multi-dimensional statistical data, and SDMX-RDF for the statistical information model. PROV-O is used for some basic provenance coverage (although I'm not fully sure if I want to leave it in there). And of course SKOS and XKOS cover concepts, concept schemes, and their relationships to one another. XKOS is currently applied primarily to hierarchical lists here (I hope I understood the vocabulary correctly).
SDMX metadata and data may use CodeLists that are published by other agencies. For the agency SDMX, this is typically indicated by the component agency being set to `SDMX`, e.g., `codelistAgency="SDMX"` on a structure:Component and/or `agencyID="SDMX"` on a CodeList with `id="CL_FREQ"`. When this is detected, the corresponding URIs from the SDMX-RDF vocabulary are used, e.g., `http://purl.org/linked-data/sdmx/2009/code#freq` for metadata and `http://purl.org/linked-data/sdmx/2009/code#freq-A` for data.
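As a rough illustration of that substitution, the Python sketch below maps an SDMX-agency codelist (and optionally one of its codes) to an SDMX-RDF URI. The function name and the naming rule (drop the `CL_` prefix, lowercase the rest) are assumptions checked only against the `CL_FREQ` example; multi-word identifiers would likely need an explicit lookup table instead.

```python
SDMX_CODE_NS = "http://purl.org/linked-data/sdmx/2009/code#"


def sdmx_code_uri(agency_id, codelist_id, code=None):
    """Return an SDMX-RDF URI for SDMX-agency codelists, else None."""
    if agency_id != "SDMX":
        return None  # other agencies keep their locally minted URIs
    # Assumption: local name = codelist id without "CL_", lowercased.
    local = codelist_id[3:] if codelist_id.startswith("CL_") else codelist_id
    local = local.lower()
    if code is not None:
        local = f"{local}-{code}"
    return SDMX_CODE_NS + local


print(sdmx_code_uri("SDMX", "CL_FREQ"))        # codelist (metadata)
print(sdmx_code_uri("SDMX", "CL_FREQ", "A"))   # code (data)
print(sdmx_code_uri("EUROSTAT", "CL_GEO"))     # not an SDMX codelist
```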
There is an open issue #5 to reuse URIs from existing agencyIDs other than SDMX.
There is preliminary provenance level data: resources of type `qb:DataStructureDefinition` and `qb:DataSet` are also typed with `prov:Entity`, and given `prov:wasAttributedTo` with the value of `creator` in config.rdf (which is typed with `prov:Agent`). There is a unique `prov:Activity` for each transformation (typically one for the data and another for the DSD), and it contains values for `prov:startedAtTime`, `prov:used` (the files that were used), and `prov:generated` (what was produced, along with the source data URI that it `prov:wasDerivedFrom`).
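To make the shape of that provenance data concrete, here is a small Python sketch that lists the statements for one transformation as plain tuples. It is an assumption-laden illustration: the function name and every URI passed in below (activity, entity, creator, source) are hypothetical, and the real templates emit RDF/XML rather than tuples.

```python
from datetime import datetime, timezone


def provenance_triples(activity, entity, source, used_files, creator, started_at):
    """List the PROV-O statements for one transformation run."""
    triples = [
        (activity, "rdf:type", "prov:Activity"),
        (activity, "prov:startedAtTime", started_at.isoformat()),
        (activity, "prov:generated", entity),
        (entity, "rdf:type", "prov:Entity"),
        (entity, "prov:wasAttributedTo", creator),
        (entity, "prov:wasDerivedFrom", source),
        (creator, "rdf:type", "prov:Agent"),
    ]
    # One prov:used statement per file involved in the transformation.
    triples += [(activity, "prov:used", used) for used in used_files]
    return triples


triples = provenance_triples(
    activity="http://example.org/prov/activity/1",          # hypothetical
    entity="http://example.org/dataset/REF_DEMO/EXAMPLE",   # hypothetical
    source="http://example.org/source/generic.data.xml",    # hypothetical
    used_files=["generic.data.xml", "generic.xsl"],
    creator="http://example.org/#creator",                   # hypothetical
    started_at=datetime(2012, 1, 1, tzinfo=timezone.utc),
)
for triple in triples:
    print(triple)
```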
Here is an outline of the URI patterns that are used. `example.org` is used as an example domain, followed by `class`, `code`, `concept`, `dataset`, `property`, `prov`, or `slice` as example path segments (i.e., they can be changed from config). `/`s are used to separate the things and dimensions in URIs, which can also be changed from config. Variable values are derived directly from the source SDMX. Some skos:ConceptSchemes have `uriValidFromToSeparator`, which is generated by combining date validity information when both validFrom and validTo are provided.
http://example.org/dataset/{$agencyID}/{$KeyFamilyRef}/structure
http://example.org/dataset/{$agencyID}/{$KeyFamilyRef}/{dimension-1}/../{dimension-n}
http://example.org/slice/{$agencyID}/{$KeyFamilyRef}/{dimension-1}/../{dimension-n-excluding-FREQ-concept}
http://example.org/code/{$agencyID}/{$hierarchicalCodeListID}
http://example.org/code/{$agencyID}/{$hierarchyID}
http://example.org/code/{$agencyID}/{$codeListID}{$uriValidFromToSeparator}
http://example.org/concept/{$agencyID}/{$conceptSchemeID}{$uriValidFromToSeparator}
http://example.org/code/{$agencyID}/{$codeListID}/{@codeID}
http://example.org/concept/{$agencyID}/{$conceptSchemeID}/{@conceptID}
http://example.org/class/{$agencyID}/{$codeListID}
http://example.org/property/{$agencyID}/{$conceptID}
Properties used in structure (DSD, codelists, ..) and data (observations) are listed below:

Structure:
http://example.org/property/{$agencyID}/{$conceptID}
http://purl.org/dc/terms/identifier
http://purl.org/dc/terms/references
http://purl.org/linked-data/cube#attribute
http://purl.org/linked-data/cube#codeList
http://purl.org/linked-data/cube#component
http://purl.org/linked-data/cube#componentAttachment
http://purl.org/linked-data/cube#componentProperty
http://purl.org/linked-data/cube#concept
http://purl.org/linked-data/cube#dimension
http://purl.org/linked-data/cube#measure
http://purl.org/linked-data/cube#order
http://purl.org/linked-data/cube#sliceKey
http://purl.org/linked-data/sdmx/2009/concept#dataRev
http://purl.org/linked-data/sdmx/2009/concept#dsi
http://purl.org/linked-data/sdmx/2009/concept#mAgency
http://purl.org/linked-data/sdmx/2009/concept#validFrom
http://purl.org/linked-data/sdmx/2009/concept#validTo
http://purl.org/linked-data/xkos#hasPart
http://purl.org/linked-data/xkos#isPartOf
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://www.w3.org/2000/01/rdf-schema#comment
http://www.w3.org/2000/01/rdf-schema#range
http://www.w3.org/2000/01/rdf-schema#seeAlso
http://www.w3.org/2000/01/rdf-schema#subClassOf
http://www.w3.org/2004/02/skos/core#definition
http://www.w3.org/2004/02/skos/core#hasTopConcept
http://www.w3.org/2004/02/skos/core#inScheme
http://www.w3.org/2004/02/skos/core#notation
http://www.w3.org/2004/02/skos/core#prefLabel
http://www.w3.org/2004/02/skos/core#topConceptOf
http://www.w3.org/ns/prov#generated
http://www.w3.org/ns/prov#startedAtTime
http://www.w3.org/ns/prov#used
http://www.w3.org/ns/prov#wasAttributedTo
http://www.w3.org/ns/prov#wasDerivedFrom

Data:
http://example.org/property/{$agencyID}/{$conceptID}
http://purl.org/linked-data/cube#dataSet
http://purl.org/linked-data/cube#observation
http://purl.org/linked-data/cube#slice
http://purl.org/linked-data/cube#sliceStructure
http://purl.org/linked-data/cube#structure
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://www.w3.org/ns/prov#generated
http://www.w3.org/ns/prov#startedAtTime
http://www.w3.org/ns/prov#used
http://www.w3.org/ns/prov#wasAttributedTo
http://www.w3.org/ns/prov#wasDerivedFrom
Types of resources in the structure (DSD, codelists, ..) and data (observations) are listed below:

Structure:
http://example.org/class/{$agencyID}/{$codeListID}
http://purl.org/linked-data/cube#AttributeProperty
http://purl.org/linked-data/cube#ComponentSpecification
http://purl.org/linked-data/cube#DataStructureDefinition
http://purl.org/linked-data/cube#DimensionProperty
http://purl.org/linked-data/cube#MeasureProperty
http://purl.org/linked-data/sdmx#CodeList
http://purl.org/linked-data/sdmx#Concept
http://purl.org/linked-data/sdmx#DataStructureDefinition
http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
http://www.w3.org/2000/01/rdf-schema#Class
http://www.w3.org/2002/07/owl#Class
http://www.w3.org/2004/02/skos/core#Collection
http://www.w3.org/2004/02/skos/core#Concept
http://www.w3.org/2004/02/skos/core#ConceptScheme
http://www.w3.org/ns/prov#Activity
http://www.w3.org/ns/prov#Agent
http://www.w3.org/ns/prov#Entity

Data:
http://purl.org/linked-data/cube#DataSet
http://purl.org/linked-data/cube#Observation
http://www.w3.org/ns/prov#Activity
http://www.w3.org/ns/prov#Agent
http://www.w3.org/ns/prov#Entity
Some of the XSD datatypes are applied to object resources based on the SDMX `structure:TextFormat/@textType`. See also issues #3 and #9, and the coverage below.
- Edit `scripts/config.rdf` to configure things like base URIs, delimiters to use in URIs, or even how to put SDMX AnnotationTypes to good use. If you don't edit it, it will work with defaults (e.g., example.org, /).
- Either use the provided `scripts/generic.sh` to transform generic SDMX-ML in data/ to RDF/XML, or use it on your own data with an XSLT 2.0 processor, with a command along the lines of the following (using the Debian saxonb-xslt as an example here):
The following takes the metadata from `generic.structure.xml` and uses the `scripts/generic.xsl` template to create the corresponding RDF/XML in `generic.structure.rdf`. The `xmlDocument` parameter value is used in the final transformation to let the processor know which file was being transformed (it is also used for provenance data); just reuse the same value as the input XML given to -s. The `pathToGenericStructure` parameter value is the same as `xmlDocument` in this case because we are transforming the SDMX KeyFamily / DSD:
saxonb-xslt -s generic.structure.xml -xsl generic.xsl xmlDocument=generic.structure.xml pathToGenericStructure=generic.structure.xml > generic.structure.rdf
Similar to above, but this time we are going to use `generic.structure.xml` for the generic data. The following generates the RDF/XML `generic.data.rdf` from `generic.data.xml` by making use of the generic structure data in `generic.structure.xml` through the `pathToGenericStructure` parameter:
saxonb-xslt -t -tree:linked -s generic.data.xml -xsl generic.xsl xmlDocument=generic.data.xml pathToGenericStructure=generic.structure.xml > generic.data.rdf
`-tree:linked` in saxonb-xslt helps with large files, as does giving more memory to the processor.
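If you script these runs, the argument list can be assembled as in the Python sketch below. It only builds the command (nothing is executed) and uses only the flags and parameters shown above; the function and argument names are hypothetical. It follows the earlier note that `xmlDocument` reuses the value given to -s.

```python
import shlex


def saxon_command(source, stylesheet, structure, timing=False, linked_tree=False):
    """Assemble a saxonb-xslt argument list for generic.xsl (illustrative)."""
    args = ["saxonb-xslt"]
    if timing:
        args.append("-t")            # as used in the data command above
    if linked_tree:
        args.append("-tree:linked")  # suggested above for large files
    args += [
        "-s", source,
        "-xsl", stylesheet,
        f"xmlDocument={source}",     # same value as the -s input
        f"pathToGenericStructure={structure}",
    ]
    return args


structure_cmd = saxon_command("generic.structure.xml", "generic.xsl",
                              "generic.structure.xml")
data_cmd = saxon_command("generic.data.xml", "generic.xsl",
                         "generic.structure.xml", timing=True, linked_tree=True)

# Print shell-ready command lines; redirect the output to the .rdf file yourself.
print(shlex.join(structure_cmd))
print(shlex.join(data_cmd))
```

The resulting lists can be passed to `subprocess.run` with `stdout` pointed at the target `.rdf` file.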
The following is a coverage (in progress) based on sample data.
"External agencies" refers to cases in which the SDMX publisher is using an external agency's concepts, codelists, etc.

| | BIS | OECD | UN | ECB | WB | IMF | FAO | EUROSTAT | BFS |
|---|---|---|---|---|---|---|---|---|---|
| External Agencies | SDMX | EUROSTAT | IAEG | SDMX | OECD | | | | |
| Annotation(Type) | Y | Y | Y | | | | | | |
| Hierarchical CodeLists | Y | Y | Y | Y | Y | Y | | | |
| Datatype (OBS_VALUE) | String | Double | Double | Double | | | | | |
| Datatype (TIME_FORMAT) | String | String | | | | | | | |
| Datatype (TIME_PERIOD) | String | | | | | | | | |
| Datatype (OBS_STATUS) | String | String | | | | | | | |
| Datatype (OBS_CONF) | String | | | | | | | | |
| SDMX Version | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 |