Mapping PROV Qualified Names to xsd:QName
PROV-DM defines a PROV Identifier as a Qualified Name with the following definition: A qualified name is a name subject to namespace interpretation. It consists of a namespace, denoted by an optional prefix, and a local name. PROV-DM stipulates that a qualified name can be mapped into an IRI by concatenating the IRI associated with the prefix and the local part.
PROV-N provides a concrete syntax for prov:QUALIFIED_NAME, further noting that a PROV-N qualified name QUALIFIED_NAME can be mapped to a valid IRI [RFC3987] by concatenating the namespace denoted by its prefix to the local name, whose \-escaped characters have been unescaped by dropping the character '\' (backslash).
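The PROV-N rule can be sketched in a few lines of Python. This is only an illustration: `qualified_name_to_iri` and the `namespaces` dictionary are hypothetical names, and the sketch assumes the qualified name carries an explicit prefix (PROV-N's default namespace is not handled).

```python
import re


def qualified_name_to_iri(namespaces: dict[str, str], qualified_name: str) -> str:
    """Map a PROV-N qualified name such as 'ex:a\\=b' to an IRI (sketch)."""
    # The prefix is everything before the first ':'. A ':' inside the local
    # name must be backslash-escaped in PROV-N, so this split is safe.
    prefix, sep, local = qualified_name.partition(":")
    if not sep:
        raise ValueError(f"no prefix in {qualified_name!r}")
    namespace = namespaces[prefix]  # KeyError if the prefix is undeclared
    # Unescape: drop each backslash and keep the character that follows it.
    unescaped = re.sub(r"\\(.)", r"\1", local)
    return namespace + unescaped
```

With `{"ex": "http://example.org/"}` as the namespace table, the PROV-N name `ex:a\=b` maps to `http://example.org/a=b`.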
PROV-XML defines the type of both the prov:id and prov:ref XML attributes to be xsd:QName, as that is the XSD datatype that most closely matches the qualified name definition of PROV-DM. Care should be taken when generating PROV identifier values in PROV-XML such that there is a known mapping to a URI.
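To illustrate why a known mapping to a URI matters, the following sketch reads the prov:id values of a small PROV-XML document with Python's `xml.etree.ElementTree` and resolves each xsd:QName by concatenating the namespace URI bound to its prefix with its local part. `resolve_prov_ids` is a hypothetical helper; it keeps a single flat prefix table (ignoring xmlns scoping) and does not undo any escaping, so an identifier written as `ex:_01` resolves to a URI ending in `_01`, not `01`.

```python
import io
import xml.etree.ElementTree as ET

PROV_NS = "http://www.w3.org/ns/prov#"


def resolve_prov_ids(xml_text: str) -> list[str]:
    """Resolve every prov:id value in a PROV-XML document to a URI (sketch)."""
    prefixes = {}  # prefix -> namespace URI (flat: xmlns scoping is ignored)
    uris = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, item in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, namespace = item  # the default namespace has prefix ""
            prefixes[prefix] = namespace
        else:
            qname = item.get(f"{{{PROV_NS}}}id")
            if qname is not None:
                # An xsd:QName has at most one ':'; with none, prefix is "".
                prefix, _, local = qname.rpartition(":")
                uris.append(prefixes[prefix] + local)
    return uris
```

For a document declaring `xmlns:ex="http://example.org/"` with entities `ex:abc` and `ex:_01`, the sketch returns `http://example.org/abc` and `http://example.org/_01`.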
A further note adds:
> The xsd:QName datatype is more restrictive than the QualifiedName defined in [PROV-N], e.g. PROV-N allows local names to start with numbers, therefore valid identifier values in [PROV-N] serializations have the potential to not be valid identifier values in PROV-XML. It is recommended to enhance interoperability that provenance users strive to always use identifier schemes that map to valid xsd:QNames and URIs.
While this recommendation may work well for applications that fully control the design of their identifiers, it is not workable for applications, such as ProvToolbox, that are expected to consume arbitrary provenance in arbitrary representations. Any form of URI needs to be mapped to a Qualified Name for PROV-N and to an xsd:QName for PROV-XML.
This limitation was recognized by the Provenance Working Group, and the beginnings of a solution were outlined in email discussions, but that solution never made it into the PROV-XML specification.
The purpose of this document is to outline the mapping of Qualified Names to xsd:QName adopted by ProvToolbox.
A reversible encoding scheme already exists: percent encoding, as used in URIs. However, the character % is not valid in an xsd:QName. Instead, we had to choose a character that is valid in local names and not too frequently used, since that character would itself have to be escaped.
After consideration, we decided to use _ (underscore).
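The escaping rules can be read off the examples in the table below and sketched in Python. This is a hypothetical reconstruction, not ProvToolbox's code: `escape_local_name` and the range tables are illustrative names, and the sketch assumes (1) characters allowed in an NCName by XML 1.0 (Fifth Edition) pass through, except `_`, which is doubled; (2) any other ASCII character becomes `_` followed by two uppercase hexadecimal digits; (3) a `_` marker is prepended when the name is empty or does not start with an NCName start character other than `_`. The table does not show how non-ASCII characters outside an NCName are handled, so the sketch rejects them.

```python
# Hypothetical sketch of the '_' escaping of PROV local names into NCNames,
# with the rules inferred from the examples in the mapping table.

# NameStartChar ranges of XML 1.0 (Fifth Edition), without ':' and '_'.
START_RANGES = [
    (0x41, 0x5A), (0x61, 0x7A),  # A-Z, a-z
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF), (0x370, 0x37D),
    (0x37F, 0x1FFF), (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF),
    (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]
# Extra NameChar ranges: '-', '.', digits, middle dot, combining marks.
EXTRA_RANGES = [(0x2D, 0x2E), (0x30, 0x39), (0xB7, 0xB7),
                (0x300, 0x36F), (0x203F, 0x2040)]


def in_ranges(ch: str, ranges) -> bool:
    code = ord(ch)
    return any(lo <= code <= hi for lo, hi in ranges)


def escape_local_name(local: str) -> str:
    """Map an unescaped PROV local name to a valid NCName (sketch)."""
    out = []
    # Prepend the '_' marker when the name is empty or its first character
    # is not a start character: digits, '-', '.', '_' and escaped
    # characters all trigger it.
    if not local or not in_ranges(local[0], START_RANGES):
        out.append("_")
    for ch in local:
        if ch == "_":
            out.append("__")  # the escape character itself is doubled
        elif in_ranges(ch, START_RANGES) or in_ranges(ch, EXTRA_RANGES):
            out.append(ch)  # already valid inside an NCName
        elif ord(ch) < 0x80:
            out.append(f"_{ord(ch):02X}")  # e.g. '@' -> '_40'
        else:
            raise ValueError(f"no escape defined for {ch!r} in this sketch")
    return "".join(out)
```

Applied to the table's rows, `escape_local_name("a!b")` yields `a_21b` and `escape_local_name("_")` yields `___`.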
prov:QUALIFIED_NAME (*) | xsd:QName | Comment |
--- | --- | --- |
ex:abc | ex:abc | Valid identifier, no escaping required |
ex:abc01 | ex:abc01 | Valid identifier, no escaping required |
ex:01 | ex:_01 | Local name starting with a non-PN_CHAR_START character is prefixed with _ |
ex: | ex:_ | Empty local name mapped to _ |
ex:_ | ex:___ | _ escaped as __, then prefixed with _ since it is at the start |
ex:a01b_c | ex:a01b__c | Escape _ |
ex:a@b | ex:a_40b | Mapping of @ to _40 |
ex:a~b | ex:a_7Eb | Mapping of ~ to _7E |
ex:a&b | ex:a_26b | Mapping of & to _26 |
ex:a+b | ex:a_2Bb | Mapping of + to _2B |
ex:a*b | ex:a_2Ab | Mapping of * to _2A |
ex:a#b | ex:a_23b | Mapping of # to _23 |
ex:a$b | ex:a_24b | Mapping of $ to _24 |
ex:a!b | ex:a_21b | Mapping of ! to _21 |
ex:a01bc | ex:a01bc | Mapping of to |
ex:a01/bc | ex:a01_2Fbc | Mapping of / to _2F |
ex:a01b\c | ex:a01b_5Cc | Mapping of \ to _5C |
ex:a01b=c | ex:a01b_3Dc | Mapping of = to _3D |
ex:a01b'c | ex:a01b_27c | Mapping of ' to _27 |
ex:a01b(c | ex:a01b_28c | Mapping of ( to _28 |
ex:a01b)c | ex:a01b_29c | Mapping of ) to _29 |
ex:a01b,c | ex:a01b_2Cc | Mapping of , to _2C |
ex:a01b:c | ex:a01b_3Ac | Mapping of : to _3A |
ex:a01b;c | ex:a01b_3Bc | Mapping of ; to _3B |
ex:a01b[c | ex:a01b_5Bc | Mapping of [ to _5B |
ex:a01b]c | ex:a01b_5Dc | Mapping of ] to _5D |
ex:a01b.c | ex:a01b.c | . permitted in QName |
ex:a01bc. | ex:a01bc. | . permitted at end of QName |
ex:='(),_:;[].@~ | ex:__3D_27_28_29_2C___3A_3B_5B_5D._40_7E | Escape them all except . |
ex:?a\=b | ex:__3Fa_5C_3Db | Leading ? escaped and prefixed with _; \ and = escaped |
ex:55348dff-4fcc-4ac2-ab56-641798c64400 | ex:_55348dff-4fcc-4ac2-ab56-641798c64400 | Escaping of a UUID-like QualifiedName |
ex:À-ÖØ-öø-˿Ͱͽ | ex:À-ÖØ-öø-˿Ͱͽ | Support for Unicode |
(*) Note that the prov:QUALIFIED_NAME column displays unescaped Qualified Names. For example, the correct PROV-N syntax for ex:a01bc. is ex:a01bc\., since . is not allowed in final position.
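Because each escape either doubles `_` or uses exactly two hexadecimal digits, and the marker only ever appears in first position, the mapping can be inverted with one left-to-right scan. The sketch below assumes the rules inferred from the table (a leading `_` is always the marker, since an unmarked result always starts with a letter-like start character); `unescape_local_name` is a hypothetical name, not a ProvToolbox function.

```python
import string


def unescape_local_name(encoded: str) -> str:
    """Invert the '_' escaping of a local name (sketch)."""
    # A leading '_' is the marker: the encoder never leaves '_' or an
    # escape sequence in first position without prepending it.
    if encoded.startswith("_"):
        encoded = encoded[1:]
    out = []
    i = 0
    while i < len(encoded):
        ch = encoded[i]
        if ch != "_":
            out.append(ch)  # ordinary NCName character
            i += 1
        elif encoded[i + 1 : i + 2] == "_":
            out.append("_")  # '__' stands for a literal '_'
            i += 2
        else:
            digits = encoded[i + 1 : i + 3]
            if len(digits) != 2 or not all(d in string.hexdigits for d in digits):
                raise ValueError(f"bad escape at position {i} in {encoded!r}")
            out.append(chr(int(digits, 16)))  # '_XX' stands for chr(0xXX)
            i += 3
    return "".join(out)
```

For instance, `ex:___` decodes back to `ex:_`, and `ex:__3Fa_5C_3Db` back to `ex:?a\=b`, matching the corresponding rows of the table.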
We recognize that this solution is our own and, in a sense, not interoperable. Other solutions are possible. A future version of PROV-XML will have to specify this mapping.