XML namespaces for XPath #4
Comments
@DylanVanAssche is this more a challenge or a "best practice" than a pure problem with the RML spec? Shall we transfer the issue? |
@dachafra For me, it is a spec thing because it is related to the |
@DylanVanAssche So, given the proposal from CARML as well, it is more related to the Logical Source, right? Shall we transfer it to that spec? |
True! Fine with transferring it! |
@pmaria I like the CARML approach for this issue:
What do you think of using this?
Changes:
|
Hmm, I'm not sure the iterator is the most natural place to define the namespaces, since you also want to be able to use these namespaces in non-iterator expressions. |
When you use |
Ah I don't see it that way necessarily. I see the But I agree that source might not be the best place for the NS definition, because it is essentially a query concern, and the namespaces don't need to match the namespaces used in a source document. Maybe it makes more sense then to add a new object to the logical source, next to the iterator? Similar to your idea, but keeping iterator as is, i.e. as just another expression. Something like
rml:logicalSource [
  rml:source [
    # Any kind of source
  ] ;
  rml:iterator "/ex:bookstore/*" ;
  rml:expressionContext [ a XPathExpressionContext ;
    rml:namespace [
      rml:namespaceName "http://www.example.com/books/1.0/" ;
      rml:namespacePrefix "ex" ;
    ] ;
  ] ;
  rml:referenceFormulation ql:XPath ;
]
We could possibly combine it with the reference formulation? The rationale would be that this defines how to interpret the expressions that are based on a logical source. |
So combining it with reference formulations could look like
rml:logicalSource [
  rml:source [
    # Any kind of source
  ] ;
  rml:iterator "/ex:bookstore/*" ;
  rml:referenceFormulation [ a ql:XPathReferenceFormulation ;
    ql:namespace [
      ql:namespaceName "http://www.example.com/books/1.0/" ;
      ql:namespacePrefix "ex" ;
    ] ;
  ] ;
]
This would be a custom specified XPath reference formulation, next to the "default" |
Ah depends on how you implement the spec :) Some implementations do not create subdocuments.
Yes! I try to separate the concerns as much as possible so it also re-usable in the future.
According to the definition, the last suggestion looks better to me.
Ideally, we don't even need that and have 1 IRI for both (with and without namespaces), but I'm not sure how to achieve that in RDF? Properties can be optional, but if you have none, it becomes something weird like this:
We could 'solve' this by having shortcuts:
This shortcut points to |
Yes. I see that
rml:referenceFormulation rdfs:range rml:ReferenceFormulation .
rml:ReferenceFormulation rdf:type owl:Class ;
  rdfs:label "Reference Formulation" ;
  rdfs:comment "Represents a Reference Formulation."@en .
And also defined is
ql:XPath rdf:type owl:NamedIndividual, rml:ReferenceFormulation ;
  rdfs:label "XPath" ;
  rdfs:comment "Denotes the XPath reference formulation, used for referring to extracts of XML sources."@en ;
  ql:specification <http://www.w3.org/TR/xpath20/> ;
  rml:version "2.0" .
So essentially the "shortcut" is just using the named individual. Now all we would have to do is introduce a subclass of
I don't think we should introduce a new named individual for XPath with namespaces. This would limit the namespaces you could define, since the individual's scope would be global. And you might want to define different namespaces per logical source. |
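To make the scoping argument concrete, here is a small Python sketch (not RML). The `LogicalSource` class, both sample documents, and the `films` namespace name are hypothetical and exist only for this illustration. Each logical source carries its own prefix map, so the same prefix `ex` can be bound to a different namespace name per source, which a single global individual could not express.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class LogicalSource:
    """Hypothetical stand-in for a logical source with its own namespaces."""
    xml: str
    iterator: str
    namespaces: dict = field(default_factory=dict)

    def iterate(self):
        # The prefix map is local to this source, not shared globally.
        root = ET.fromstring(self.xml)
        return root.findall(self.iterator, self.namespaces)


books = LogicalSource(
    xml='<s xmlns="http://www.example.com/books/1.0/"><book/><book/></s>',
    iterator="ex:book",
    namespaces={"ex": "http://www.example.com/books/1.0/"},
)
# Same prefix "ex", different (made-up) namespace name.
films = LogicalSource(
    xml='<s xmlns="http://www.example.com/films/1.0/"><film/></s>',
    iterator="ex:film",
    namespaces={"ex": "http://www.example.com/films/1.0/"},
)

counts = (len(books.iterate()), len(films.iterate()))
print(counts)  # (2, 1)
```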
@pmaria Alright! I agree, let's set up our battle plan for this issue:
Problem solved then? |
Yes I think so 🎉 Not forgetting |
Why put namespace URIs in literals rather than using resources? |
Spec: https://www.w3.org/TR/xml-names/
AFAIK, XML Namespaces are not like Linked Data and are compared through a string-based comparison without any resolving. |
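The string-based comparison can be seen with Python's standard `xml.etree.ElementTree` (a sketch; the sample document and the `b` and `ex` prefixes are made up for illustration). Two namespace names that differ only by a trailing slash are treated as different namespaces, and nothing is dereferenced.

```python
import xml.etree.ElementTree as ET

WITH_SLASH = "http://www.example.com/books/1.0/"
WITHOUT_SLASH = "http://www.example.com/books/1.0"

# Hypothetical document that binds the prefix "b" to the namespace name.
doc = ET.fromstring(
    f'<b:bookstore xmlns:b="{WITH_SLASH}"><b:book/></b:bookstore>'
)

# Identical characters: matches, even though the query prefix differs
# from the prefix used in the document.
exact = doc.findall("ex:book", {"ex": WITH_SLASH})

# One character off: a different namespace as far as the engine is concerned.
near = doc.findall("ex:book", {"ex": WITHOUT_SLASH})

print(len(exact), len(near))  # 1 0
```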
Yes, but they can also be regarded as named resources that can be described (no matter whether they dereference and resolve). Having those as resources would facilitate writing SPARQL queries and inverse property paths, for instance. Just a thought, not questioning the proposal. I would suggest renaming |
I don't have much experience in that regard, so if it helps, I don't mind :)
Hmmm true, twice 'name' might be a bit weird :) |
Namespace name is what the spec calls it https://www.w3.org/TR/xml-names/#dt-NSName, so I would stick to that. As far as I can tell we can't simply use IRIs, because XML expects URIs. The main use case is to register the namespaces with an XPath engine for querying. Most implementations I've seen represent the namespace name as a string. My feeling is that keeping it a string would be the more natural mapping to implementations, but if the arguments for using an IRI are strong I can live with that. We would however have to specify what happens when an IRI that is not a URI is used. |
I don't disagree with @chrdebru but we can as well keep it Then again, if we include the restrictions in SHACL shapes, then we can decide at shape level whether it's a string or an IRI. There we can even provide 2 alternatives with 2 different explanations. |
Another thought, arguing against myself: newer libraries might read the namespaces from the file. Would we still want to give the option to define the namespaces? |
In my experience this is not that trivial, especially in non-DOM based approaches, e.g. a streaming implementation. Namespaces can be defined inline in a document, so in theory a new namespace can be declared and used at the end of a document. I have a strong preference to be able to declare this in the mapping. Tools can always also by default provide namespace detection as a service if it fits their architecture. |
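A minimal sketch of why detection is awkward in a streaming setting, using `iterparse` from Python's standard library. The sample document and the `late` namespace name are invented for this example. The parser only reports a namespace declaration when it reaches the element that declares it, so earlier elements have already been processed by then.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document: the namespace is declared inline, near the end.
XML = b"""<root>
  <item>1</item>
  <item>2</item>
  <late:item xmlns:late="http://www.example.com/late/">3</late:item>
</root>"""

seen = []  # order in which the streaming parser reports things
for event, payload in ET.iterparse(io.BytesIO(XML), events=("start-ns", "end")):
    if event == "start-ns":
        prefix, uri = payload  # "start-ns" yields a (prefix, uri) tuple
        seen.append(("namespace", prefix, uri))
    else:
        seen.append(("element", payload.tag))

for entry in seen:
    print(entry)
```

Two `item` elements are reported before the `late` namespace is seen at all, which is why a mapping-level declaration is convenient for streaming implementations.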
I agree with @pmaria, extracting the XML namespaces is non-trivial and may require consuming all XML first before any mapping can take place.
SHACL can have an OR statement, but maybe to keep things straightforward we should have either a string or IRI, but not both? |
@pmaria if they call them namespace names, then OK! @DylanVanAssche XML namespaces are declared in attributes (strings) in XML. So maybe that definition comes from their technical constraints. The advantage of IRIs is that "sameness" is implied when reused, whereas now you have to explicitly state that two namespace objects (if you can call them that) are the same, or infer it by comparing strings. So IRIs may help us in cases where we have different prefixes for the same namespace (e.g., combining mappings). |
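The "infer by comparing strings" case could look roughly like this in Python. It is only a sketch: the two prefix maps and the `prefixes_by_namespace` helper are hypothetical, standing in for declarations taken from two mappings that are being combined.

```python
from collections import defaultdict

# Hypothetical prefix declarations from two mappings being combined.
mapping_a = {"ex": "http://www.example.com/books/1.0/"}
mapping_b = {"bk": "http://www.example.com/books/1.0/"}


def prefixes_by_namespace(*prefix_maps):
    """Group prefixes by namespace name using plain string equality."""
    grouped = defaultdict(set)
    for prefix_map in prefix_maps:
        for prefix, namespace_name in prefix_map.items():
            grouped[namespace_name].add(prefix)
    return dict(grouped)


grouped = prefixes_by_namespace(mapping_a, mapping_b)
print(grouped)
```

With string literals, this grouping step has to be done by the tool; if the namespace names were IRIs, both declarations would point at the same node in the RDF graph without any extra comparison.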
@chrdebru I don't have a specific preference, except that I prefer either strings or IRIs, just not both ;) |
XPath allows XML namespaces to be used when selecting parts of an XML document.
However, (most) implementations require these namespaces to be registered before an XPath query is run.
RML currently does not specify how this should happen.
CARML has an extension for this: https://github.com/carml/carml#xml-namespace-extension
and it has already come up a few times in the past without a clear solution.
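For illustration, the registration step that most implementations require can be sketched in Python with the standard library's `xml.etree.ElementTree`. The sample document and the `ex` prefix are assumptions made for this example; the namespace name is the one used in the proposals in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; it uses a default namespace and no prefix.
XML = """<bookstore xmlns="http://www.example.com/books/1.0/">
  <book><title>First</title></book>
  <book><title>Second</title></book>
</bookstore>"""

root = ET.fromstring(XML)

# Without a namespace binding, the unqualified name matches nothing.
unqualified = root.findall("book")

# The prefix-to-namespace-name map must be handed to the engine explicitly.
namespaces = {"ex": "http://www.example.com/books/1.0/"}
qualified = root.findall("ex:book", namespaces)

titles = [book.find("ex:title", namespaces).text for book in qualified]
print(len(unqualified), titles)  # 0 ['First', 'Second']
```

Note that the prefix used in the query (`ex`) does not have to appear in the document; only the namespace name has to match.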