XML namespaces for XPath #4

DylanVanAssche · 2021-10-11T08:58:57Z

XPath allows the use of XML namespaces when selecting parts of an XML document.
However, most implementations require these namespaces to be registered before an XPath query is executed.
RML currently does not specify how this should happen:

In the mapping rules?
By the implementation, with a CLI parameter or dynamically by parsing the XML document first and finding any namespaces?
...
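To make the registration requirement concrete, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree`, which supports only a subset of XPath. The sample document is hypothetical; the namespace name is borrowed from the examples later in this thread. It shows that an unqualified query matches nothing, that an unregistered prefix is rejected, and that the caller has to supply the prefix map.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document (not from the thread); only the namespace name is.
XML = """<bookstore xmlns="http://www.example.com/books/1.0/">
  <book><title>First</title></book>
  <book><title>Second</title></book>
</bookstore>"""

root = ET.fromstring(XML)

# An unqualified name refers to elements in no namespace, so nothing matches.
unqualified = root.findall("./book")

# Using a prefix that was never registered is rejected by the query engine.
try:
    root.findall("./ex:book")
    unregistered_error = None
except SyntaxError as error:
    unregistered_error = str(error)

# The caller registers the prefix; "ex" does not appear in the document itself.
namespaces = {"ex": "http://www.example.com/books/1.0/"}
qualified = root.findall("./ex:book", namespaces)
titles = [book.findtext("ex:title", namespaces=namespaces) for book in qualified]

print(len(unqualified), unregistered_error is not None, titles)
```

Other XPath engines expose a different registration API, but the shape of the problem is the same: the prefix map comes from outside the query string.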

CARML has an extension for this: https://github.com/carml/carml#xml-namespace-extension
and it came up in the past already a few times without a clear solution:

dachafra · 2022-03-14T10:58:53Z

@DylanVanAssche is this more a challenge or a "best practice" than a pure problem with the RML spec? Shall we transfer the issue?

DylanVanAssche · 2022-03-14T11:02:01Z

@dachafra For me, it is a spec thing because it is related to the rml:iterator. Maybe a Literal is insufficient here?

dachafra · 2022-03-14T11:05:18Z

@DylanVanAssche So... given the proposal from CARML as well, it is more related to the Logical Source, right? Shall we transfer it to that spec?

DylanVanAssche · 2022-03-14T11:05:59Z

True! Fine for transferring it!

DylanVanAssche · 2022-04-01T09:43:39Z

@pmaria I like the CARML approach for this issue:

rml:logicalSource [
    rml:source [
      a carml:Stream ;
      # or in case of a file source use:
      # carml:url "path-to-source" ;
      carml:declaresNamespace [
        carml:namespacePrefix "ex" ;
        carml:namespaceName "http://www.example.com/books/1.0/" ;
      ] ;
    ] ;
    rml:referenceFormulation ql:XPath ;
    rml:iterator "/ex:bookstore/*" ;
  ] ;

What do you think of using this?

rml:logicalSource [
  rml:source [
    # Any kind of source
  ] ;
  rml:iterator [ a ql:XPathIterator, rml:Iterator;
    rml:namespaceName "http://www.example.com/books/1.0/" ;
    rml:namespacePrefix "ex" ;
    rml:value "/ex:bookstore/*";
  ];
]

Changes:

Make the iterator an object instead of a literal, and drop rml:referenceFormulation
Move the namespaces to the iterator, specifically the XPath iterator
For JSON, CSV, etc. we would have the same structure, just without the namespace properties
If a future reference formulation X appears that is totally different and needs something like the namespaces as well, we can support it

pmaria · 2022-04-01T15:32:48Z

Hmm I'm not sure the iterator is the most natural place to define the namespaces. Since you also want to be able to use these namespaces in non-iterator expressions.

DylanVanAssche · 2022-04-04T09:42:27Z

@pmaria

Hmm I'm not sure the iterator is the most natural place to define the namespaces. Since you also want to be able to use these namespaces in non-iterator expressions.

When you use rml:reference, rr:column, rr:template, etc. you take the rml:iterator value, append the value of one of these references to retrieve what you need in a Triples Map.
That's why I found it a better fit there: it is specific to the reference formulation and the iterator.
rml:source is only for defining how a source should be accessed such as location. Because of that, I would keep the namespace declaration away from that since those namespaces are only used for executing the iterator & references during the data processing after the data was retrieved from the source.

pmaria · 2022-04-05T05:51:23Z

When you use rml:reference, rr:column, rr:template, etc. you take the rml:iterator value, append the value of one of these references to retrieve what you need in a Triples Map.

Ah I don't see it that way necessarily. I see the rml:iterator, rml:reference, rr:template conceptually operating within the same scope/context. Wherein indeed, the iterator creates an iteration of sub documents on which the other expressions are evaluated. But I see the iterator as just another expression.

But I agree that source might not be the best place for the NS definition, because it is essentially a query concern, and the namespaces don't need to match the namespaces used in a source document.

Maybe it makes more sense then to add a new object to the logical source, next to the iterator? Similar to your idea, but keeping the iterator as is, i.e. as just another expression.

Something like rml:ExpressionContext.

rml:logicalSource [
  rml:source [
    # Any kind of source
  ] ;
  rml:iterator "/ex:bookstore/*" ;
  rml:expressionContext [ a rml:XPathExpressionContext;
    rml:namespace [
       rml:namespaceName "http://www.example.com/books/1.0/" ;
       rml:namespacePrefix "ex" ;
    ];
  ] ;
  rml:referenceFormulation ql:XPath;
]

We could possibly combine it with the reference formulation? The rationale would be that this defines how to interpret the expressions that are based on a logical source.

pmaria · 2022-04-05T06:03:43Z

So combining it with reference formulations could look like

rml:logicalSource [
  rml:source [
    # Any kind of source
  ] ;
  rml:iterator "/ex:bookstore/*" ;
  rml:referenceFormulation [ a ql:XPathReferenceFormulation;
    ql:namespace [
       ql:namespaceName "http://www.example.com/books/1.0/" ;
       ql:namespacePrefix "ex" ;
    ] ;
  ] ;
]

This would be a custom specified XPath reference formulation, next to the "default" ql:XPath.

DylanVanAssche · 2022-04-05T06:39:41Z

Ah I don't see it that way necessarily. I see the rml:iterator, rml:reference, rr:template conceptually operating within the same scope/context. Wherein indeed, the iterator creates an iteration of sub documents on which the other expressions are evaluated. But I see the iterator as just another expression.

Ah depends on how you implement the spec :) Some implementations do not create subdocuments.
However, I agree with you :)

But I agree that source might not be the best place for the NS definition, because it is essentially a query concern, and the namespaces don't need to match the namespaces used in a source document.

Yes! I try to separate the concerns as much as possible so it is also reusable in the future.

rml:referenceFormulation definition:

The reference formulation (rml:referenceFormulation) defines the reference formulation used to refer to the elements of the data source. The reference formulation must be specified in the case of databases and XML and JSON data sources. By default SQL2008 for databases, as SQL2008 is the default for R2RML, XPath for XML and JSONPath for JSON data sources.

According to the definition, the last suggestion looks better to me.
Are we aware of something similar for other reference formulations?

This would be a custom specified XPath reference formulation, next to the "default" ql:XPath.

Ideally, we don't even need that and have 1 IRI for both (with and without namespaces), but I'm not sure how to achieve that in RDF? Properties can be optional, but if you have none, it becomes something weird like this:

rml:referenceFormulation [ a ql:XPathReferenceFormulation; ] ;

We could 'solve' this by having shortcuts:

rml:referenceFormulation ql:XPath;

This shortcut points to [ a ql:XPathReferenceFormulation; ].
I think this is what you meant above with the "default"?

pmaria · 2022-04-05T08:42:27Z

We could 'solve' this by having shortcuts:
rml:referenceFormulation ql:XPath;

Yes. I see that rml:ReferenceFormulation is already defined in the RML ontology.

rml:referenceFormulation rdfs:range rml:ReferenceFormulation .

rml:ReferenceFormulation rdf:type owl:Class ;
    rdfs:label   "Reference Formulation" ;
    rdfs:comment "Represents a Reference Formulation."@en .

And also defined is

ql:XPath rdf:type owl:NamedIndividual, rml:ReferenceFormulation  ;
    rdfs:label   "XPath" ; 
    rdfs:comment "Denotes the XPath reference formulation, used for referring to extracts of XML sources."@en ;
    ql:specification <http://www.w3.org/TR/xpath20/> ;
    rml:version "2.0".

So essentially the "shortcut" is just using the named individual.

Now all we would have to do is introduce a subclass of rml:ReferenceFormulation, rml:XPathReferenceFormulation, and define that further by adding namespace properties.

I don't think we should introduce a new named individual for XPath with namespaces. This would limit the namespaces you could define, since the individual's scope would be global. And you might want to define different namespaces per logical source.

DylanVanAssche · 2022-04-05T08:48:58Z

@pmaria Alright! I agree, let's set up our battle plan for this issue then:

Introduce rml:XPathReferenceFormulation
Define ql:namespaceName and ql:namespacePrefix in there

Problem solved then?

pmaria · 2022-04-05T10:05:32Z

Yes I think so 🎉

Not forgetting ql:namespace to specify one or more ql:Namespaces
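As a rough illustration of how an engine might consume the agreed structure, the sketch below models a reference formulation that carries its own namespace declarations and turns them into the prefix map used for both the iterator and the references. The class and function names (`Namespace`, `XPathReferenceFormulation`, `evaluate`) are hypothetical and mirror the proposed terms only loosely. The iterator is written as a relative path because `xml.etree.ElementTree` supports only a subset of XPath; the sample document is invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class Namespace:
    prefix: str  # corresponds to the proposed ql:namespacePrefix
    name: str    # corresponds to the proposed ql:namespaceName


@dataclass
class XPathReferenceFormulation:
    # Corresponds to zero or more ql:namespace declarations.
    namespaces: List[Namespace] = field(default_factory=list)

    def prefix_map(self) -> Dict[str, str]:
        return {ns.prefix: ns.name for ns in self.namespaces}


def evaluate(xml_text: str, iterator: str, reference: str,
             formulation: XPathReferenceFormulation) -> List[str]:
    """Apply the iterator, then the reference, with the same prefix map."""
    prefix_map = formulation.prefix_map()
    root = ET.fromstring(xml_text)
    return [item.findtext(reference, namespaces=prefix_map)
            for item in root.findall(iterator, prefix_map)]


# Hypothetical sample document using the namespace name from this thread.
XML = """<bookstore xmlns="http://www.example.com/books/1.0/">
  <book><title>First</title></book>
  <book><title>Second</title></book>
</bookstore>"""

formulation = XPathReferenceFormulation(
    [Namespace("ex", "http://www.example.com/books/1.0/")]
)
titles = evaluate(XML, "./ex:book", "ex:title", formulation)
print(titles)
```

Because each formulation object holds its own declarations, two logical sources can bind the same prefix to different namespace names, which is the reason given above for not using a single global named individual.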

chrdebru · 2022-04-11T13:13:29Z

Why put namespace URIs in literals rather than using resources?

DylanVanAssche · 2022-04-11T13:46:13Z

@chrdebru

Why put namespace URIs in literals rather than using resources?

Spec: https://www.w3.org/TR/xml-names/

[URI references identifying namespaces are compared when determining whether a name belongs to a given namespace, and whether two names belong to the same namespace. Definition: The two URIs are treated as strings, and they are identical if and only if the strings are identical, that is, if they are the same sequence of characters. ] The comparison is case-sensitive, and no %-escaping is done or undone.

AFAIK, XML Namespaces are not like Linked Data and are compared through a string-based comparison without any resolving.
That's why they are a Literal here, but any insights are welcome!
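The string-comparison rule quoted above can be observed with a small sketch, again using Python's standard-library `xml.etree.ElementTree` and a hypothetical one-element document. Only the character-for-character identical namespace name matches; variants that URI normalisation might treat as equivalent do not. The sketch also shows that the query prefix (`ex`) does not have to match the prefix used in the document (`b`).

```python
import xml.etree.ElementTree as ET

# Hypothetical document; it declares the namespace under the prefix "b".
XML = ('<b:bookstore xmlns:b="http://www.example.com/books/1.0/">'
       '<b:book/></b:bookstore>')
root = ET.fromstring(XML)


def count_books(namespace_name: str) -> int:
    """Count book elements when the prefix "ex" is bound to namespace_name."""
    return len(root.findall("./ex:book", {"ex": namespace_name}))


exact = count_books("http://www.example.com/books/1.0/")
upper_host = count_books("http://WWW.EXAMPLE.COM/books/1.0/")    # case differs
no_slash = count_books("http://www.example.com/books/1.0")       # trailing slash dropped
escaped = count_books("http://www.example.com/b%6Foks/1.0/")     # %-escaped "o"

print(exact, upper_host, no_slash, escaped)
```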

chrdebru · 2022-04-11T14:19:46Z

Yes, but they can also be regarded as named resources that can be described (no matter whether they dereference and resolve). Having those as resources would facilitate writing SPARQL queries and inverse property paths, for instance. Just a thought, not questioning the proposal.

I would suggest renaming ql:namespaceName to namespaceIRI. Some namespaces have titles and a namespace contains names. Turtle mentions this: "The '@prefix' or 'PREFIX' directive associates a prefix label with an IRI".

DylanVanAssche · 2022-04-11T15:58:37Z

Yes, but they can also be regarded as named resources that can be described (no matter whether they dereference and resolve). Having those as resources would facilitate writing SPARQL queries and inverse property paths, for instance. Just a thought, not questioning the proposal.

I don't have much experience in that regard, so if it helps, I don't mind :)
For me, it doesn't really matter as long as we have a mapping prefix <-> IRI

I would suggest renaming ql:namespaceName to namespaceIRI. Some namespaces have titles and a namespace contains names. Turtle mentions this: "The '@prefix' or 'PREFIX' directive associates a prefix label with an IRI".

Hmmm true, twice 'name' might be a bit weird :)
@pmaria Do you agree on this?

pmaria · 2022-04-12T05:39:09Z

Namespace name is what the spec calls it https://www.w3.org/TR/xml-names/#dt-NSName, so I would stick to that.

As far as I can tell we can't simply use IRIs, because XML expects URIs.

The main use case is to register the namespaces with an XPath engine for querying. Most implementations I've seen represent the namespace name as a string.

My feeling is that keeping it a string would be the more natural mapping to implementations, but if the arguments for using an IRI are strong I can live with that. We would however have to specify what happens when an IRI that is not a URI is used.

andimou · 2022-04-23T12:30:20Z

I don't disagree with @chrdebru but we can as well keep it ql:namespace, whether IRI/URI or Literal can be determined based on the range, we don't need to include it in the name of the property.

Then again, if we include the restrictions in SHACL shapes, then we can decide on shape level whether it's a string or an IRI. There we can even provide 2 alternatives with 2 different explanations.

andimou · 2022-04-23T12:33:54Z

Another thought, I debate myself. Newer libraries might read the namespaces from the file, would we still want to give the option to define the namespaces?

pmaria · 2022-04-23T18:53:10Z

Another thought, I debate myself. Newer libraries might read the namespaces from the file, would we still want to give the option to define the namespaces?

In my experience this is not that trivial, especially in non-DOM based approaches, e.g. a streaming implementation. Namespaces can be defined inline in a document, so in theory a new namespace can be declared and used at the end of a document.

I have a strong preference to be able to declare this in the mapping. Tools can always also by default provide namespace detection as a service if it fits their architecture.
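The difficulty with detection in a streaming setting can be sketched with `xml.etree.ElementTree.iterparse`, which reports namespace declarations as `start-ns` events at the point where the parser reaches them. The document below is hypothetical: its second namespace is only declared on an element near the end, after earlier elements have already been handed to the consumer.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document: the "extra" namespace is declared inline, late in the stream.
XML = b"""<bookstore xmlns="http://www.example.com/books/1.0/">
  <book><title>First</title></book>
  <extra:note xmlns:extra="http://www.example.com/notes/1.0/">late</extra:note>
</bookstore>"""

events = []
for event, payload in ET.iterparse(io.BytesIO(XML), events=("start-ns", "end")):
    if event == "start-ns":
        prefix, uri = payload          # payload is a (prefix, namespace name) pair
        events.append(("namespace", prefix, uri))
    else:
        events.append(("element", payload.tag))  # payload is the finished element

for entry in events:
    print(entry)

book_done = events.index(("element", "{http://www.example.com/books/1.0/}book"))
extra_seen = events.index(("namespace", "extra", "http://www.example.com/notes/1.0/"))
```

Here `book_done` comes before `extra_seen`, so a processor that wanted the full set of namespaces up front would have to read the whole document before mapping the first `book`.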

DylanVanAssche · 2022-04-25T13:13:40Z

I agree with @pmaria, extracting the XML namespaces is non-trivial and may require consuming all of the XML before any mapping can take place.

Then again, if we include the restrictions in SHACL shapes, then we can decide on shape level whether it's a string or an IRI. There we can even provide 2 alternatives with 2 different explanations.

SHACL can have an OR statement, but maybe to keep things straightforward we should have either a string or IRI, but not both?

chrdebru · 2022-04-25T13:47:16Z

@pmaria if they call them namespace names, then OK!

@DylanVanAssche XML namespaces are declared in attributes (strings) in XML. So maybe that definition comes from their technical constraints. The advantage of IRIs is that "sameness" is implied when reused, whereas now you have to explicitly state that two namespace objects (if you can call them that) are the same, or you infer it by comparing strings. So IRIs may help us in cases where we have different prefixes for the same namespace (e.g., combining mappings).
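With literal namespace names, the "infer by comparing strings" route mentioned above could look like the following sketch. The two prefix maps stand in for declarations taken from two mappings being combined; they, and the helper `prefixes_by_namespace`, are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set

# Hypothetical declarations from two mappings that are being combined.
mapping_a = {"ex": "http://www.example.com/books/1.0/"}
mapping_b = {
    "books": "http://www.example.com/books/1.0/",
    "n": "http://www.example.com/notes/1.0/",
}


def prefixes_by_namespace(*declarations: Dict[str, str]) -> Dict[str, Set[str]]:
    """Group prefixes by namespace name using exact string equality."""
    grouped = defaultdict(set)
    for declaration in declarations:
        for prefix, name in declaration.items():
            grouped[name].add(prefix)
    return dict(grouped)


grouped = prefixes_by_namespace(mapping_a, mapping_b)
print(grouped)
```

With IRIs as resources, the same grouping would follow from node identity in the graph instead of an explicit comparison step.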

DylanVanAssche · 2022-04-25T13:50:25Z

@chrdebru I don't have a specific preference, except that I prefer either strings or IRIs, just not both ;)

DylanVanAssche changed the title ~~XML namespaces~~ XML namespaces for XPath Oct 11, 2021

DylanVanAssche mentioned this issue Oct 11, 2021

Parse XML with namespace RMLio/rmlmapper-java#132

Closed

DylanVanAssche transferred this issue from kg-construct/rml-core Mar 14, 2022

DylanVanAssche added proposal Proposal available for fixing this issue in the spec rml representation labels Apr 1, 2022

DylanVanAssche mentioned this issue Apr 5, 2022

Specify how empty literals look like #6

Closed

DylanVanAssche added this to the v0.1 milestone Apr 5, 2022

pheyvaer mentioned this issue Apr 13, 2022

How to refer to default namespace in XML sources? RMLio/yarrrml-parser#158

Open

DylanVanAssche closed this as completed in kg-construct/dataio-spec@242d578 May 17, 2022
