Challenge: Languagemap for multi-value reference #18

Open
thomas-delva opened this issue Nov 9, 2020 · 12 comments

@thomas-delva
Contributor

From GitHub issue: RMLio/rmlmapper-java#65

Description

In RML, if an object map has a reference that produces multiple terms and also has a language map, the mapper does not know (and cannot know?) which value of the reference to combine with which value of the language map.

It would be interesting to investigate a way to say in RML that the object map should iterate over the <rdaw:P10086> tags and extract a value and a language from each one. (Right now, RML can only say to create multiple values from the <rdaw:P10086> tags and, independently, to create multiple language tags from the <rdaw:P10086> tags.)

Input data (xml)

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
   xmlns:rdaw="http://rdaregistry.info/Elements/w/"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>
  <rdf:Description rdf:about="https://trellis.sinopia.io/repository/washington/d0d9f78e-05f1-4594-bdcb-b396ce68f618">
    <rdaw:P10086 xml:lang="pt">Lórax (Beber)</rdaw:P10086>
    <rdaw:P10086 xml:lang="af">Loraks</rdaw:P10086>
    <rdaw:P10086 xml:lang="ru">Driad</rdaw:P10086>
    <rdaw:P10086 xml:lang="es">Lórax</rdaw:P10086>
  </rdf:Description>
</rdf:RDF>

Desired output

@prefix bf: <http://id.loc.gov/ontologies/bibframe/>.

<https://trellis.sinopia.io/repository/washington/d0d9f78e-05f1-4594-bdcb-b396ce68f618>
  bf:title _:0 .

_:0 a bf:VariantTitle;
  bf:mainTitle "Driad"@pt, "Loraks"@af, "Lórax"@ru, "Lórax (Beber)"@es .
@dachafra
Member

dachafra commented Nov 9, 2020

Hi @thomas-delva, is this issue not the same as #2?

@thomas-delva
Contributor Author

Hi @dachafra, the issues are related of course, but this one is more specific than the other: it assumes #2 is solved and language tags can be generated from data (with languageMap). This issue is about what happens when languageMap is used together with a reference that returns more than one value.

Hope this clears things up!

@dachafra dachafra added r2rml r2rml issues representation representation issues rml rml issues labels Nov 9, 2020
@VladimirAlexiev
Collaborator

Seems to me RML is missing some notion of locality. What we need for this example is:

  • iterate over //rdaw:P10086
    • make a literal with value text() and lang @xml:lang

Iterating twice is conceptually wrong:

  • iterate over //rdaw:P10086 and get text()
  • iterate over //rdaw:P10086 and get @xml:lang
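To make the difference between the two approaches concrete, here is a minimal Python sketch (not RML, and not how any mapper implements it) using only the standard library. The function names `paired_literals` and `independent_lists` are hypothetical, and the XML is the input from the issue with the closing tag corrected. The first function iterates once and reads value and language from the same node; the second iterates twice and ends up with two lists that carry no explicit link between them.

```python
import xml.etree.ElementTree as ET

RDAW = "http://rdaregistry.info/Elements/w/"
# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

DOC = """<rdf:RDF
   xmlns:rdaw="http://rdaregistry.info/Elements/w/"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="https://trellis.sinopia.io/repository/washington/d0d9f78e-05f1-4594-bdcb-b396ce68f618">
    <rdaw:P10086 xml:lang="pt">Lórax (Beber)</rdaw:P10086>
    <rdaw:P10086 xml:lang="af">Loraks</rdaw:P10086>
    <rdaw:P10086 xml:lang="ru">Driad</rdaw:P10086>
    <rdaw:P10086 xml:lang="es">Lórax</rdaw:P10086>
  </rdf:Description>
</rdf:RDF>"""


def paired_literals(xml_text):
    """Iterate once; take the text and xml:lang from the same element."""
    root = ET.fromstring(xml_text)
    return [(el.text, el.get(XML_LANG)) for el in root.iter(f"{{{RDAW}}}P10086")]


def independent_lists(xml_text):
    """Iterate twice; the two result lists are no longer tied to each other."""
    root = ET.fromstring(xml_text)
    values = [el.text for el in root.iter(f"{{{RDAW}}}P10086")]
    langs = [el.get(XML_LANG) for el in root.iter(f"{{{RDAW}}}P10086")]
    return values, langs


if __name__ == "__main__":
    for value, lang in paired_literals(DOC):
        print(f'"{value}"@{lang}')
```

With the single iteration, each literal keeps the language of the element it came from; with two iterations, the mapper would have to guess how to recombine the lists.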

@pmaria
Collaborator

pmaria commented Nov 23, 2020

@thomas-delva I'm trying to understand the example, particularly the desired result, since the reference values and language values seem to be combined randomly. That is, the pairing doesn't follow the structure of the document.

Is this your intention?

If so, I don't see a way to achieve this in a reusable way, since the combination logic can't be derived from the source document.

If you want to get this result for this mapping you could do something like (untested):

[]
  rr:predicate bf:mainTitle ;
  rr:objectMap 
    [
      rml:reference "rdaw:P10086[3]" ;
      rml:languageMap [
        rml:reference "rdaw:P10086[1]/@xml:lang"
      ] ;
    ] , 
    [
      rml:reference "rdaw:P10086[2]" ;
      rml:languageMap [
        rml:reference "rdaw:P10086[2]/@xml:lang"
      ] ;
    ] , 
    [
      rml:reference "rdaw:P10086[4]" ;
      rml:languageMap [
        rml:reference "rdaw:P10086[3]/@xml:lang"
      ] ;
    ] , 
    [
      rml:reference "rdaw:P10086[1]" ;
      rml:languageMap [
        rml:reference "rdaw:P10086[4]/@xml:lang"
      ] ;
    ] ;
.

I guess you could also solve this using a function-valued LanguageMap, with the function containing the logic to return the language you want based on the input.

But for that to work, we first need to solve the more common challenge, which is very close to what you describe here and is possibly what you intended.

That is, how to get the following output, which does follow the structure of the source.

@prefix bf: <http://id.loc.gov/ontologies/bibframe/>.

<https://trellis.sinopia.io/repository/washington/d0d9f78e-05f1-4594-bdcb-b396ce68f618>
  bf:title _:0 .

_:0 a bf:VariantTitle;
  bf:mainTitle "Lórax (Beber)"@pt, "Loraks"@af, "Driad"@ru, "Lórax"@es .

The general challenge here is how to combine multiple multi-valued expressions into the result of a term map.

Language maps are one example; another is an rr:template with multiple expressions, each of which yields multiple values.

The RML spec needs to provide clarity on how to handle these situations. This is issue #4.

For the purpose of this discussion, let's assume that a Cartesian product approach would be the default way of handling these cases. In that case, we need something else to solve the case described in this issue.

I think we can look at xR2RML's nested term map (xrr:nestedTermMap) as a possible solution approach, since it basically allows adding nested iterations within a term map. We would need to add language maps into the mix though.
I guess a language map should then occur only once per "tree" of (nested) term maps.

@frmichel

Just a complement about xR2RML.

xR2RML has introduced the idea of an xrr:languageReference property of an object map.

About the multi-value question, xR2RML assumes that the evaluation of a reference (xrr:reference, rr:template, xrr:languageReference) can generate multiple values. The term map then generates RDF terms as the product of all the values generated. So if you have this:

   xrr:reference "$.field";
   xrr:languageReference "$.lang";

and if the reference returns 2 terms and the languageReference returns 2 terms, then the term map will yield 4 RDF terms.

So this is basically "naturally" included in xR2RML, and that does not necessarily concern the xrr:nestedTermMap case.

Franck.

@pmaria
Collaborator

pmaria commented Nov 23, 2020

@frmichel but in this case, you don't want to combine all terms, but only those that are grouped together in the source.

1: `["Lórax (Beber)"]`, and `["pt"]`,
2: `["Loraks"]`, and `["af"]`,
3: `["Driad"]`, and `["ru"]`,
4: `["Lórax"]`, and `["es"]`.
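The contrast between the product semantics and the grouping wanted here can be sketched in a few lines of Python. This is only an illustration: the function names `cartesian_terms` and `grouped_terms` are hypothetical, and pairing by position is an assumption that holds for this document only because every `rdaw:P10086` element carries exactly one value and one `xml:lang`.

```python
from itertools import product

# Values and language tags as two independent evaluations would return them.
values = ["Lórax (Beber)", "Loraks", "Driad", "Lórax"]
langs = ["pt", "af", "ru", "es"]


def cartesian_terms(values, langs):
    """Product semantics: every value is combined with every language."""
    return list(product(values, langs))


def grouped_terms(values, langs):
    """Keep only the pairs that belong together, here matched by position."""
    return list(zip(values, langs))


if __name__ == "__main__":
    print(len(cartesian_terms(values, langs)))  # 4 values x 4 languages
    for value, lang in grouped_terms(values, langs):
        print(f'"{value}"@{lang}')
```

Positional pairing breaks as soon as an element lacks a language or has several values, which is why evaluating both expressions relative to the same node is the more robust option.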

So, is my understanding correct that you would use an xrr:nestedTermMap in this case?

@frmichel

frmichel commented Nov 23, 2020

Oh ok, sorry I had not looked carefully enough.
I'm not so much at ease with XPath. Spontaneously I'd write it this way in xR2RML:

[]
  rr:predicate bf:mainTitle ;
  rr:objectMap [
      xrr:reference "rdaw:P10086/*" ;
      xrr:nestedTermMap [
        xrr:reference "/";
        xrr:languageReference "/@xml:lang";
      ]
  ] .

But I'm afraid that "rdaw:P10086/*" will return the node values, and thus you will lose the language attribute, so that "/@xml:lang" will not return anything.
One solution, somewhat complicated, could be to use the pushDown feature, but I'm not so confident actually.

I may have underestimated the differences between XPath and JSONPath...

@pmaria
Collaborator

pmaria commented Nov 23, 2020

rdaw:P10086 will return

<rdaw:P10086 xmlns:rdaw="http://rdaregistry.info/Elements/w/" xml:lang="pt">Lórax (Beber)</rdaw:P10086>

<rdaw:P10086 xmlns:rdaw="http://rdaregistry.info/Elements/w/" xml:lang="af">Loraks</rdaw:P10086>

<rdaw:P10086 xmlns:rdaw="http://rdaregistry.info/Elements/w/" xml:lang="ru">Driad</rdaw:P10086>

<rdaw:P10086 xmlns:rdaw="http://rdaregistry.info/Elements/w/" xml:lang="es">Lórax</rdaw:P10086>

So

[]
  rr:predicate bf:mainTitle ;
  rr:objectMap [
      xrr:reference "rdaw:P10086" ;
      xrr:nestedTermMap [
        rml:reference ".";
        xrr:languageReference "@xml:lang";
      ]
  ] .

should work in this case.

@frmichel

frmichel commented Nov 23, 2020

Ok, thx for the hint @pmaria. So after all, yes, the concept of nestedTermMap could fill that need (I edited my mistaken example above to have xrr:reference instead of rml:language).

@pmaria
Collaborator

pmaria commented Nov 23, 2020

Great! I think it is an elegant solution, since you can tackle arbitrarily nested structures.
I would combine it with the more generic rml:LanguageMap so you can use all expression types like reference, template, function etc.

@frmichel

> I would combine it with the more generic rml:LanguageMap so you can use all expression types like reference, template, function etc.

Yes I agree, the rml:LanguageMap is more generic. Cool.

@dachafra
Member

@pmaria is this discussion already included in rml:LanguageMap (or is there a plan to include it)? If so, we can close the issue.
