Parse XML with namespace #132

Closed
florent-andre opened this issue Oct 6, 2021 · 10 comments
Labels
enhancement New feature or request

Comments

@florent-andre

florent-andre commented Oct 6, 2021

Hello,
First, thanks for this promising tool set. I hope this is the right channel and repository for this question.

I am trying to map an XML file whose nodes are namespaced (an XSD-type file).
When I remove the namespaces from my source file, the test triples are generated.
But when I restore the namespaces in the XML file and add the xsd: prefix to the XPath expressions, I get an empty set of triples.

As I found no example of parsing "XML with namespaces", I wonder how I can do that.

Here is the example I am trying to tackle; it could be added to Matey.
Thanks for your help,
regards

  • source xml file with ns
<?xml version="1.0" encoding="utf-8"?>
<xsd:schema xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:hropen="https://hropenstandards.org/schema/xml/" targetNamespace="https://hropenstandards.org/schema/xml/" version="4.3.0" id="PersonType" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:include schemaLocation="PersonPhysicalInclusion.xsd" />
  <xsd:include schemaLocation="PersonLegalInclusion.xsd" />
  <xsd:include schemaLocation="../profile/PersonProfileInclusion.xsd" />
  <xsd:include schemaLocation="PersonBaseType.xsd" />
  <xsd:complexType name="PersonType">
    <xsd:annotation>
      <xsd:documentation>A schema that represents all of the information of a person.</xsd:documentation>
    </xsd:annotation>
    <xsd:all>
      <xsd:element minOccurs="0" maxOccurs="1" name="legalId" type="hropen:IdentifierType">
        <xsd:annotation>
          <xsd:documentation>The legal identifier of a person. The issuer is most likely a country or state.</xsd:documentation>
        </xsd:annotation>
      </xsd:element>
    </xsd:all>
  </xsd:complexType>
</xsd:schema>
  • yarrrml configuration
prefixes:
  rml: 'http://semweb.mmlab.be/ns/rml#'
  rr: 'http://www.w3.org/ns/r2rml#'
  ql: 'http://semweb.mmlab.be/ns/ql#'
  rdf: 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'
  "": 'http://example.org/rules/'
  schema: 'http://schema.org/'
  dbo: 'http://dbpedia.org/ontology/'
  test: 'https://test.fr/mm'
  fno: 'https://w3id.org/function/ontology#'
  fnoi: 'https://w3id.org/function/vocabulary/implementation#'
  fnom: 'https://w3id.org/function/vocabulary/mapping#'
  fnml: 'http://semweb.mmlab.be/ns/fnml#'
  grel: 'http://users.ugent.be/~bjdmeest/function/grel.ttl#'
  xsd: 'http://www.w3.org/2001/XMLSchema'
mappings:
  mapping0:
    sources:
      - [Supergirl.xml~xpath, /xsd:schema/xsd:complexType/xsd:all/xsd:element]
    s: 'http://example.org/character/$(@name)'
    po:
      - [a, 'schema:PersonTEST~iri']
      - ['schema:TARGET', $(@type)]
      - ['schema:DESCR', $(xsd:annotation/xsd:documentation)]
      - ['dbo:SHACL', $(@minOccurs)]
@DylanVanAssche
Contributor

Hi!
Thanks for reaching out and using our tools!

Unfortunately, this is a long-standing issue we haven't been able to properly resolve.
In the predecessor of the rmlmapper-java, we had 2 issues about this:

but without a proper resolution.
If you have any feedback on how to resolve this properly (in the mapping rules, using a CLI parameter, etc.), feel free to comment below! We would love to have some feedback on this.

@DylanVanAssche added the enhancement (New feature or request) label Oct 7, 2021
@florent-andre
Author

Hmm... maybe extract the source's xmlns declarations and reuse them in the XPath call?
This requires well-formed XML, but that's the minimum...
I don't know how configurable the XPath interpreter is, but passing it the source's xmlns declarations should be doable.

I think it gives a better experience for the person writing the mapping than the declarative way the CARML implementation seems to do this:

carml:declaresNamespace [
        carml:namespacePrefix "edxl-cap" ;
        carml:namespaceName "http://release.niem.gov/niem/adapters/edxl-cap/3.0/" ;

Linked to kg-construct/rml-fno-spec#9

@DylanVanAssche
Contributor

Hmm... maybe extract the source's xmlns declarations and reuse them in the XPath call?

That might be a way to work around this problem. We always welcome PRs to help out!

In the meantime, I brought this to the attention of the W3C Community Group working on RML and other mapping languages, which aims for a standard like R2RML for transforming heterogeneous data into RDF; see kg-construct/rml-io#4

@florent-andre
Author

Hi,
I can try to have a look, but it has been a long time since I last wrote Java, so any guidance on the classes involved would be appreciated.

@DylanVanAssche
Contributor

Hi @florent-andre
Sure! Happy to assist you :)
To extend the XPath extractor, you probably want to look at the getDocumentFromStream method of XMLRecordFactory.
There you can configure the DocumentBuilderFactory.

You can also read the InputStream argument there to look for XML namespaces.
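
A minimal sketch of the kind of DocumentBuilderFactory configuration mentioned above, assuming the relevant switch is namespace awareness (setNamespaceAware, which is off by default in JAXP). The class and method names are hypothetical; this is not the actual XMLRecordFactory code:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper for illustration only.
class NamespaceAwareParsing {

    // Parses the stream, with or without namespace processing.
    static Document parse(InputStream in, boolean namespaceAware) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        return factory.newDocumentBuilder().parse(in);
    }

    // A one-element document using the xsd prefix from the issue.
    static InputStream sample() {
        String xml = "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\"/>";
        return new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        Document aware = parse(sample(), true);
        // With namespace awareness, the root carries a namespace URI and a local name.
        System.out.println(aware.getDocumentElement().getNamespaceURI());
        System.out.println(aware.getDocumentElement().getLocalName());

        Document unaware = parse(sample(), false);
        // Without it, the element is only known by its qualified name "xsd:schema".
        System.out.println(unaware.getDocumentElement().getNodeName());
    }
}
```

Without namespace awareness, a prefixed XPath step such as xsd:schema has no namespace information to match against, which would be consistent with the empty result reported in this issue.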

@florent-andre
Author

@DylanVanAssche please find a PR that solves namespaced XPath.

Please note that the fix works for a fully namespaced tree.
If the XML mixes namespaced and non-namespaced elements, this should be explored further. See this document for details: "even the default namespace is a namespace, and thus matching names have to be prefixed in XPath".
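
To illustrate the quoted point about default namespaces, here is a small sketch using only the standard javax.xml.xpath API. The namespace http://example.org/ns, the prefix d, and the class name are made up for the example:

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; the namespace and prefix are illustrative.
class DefaultNamespaceXPath {
    static final String NS = "http://example.org/ns";
    // The elements are in a default namespace: no prefix appears in the XML.
    static final String XML = "<root xmlns=\"" + NS + "\"><item>value</item></root>";

    static String evaluate(String expression) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));

        XPath xPath = XPathFactory.newInstance().newXPath();
        // Bind the made-up prefix "d" to the document's default namespace.
        xPath.setNamespaceContext(new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                return "d".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
            }

            @Override
            public String getPrefix(String uri) {
                return NS.equals(uri) ? "d" : null;
            }

            @Override
            public Iterator<String> getPrefixes(String uri) {
                return NS.equals(uri)
                        ? Collections.singleton("d").iterator()
                        : Collections.<String>emptyIterator();
            }
        });
        return xPath.evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        // Unprefixed names only match elements in no namespace: empty result.
        System.out.println("[" + evaluate("/root/item") + "]");
        // Prefixed names match the elements of the default namespace.
        System.out.println("[" + evaluate("/d:root/d:item") + "]");
    }
}
```

The prefix used in the XPath expression does not have to appear in the XML at all; only the namespace URI it is bound to matters.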

Another remark:
What do you think about creating an xPathSingleton to provide the xPath object, rather than creating multiple instances of it in XMLRecord and XMLRecordFactory:

XPath xPath = XPathFactory.newInstance().newXPath();
xPath.setNamespaceContext(new NamespaceResolver(document));
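
For readers wondering what a resolver like the one passed to setNamespaceContext above could look like, here is one possible sketch. It is not the code from the PR: the class name DocumentNamespaceResolver is hypothetical, and it assumes that all prefixes used in XPath are declared on the root element of a document parsed with namespace awareness enabled:

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical resolver: answers prefix lookups from the xmlns declarations
// that are in scope on the document's root element.
class DocumentNamespaceResolver implements NamespaceContext {
    // Trimmed-down version of the source file from the issue.
    static final String SAMPLE =
            "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">"
            + "<xsd:complexType name=\"PersonType\"><xsd:all>"
            + "<xsd:element minOccurs=\"0\" name=\"legalId\"/>"
            + "</xsd:all></xsd:complexType></xsd:schema>";

    private final Document document;

    DocumentNamespaceResolver(Document document) {
        this.document = document;
    }

    @Override
    public String getNamespaceURI(String prefix) {
        if (prefix == null) {
            throw new IllegalArgumentException("prefix must not be null");
        }
        String uri = document.getDocumentElement().lookupNamespaceURI(prefix);
        return uri != null ? uri : XMLConstants.NULL_NS_URI;
    }

    @Override
    public String getPrefix(String namespaceURI) {
        return document.getDocumentElement().lookupPrefix(namespaceURI);
    }

    @Override
    public Iterator<String> getPrefixes(String namespaceURI) {
        String prefix = getPrefix(namespaceURI);
        return prefix == null
                ? Collections.<String>emptyIterator()
                : Collections.singleton(prefix).iterator();
    }

    // Parses the XML with namespace awareness and evaluates the expression.
    static String evaluate(String xml, String expression) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        XPath xPath = XPathFactory.newInstance().newXPath();
        xPath.setNamespaceContext(new DocumentNamespaceResolver(doc));
        return xPath.evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(
                evaluate(SAMPLE, "/xsd:schema/xsd:complexType/xsd:all/xsd:element/@name"));
    }
}
```

Because the lookup starts at the root element, prefixes declared only on nested elements, or elements in an unprefixed default namespace, would not be resolved by this sketch.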

@DylanVanAssche
Contributor

@florent-andre Thanks for the PR! I will have a look next week :)

If the XML mixes namespaced and non-namespaced elements, this should be explored further. See this document for details: "even the default namespace is a namespace, and thus matching names have to be prefixed in XPath".

I'm not that familiar with XML namespaces, but I think this PR is a good start in general; we can just add a TODO comment mentioning that this case has not been explored.

What do you think about creating an xPathSingleton to provide the xPath object, rather than creating multiple instances of it in XMLRecord and XMLRecordFactory:

That would actually be better, I think... Feel free to try it :)
As long as the test cases still pass after this change, it is fine.

@pheyvaer
Collaborator

pheyvaer commented Oct 18, 2021

Regarding avoiding the creation of multiple xPath objects, I would strongly advise against using the Singleton pattern, especially because it complicates testing.
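
One common alternative to a Singleton, sketched here only as an illustration and not as what the PR does, is to create the XPath object once and pass it in through constructors. All class names below are hypothetical stand-ins, not the rmlmapper-java classes:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical sketch of constructor injection for a shared XPath object.
class XPathInjectionSketch {

    // Stand-in for a record class: it receives its XPath instead of creating one.
    static class Record {
        private final Document document;
        private final XPath xPath;

        Record(Document document, XPath xPath) {
            this.document = document;
            this.xPath = xPath;
        }

        String get(String expression) throws XPathExpressionException {
            return xPath.evaluate(expression, document);
        }
    }

    // Stand-in for a record factory: it hands the same XPath to every record.
    static class RecordFactory {
        private final XPath xPath;

        RecordFactory(XPath xPath) {
            this.xPath = xPath;
        }

        Record fromString(String xml) throws Exception {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document document =
                    factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            return new Record(document, xPath);
        }
    }

    public static void main(String[] args) throws Exception {
        // The caller (or a test) decides which XPath instance is used.
        XPath shared = XPathFactory.newInstance().newXPath();
        RecordFactory factory = new RecordFactory(shared);
        System.out.println(factory.fromString("<a><b>1</b></a>").get("/a/b"));
    }
}
```

A test can then hand in its own XPath instance, for example one with a specific NamespaceContext, without touching global state. Note that the XPath documentation describes XPath objects as not thread-safe, so a shared instance would need care in multi-threaded code.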

@florent-andre
Author

I get your point about the Singleton.
The current PR doesn't implement a Singleton, and the "nondependants tests" pass.

@florent-andre
Author

As this PR was merged, I am closing this issue.
Thanks, guys, for building and maintaining this lib!

3 participants