v2.1.0 #68

MBueschelberger · 2024-10-25T09:39:42Z

Previously, the mapping schema for individuals with custom relations was inefficient and highly repetitive when an individual needed several data properties (e.g. name, age and lab number) from a data file.

In order to produce a graph like this...

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ns1: <https://w3id.org/steel/ProcessOntology/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix chameo: <https://w3id.org/emmo/domain/characterisation-methodology/chameo#> .
@prefix nanoindentation: <https://w3id.org/emmo/domain/domain-nanoindentation/nanoindentation#> .

nanoindentation:John a chameo:Operator ;
    foaf:age 32 ;
    foaf:name "John"^^xsd:string ;
    ns1:hasLaboratory 345 .

nanoindentation:Jane a chameo:Operator ;
    foaf:age 28 ;
    foaf:name "Jane"^^xsd:string ;
    ns1:hasLaboratory 123 .

... a mapping like this had to be applied:

[
    {
        "value_location": "data[0].name",
        "value_relation": "http://xmlns.com/foaf/0.1/name",
        "iri": "https://w3id.org/emmo/domain/characterisation-methodology/chameo#Operator",
        "suffix": "Jane",
    },
    {
        "value_location": "data[0].age",
        "value_relation": "http://xmlns.com/foaf/0.1/age",
        "iri": "https://w3id.org/emmo/domain/characterisation-methodology/chameo#Operator",
        "suffix": "Jane",
    },
    {
        "value_location": "data[0].lab_no",
        "value_relation": "https://w3id.org/steel/ProcessOntology/hasLaboratory",
        "iri": "https://w3id.org/emmo/domain/characterisation-methodology/chameo#Operator",
        "suffix": "Jane",
    },
    {
        "value_location": "data[1].name",
        "value_relation": "http://xmlns.com/foaf/0.1/name",
        "iri": "https://w3id.org/emmo/domain/characterisation-methodology/chameo#Operator",
        "suffix": "John",
    },
    {
        "value_location": "data[1].age",
        "value_relation": "http://xmlns.com/foaf/0.1/age",
        "iri": "https://w3id.org/emmo/domain/characterisation-methodology/chameo#Operator",
        "suffix": "John",
    },
    {
        "value_location": "data[1].lab_no",
        "value_relation": "https://w3id.org/steel/ProcessOntology/hasLaboratory",
        "iri": "https://w3id.org/emmo/domain/characterisation-methodology/chameo#Operator",
        "suffix": "John",
    },
]

... on a dataset shaped like this:

{
    "data": [
        {
            "name": "Jane",
            "age": 28,
            "lab_no": 123,
        },
        {
            "name": "John",
            "age": 32,
            "lab_no": 345,
        },
    ]
}
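To see why the old schema scales poorly, the following Python sketch generates the flat, old-style entries for the dataset above. The helper build_legacy_mapping is hypothetical (not part of data2rdf) and assumes JSONPath locations of the form data[i].key; it only illustrates that the number of entries equals the number of individuals times the number of properties.

```python
# Hypothetical helper, NOT part of data2rdf: it builds the old-style flat
# mapping, which needs one entry per (individual, property) pair.
OPERATOR = "https://w3id.org/emmo/domain/characterisation-methodology/chameo#Operator"

# Record key -> relation IRI, taken from the example above.
RELATIONS = {
    "name": "http://xmlns.com/foaf/0.1/name",
    "age": "http://xmlns.com/foaf/0.1/age",
    "lab_no": "https://w3id.org/steel/ProcessOntology/hasLaboratory",
}


def build_legacy_mapping(data: dict) -> list:
    """Return one flat mapping entry per (individual, property) pair."""
    mapping = []
    for index, record in enumerate(data["data"]):
        for key, relation in RELATIONS.items():
            mapping.append(
                {
                    # Assumed JSONPath form: data[<index>].<key>
                    "value_location": f"data[{index}].{key}",
                    "value_relation": relation,
                    "iri": OPERATOR,
                    # In the old schema this is a fixed string per entry; it is
                    # copied from the record here only to build the example.
                    "suffix": record["name"],
                }
            )
    return mapping


dataset = {
    "data": [
        {"name": "Jane", "age": 28, "lab_no": 123},
        {"name": "John", "age": 32, "lab_no": 345},
    ]
}

legacy = build_legacy_mapping(dataset)
print(len(legacy))  # 2 individuals x 3 properties = 6 entries
```

Every additional individual adds another three entries, which is the repetition the new schema removes.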

With this PR, the schema can be simplified considerably:

 [
    {
        "iri": "https://w3id.org/emmo/domain/characterisation-methodology/chameo#Operator",
        "suffix": "name",
        "source": "data[*]",
        "suffix_from_location": True,
        "custom_relations": [
            {
                "object_location": "name",
                "relation": "http://xmlns.com/foaf/0.1/name",
            },
            {
                "object_location": "age",
                "relation": "http://xmlns.com/foaf/0.1/age",
            },
            {
                "object_location": "lab_no",
                "relation": "https://w3id.org/steel/ProcessOntology/hasLaboratory",
            },
        ],
    }
]

Please note that the dataset can now contain any number of individuals, since source accepts a wildcard (data[*]).
The suffix of each individual is also read from the dataset when suffix_from_location is set to True. If it is set to False, the value of the suffix key is used as-is.
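A minimal Python sketch of these two rules, assuming that source is a simple "<key>[*]" wildcard and that suffix, when suffix_from_location is True, names a key inside each matched object. The functions iter_source and resolve_suffix are hypothetical illustrations, not data2rdf's implementation; real JSONPath expressions are far more general.

```python
# Sketch only; iter_source and resolve_suffix are hypothetical helpers.


def iter_source(data: dict, source: str):
    """Yield every element matched by a simple '<key>[*]' wildcard."""
    if not source.endswith("[*]"):
        raise ValueError(f"this sketch only supports '<key>[*]', got {source!r}")
    key = source[: -len("[*]")]
    yield from data[key]


def resolve_suffix(entry: dict, record: dict) -> str:
    """Pick the IRI suffix for one matched object."""
    if entry.get("suffix_from_location", False):
        # `suffix` is treated as a location inside the matched object.
        return str(record[entry["suffix"]])
    # Otherwise the configured value is used verbatim.
    return entry["suffix"]


dataset = {"data": [{"name": "Jane", "age": 28}, {"name": "John", "age": 32}]}

from_location = {"source": "data[*]", "suffix": "name", "suffix_from_location": True}
fixed = {"source": "data[*]", "suffix": "Operator", "suffix_from_location": False}

print([resolve_suffix(from_location, r) for r in iter_source(dataset, "data[*]")])
# ['Jane', 'John']
print([resolve_suffix(fixed, r) for r in iter_source(dataset, "data[*]")])
# ['Operator', 'Operator']
```

With a fixed suffix, every matched object ends up with the same suffix, so reading it from the data is what keeps the individuals apart when a wildcard is used.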

If source is set, object_location is treated as a path relative to each object matched by source (here, every element of data[*]).

If source is not set, object_location is treated as an absolute path. The same applies to suffix when suffix_from_location is set to True.
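The difference between relative and absolute locations can be sketched in Python as follows. get_path and locate are hypothetical helpers that understand only dotted keys and [n] indices, a small subset of JSONPath; they are not data2rdf functions.

```python
import re

# Matches either a plain key ("name") or a list index ("[0]").
_TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def get_path(obj, path: str):
    """Resolve a simple path such as 'data[1].age' or 'age' (sketch only)."""
    for key, index in _TOKEN.findall(path):
        obj = obj[int(index)] if index else obj[key]
    return obj


def locate(data: dict, location: str, record=None):
    """Relative to `record` when a source matched it, else absolute from `data`."""
    root = data if record is None else record
    return get_path(root, location)


dataset = {
    "data": [
        {"name": "Jane", "age": 28, "lab_no": 123},
        {"name": "John", "age": 32, "lab_no": 345},
    ]
}

# With "source": "data[*]", object_location "age" is relative to each element.
print([locate(dataset, "age", record=r) for r in dataset["data"]])  # [28, 32]

# Without a source, object_location must be the full path from the root.
print(locate(dataset, "data[1].age"))  # 32
```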

See the updated docs here:
https://github.com/MI-FraunhoferIWM/data2rdf/blob/enh/mapping-for-multiple-individuals/docs/examples/abox/6_custom_relations.md

github-actions · 2024-10-25T09:47:24Z

Coverage Report

File	Stmts	Miss	Cover
data2rdf
__init__.py	5	0	100%
config.py	19	0	100%
utils.py	33	5	85%
warnings.py	2	0	100%
data2rdf/models
__init__.py	3	0	100%
base.py	47	4	91%
graph.py	150	35	77%
mapping.py	40	1	98%
data2rdf/modes
__init__.py	4	0	100%
data2rdf/parsers
__init__.py	6	0	100%
base.py	134	11	92%
csv.py	168	20	88%
excel.py	175	17	90%
json.py	188	29	85%
utils.py	79	11	86%
data2rdf/pipelines
__init__.py	2	0	100%
main.py	82	9	89%
data2rdf/qudt
__init__.py	0	0	100%
utils.py	42	12	71%
TOTAL	1179	154	87%

Tests	Skipped	Failures	Errors	Time
114	0 💤	0 ❌	0 🔥	2m 55s ⏱️

yoavnash · 2024-10-25T11:43:58Z

Seems to make sense for JSON, but would that also work for CSV or Excel files?
Is the old format still supported?

Kirankumaraswamy · 2024-10-25T11:44:17Z

Looks good to me. Do the changes also distinguish whether the object is going to be a literal or a URIRef object? For example, the data might have an attribute hasOrganization whose value is the IRI of a kitem.

MBueschelberger · 2024-10-25T12:14:30Z

Seems to make sense for JSON, but would that also work for CSV or Excel files? Is the old format still supported?

It is also supported for Excel. However, the wildcard via source does not work there, since JSONPath cannot be applied to Excel files.

Implementing it for CSV is more complicated, since the CSV parser works quite differently. Hence CSV is currently not supported.

The old schema is still supported. The only difference is that if custom_relations is set, the other fields such as value_location, value_relation, unit_location and unit_relation are disabled.

MBueschelberger · 2024-10-25T12:17:01Z

Looks good to me. Do the changes also distinguish whether the object is going to be a literal or a URIRef object? For example, the data might have an attribute hasOrganization whose value is the IRI of a kitem.

As described in the docs linked above, you can set the XSD datatype of the object with the object_data_type field:

...
            {
                "object_location": "lab_no",
                "relation": "https://w3id.org/steel/ProcessOntology/hasLaboratory",
                "object_data_type": "anyURI",
            },
...
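For illustration, a Python sketch of how an object_data_type could turn a value into a typed literal when serialized as Turtle. to_turtle_object is hypothetical and not data2rdf's code; it assumes object_data_type is the local name of an XSD datatype. Note that in RDF a value typed as xsd:anyURI is still a literal, not an IRI node.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"


def _quote(text: str) -> str:
    """Wrap text in double quotes, escaping backslashes and quotes for Turtle."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_turtle_object(value, object_data_type=None) -> str:
    """Render `value` as a Turtle object term (sketch, not data2rdf code)."""
    if object_data_type is not None:
        # Explicit datatype: emit a typed literal with the full XSD IRI.
        return f"{_quote(str(value))}^^<{XSD}{object_data_type}>"
    if isinstance(value, bool):  # check bool before int (bool subclasses int)
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)  # Turtle shorthand for an xsd:integer literal
    return _quote(str(value))  # plain string literal


print(to_turtle_object(123))
# 123
print(to_turtle_object(123, "anyURI"))
# "123"^^<http://www.w3.org/2001/XMLSchema#anyURI>
print(to_turtle_object("Jane", "string"))
# "Jane"^^<http://www.w3.org/2001/XMLSchema#string>
```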

yoavnash · 2024-10-25T13:05:10Z


Is it then the case that data2rdf throws an error or a warning when a user tries it in a way that is not supported?

MBueschelberger · 2024-10-25T14:56:54Z


Is it then the case that data2rdf throws an error or a warning when a user tries it in a way that is not supported?

Yes, it does! See validate_model in data2rdf/data2rdf/models/mapping.py, line 116 (at c24164d):

def validate_model(cls, self: "ABoxBaseMapping") -> "ABoxBaseMapping":
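The rule described earlier (custom_relations disables value_location, value_relation, unit_location and unit_relation) could be enforced with a check like the following. This is a hypothetical Python sketch, not the body of validate_model; it assumes the conflict is reported as a ValueError, whereas data2rdf may emit a warning instead.

```python
# Legacy fields that, per the discussion above, are disabled once
# custom_relations is set.
LEGACY_FIELDS = ("value_location", "value_relation", "unit_location", "unit_relation")


def check_custom_relations(entry: dict) -> dict:
    """Raise if an entry mixes custom_relations with legacy value/unit fields."""
    if entry.get("custom_relations"):
        conflicting = [name for name in LEGACY_FIELDS if entry.get(name) is not None]
        if conflicting:
            raise ValueError(
                "custom_relations cannot be combined with: " + ", ".join(conflicting)
            )
    return entry


ok = {
    "iri": "https://w3id.org/emmo/domain/characterisation-methodology/chameo#Operator",
    "source": "data[*]",
    "custom_relations": [
        {"object_location": "name", "relation": "http://xmlns.com/foaf/0.1/name"}
    ],
}
check_custom_relations(ok)  # passes unchanged

try:
    check_custom_relations({**ok, "value_location": "data[0].name"})
except ValueError as err:
    print(err)  # custom_relations cannot be combined with: value_location
```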

Kirankumaraswamy

Although I couldn't test it, the solution looks good to me.

MBueschelberger added 5 commits October 24, 2024 18:41

add custom relations into mapping (13e6fd7)
add datatyping (29a0f0c)
fix pytests (3c5edd4)
add excel tests (6c73cf3)
update docs (c24164d)

MBueschelberger requested review from yoavnash and Kirankumaraswamy October 25, 2024 09:43

MBueschelberger self-assigned this Oct 25, 2024

MBueschelberger added the 📈 enhancement New feature or request label Oct 25, 2024

yoavnash approved these changes Oct 26, 2024


Kirankumaraswamy closed this Oct 28, 2024

Kirankumaraswamy reopened this Oct 28, 2024

Kirankumaraswamy approved these changes Oct 28, 2024


MBueschelberger merged commit 56d3bc7 into main Oct 28, 2024
15 checks passed

MBueschelberger deleted the enh/mapping-for-multiple-individuals branch October 28, 2024 09:47
