v2.1.0 #68

Merged
MBueschelberger merged 5 commits into main from enh/mapping-for-multiple-individuals
Oct 28, 2024


MBueschelberger
Member

@MBueschelberger MBueschelberger commented Oct 25, 2024

Previously, the mapping schema for individuals with custom relations was inefficient and highly repetitive when an individual needed, for example, multiple data properties from a data file.

In order to produce a graph like this...

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ns1: <https://w3id.org/steel/ProcessOntology/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix chameo: <https://w3id.org/emmo/domain/characterisation-methodology/chameo#> .
@prefix nanoindentation: <https://w3id.org/emmo/domain/domain-nanoindentation/nanoindentation#> .

nanoindentation:John a chameo:Operator ;
    foaf:age 32 ;
    foaf:name "John"^^xsd:string ;
    ns1:hasLaboratory 345 .

nanoindentation:Jane a chameo:Operator ;
    foaf:age 28 ;
    foaf:name "Jane"^^xsd:string ;
    ns1:hasLaboratory 123 .

... a mapping like this would have been needed:

[
      {
          "value_location": "data.name[0]",
          "value_relation": "http://xmlns.com/foaf/0.1/name",
          "iri": "https://w3id.org/emmo/domain/characterisation-methodology/chameo#Operator",
          "suffix": "Operator1",
      },
      {
          "value_location": "data.age[0]",
          "value_relation": "http://xmlns.com/foaf/0.1/age",
          "iri": "https://w3id.org/emmo/domain/characterisation-methodology/chameo#Operator",
          "suffix": "Operator1",
      },
      {
          "value_location": "data.lab_no[0]",
          "value_relation": "https://w3id.org/steel/ProcessOntology/hasLaboratory",
          "iri": "https://w3id.org/emmo/domain/characterisation-methodology/chameo#Operator",
          "suffix": "Operator1",
      },
      {
          "value_location": "data.name[1]",
          "value_relation": "http://xmlns.com/foaf/0.1/name",
          "iri": "https://w3id.org/emmo/domain/characterisation-methodology/chameo#Operator",
          "suffix": "Operator2",
      },
      {
          "value_location": "data.age[1]",
          "value_relation": "http://xmlns.com/foaf/0.1/age",
          "iri": "https://w3id.org/emmo/domain/characterisation-methodology/chameo#Operator",
          "suffix": "Operator2",
      },
      {
          "value_location": "data.lab_no[1]",
          "value_relation": "https://w3id.org/steel/ProcessOntology/hasLaboratory",
          "iri": "https://w3id.org/emmo/domain/characterisation-methodology/chameo#Operator",
          "suffix": "Operator2",
      },
  ]

... on a dataset shaped like this:

   {
    "data": [
        {
            "name": "Jane",
            "age": 28,
            "lab_no": 123,
        },
        {
            "name": "John",
            "age": 32,
            "lab_no": 345,
        },
    ]
}
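
To make the repetition concrete, the following sketch generates the legacy-style mapping for any number of individuals. The helper `legacy_mapping` and the `FIELDS` table are illustrative names, not part of data2rdf; the path pattern simply follows the example above.

```python
# Illustrative only: `legacy_mapping` is a hypothetical helper, not a data2rdf function.
OPERATOR = "https://w3id.org/emmo/domain/characterisation-methodology/chameo#Operator"

# (key in the data file, relation IRI) pairs taken from the example above
FIELDS = [
    ("name", "http://xmlns.com/foaf/0.1/name"),
    ("age", "http://xmlns.com/foaf/0.1/age"),
    ("lab_no", "https://w3id.org/steel/ProcessOntology/hasLaboratory"),
]


def legacy_mapping(n_individuals):
    """Build one mapping entry per individual and per data property."""
    return [
        {
            "value_location": f"data.{key}[{i}]",
            "value_relation": relation,
            "iri": OPERATOR,
            "suffix": f"Operator{i + 1}",
        }
        for i in range(n_individuals)
        for key, relation in FIELDS
    ]


mapping = legacy_mapping(2)
print(len(mapping))  # 6 entries: 2 individuals x 3 properties
```

The number of entries grows with every additional individual and property, and the mapping has to be regenerated whenever the data file gains a row.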

With this PR, however, the schema can be simplified:

 [
    {
        "iri": "https://w3id.org/emmo/domain/characterisation-methodology/chameo#Operator",
        "suffix": "name",
        "source": "data[*]",
        "suffix_from_location": True,
        "custom_relations": [
            {
                "object_location": "name",
                "relation": "http://xmlns.com/foaf/0.1/name",
            },
            {
                "object_location": "age",
                "relation": "http://xmlns.com/foaf/0.1/age",
            },
            {
                "object_location": "lab_no",
                "relation": "https://w3id.org/steel/ProcessOntology/hasLaboratory",
            },
        ],
    }
]   

Please note that the dataset can now contain as many individuals as needed, since a wildcard (data[*]) can be applied.
When suffix_from_location is set to True, the suffix of each individual is read from the dataset at the location given by the suffix key. When it is set to False, the value of the suffix key is used as-is.

If source is set, object_location is treated as a path relative to each root object matched by source (here, each item of data[*]).

If source is not set, object_location is treated as an absolute path. The same applies to suffix when suffix_from_location is set to True.
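
The rules above can be sketched in plain Python. This is a simplified illustration and not data2rdf's implementation: `resolve` and `expand` are hypothetical helpers, only the single pattern `<path>[*]` is understood for source, and triples are plain tuples rather than RDF terms.

```python
# Simplified illustration of the behaviour described above.
# NOT data2rdf's implementation: `resolve` and `expand` are hypothetical helpers.
BASE = "https://w3id.org/emmo/domain/domain-nanoindentation/nanoindentation#"

data = {
    "data": [
        {"name": "Jane", "age": 28, "lab_no": 123},
        {"name": "John", "age": 32, "lab_no": 345},
    ]
}

entry = {
    "iri": "https://w3id.org/emmo/domain/characterisation-methodology/chameo#Operator",
    "suffix": "name",
    "source": "data[*]",
    "suffix_from_location": True,
    "custom_relations": [
        {"object_location": "name", "relation": "http://xmlns.com/foaf/0.1/name"},
        {"object_location": "age", "relation": "http://xmlns.com/foaf/0.1/age"},
        {"object_location": "lab_no", "relation": "https://w3id.org/steel/ProcessOntology/hasLaboratory"},
    ],
}


def resolve(obj, path):
    """Follow a dotted key path such as "a.b" through nested dicts."""
    for key in path.split("."):
        obj = obj[key]
    return obj


def expand(entry, dataset, base=BASE):
    """Return (subject, predicate, object) tuples for one mapping entry."""
    source = entry.get("source")
    if source is None:
        roots = [dataset]  # no source: paths are absolute, resolved against the whole file
    elif source.endswith("[*]"):
        roots = resolve(dataset, source[:-3])  # source set: one root object per list item
    else:
        raise ValueError("this sketch only understands '<path>[*]'")

    triples = []
    for root in roots:
        if entry.get("suffix_from_location"):
            suffix = resolve(root, entry["suffix"])  # suffix is looked up in the data
        else:
            suffix = entry["suffix"]  # suffix is used literally
        subject = f"{base}{suffix}"
        triples.append((subject, "rdf:type", entry["iri"]))
        for rel in entry["custom_relations"]:
            value = resolve(root, rel["object_location"])  # relative to the current root
            triples.append((subject, rel["relation"], value))
    return triples


triples = expand(entry, data)
print(len(triples))  # 8 triples: one type statement plus 3 relations per operator
```

Adding a third object to the data list yields four more triples without touching the mapping, which is the point of the wildcard.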

See the updated docs here:
https://github.com/MI-FraunhoferIWM/data2rdf/blob/enh/mapping-for-multiple-individuals/docs/examples/abox/6_custom_relations.md

@MBueschelberger MBueschelberger self-assigned this Oct 25, 2024
@MBueschelberger MBueschelberger added the 📈 enhancement New feature or request label Oct 25, 2024
Contributor

github-actions bot commented Oct 25, 2024

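Coverage Report

| File | Stmts | Miss | Cover |
|---|---|---|---|
| data2rdf/__init__.py | 5 | 0 | 100% |
| data2rdf/config.py | 19 | 0 | 100% |
| data2rdf/utils.py | 33 | 5 | 85% |
| data2rdf/warnings.py | 2 | 0 | 100% |
| data2rdf/models/__init__.py | 3 | 0 | 100% |
| data2rdf/models/base.py | 47 | 4 | 91% |
| data2rdf/models/graph.py | 150 | 35 | 77% |
| data2rdf/models/mapping.py | 40 | 1 | 98% |
| data2rdf/modes/__init__.py | 4 | 0 | 100% |
| data2rdf/parsers/__init__.py | 6 | 0 | 100% |
| data2rdf/parsers/base.py | 134 | 11 | 92% |
| data2rdf/parsers/csv.py | 168 | 20 | 88% |
| data2rdf/parsers/excel.py | 175 | 17 | 90% |
| data2rdf/parsers/json.py | 188 | 29 | 85% |
| data2rdf/parsers/utils.py | 79 | 11 | 86% |
| data2rdf/pipelines/__init__.py | 2 | 0 | 100% |
| data2rdf/pipelines/main.py | 82 | 9 | 89% |
| data2rdf/qudt/__init__.py | 0 | 0 | 100% |
| data2rdf/qudt/utils.py | 42 | 12 | 71% |
| TOTAL | 1179 | 154 | 87% |

| Tests | Skipped | Failures | Errors | Time |
|---|---|---|---|---|
| 114 | 0 💤 | 0 ❌ | 0 🔥 | 2m 55s ⏱️ |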

@yoavnash
Member

yoavnash commented Oct 25, 2024

Seems to make sense for JSON but would that also work for CSV or Excel files?
Is the old format still supported?

@Kirankumaraswamy
Member

Looks good to me. Do the changes also distinguish whether the object is going to be a literal or a URIRef object? For example, if the data has an attribute hasOrganization and the value will be the IRI of a kitem.

@MBueschelberger
Member Author

> Seems to make sense for JSON but would that also work for CSV or Excel files? Is the old format still supported?

It is also supported for Excel. However, the wildcard via source does not work there, since JSONPath cannot be applied to Excel files.

Implementing it for CSV is more complicated because the overall parser works differently. Hence, CSV is currently not supported.

The old schema is still supported. The only difference is that, if custom_relations is set, the other fields such as value_location, value_relation, unit_location, and unit_relation are disabled.
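
The constraints in this reply can be summarised as a small check. This is a hypothetical sketch: the function `check_entry`, the parser names passed as strings, and the wording of the notes are assumptions, and data2rdf's own validation may behave differently.

```python
# Hypothetical sketch of the rules stated in this comment; not data2rdf code.
# The field list mirrors the ones named above and may not be exhaustive.
LEGACY_FIELDS = ("value_location", "value_relation", "unit_location", "unit_relation")


def check_entry(entry, parser):
    """Return human-readable notes about how a mapping entry would be treated."""
    notes = []
    if "custom_relations" not in entry:
        return notes  # old schema: nothing changes
    if parser == "csv":
        notes.append("custom_relations is currently not supported for CSV")
    if parser == "excel" and entry.get("source"):
        notes.append("the wildcard via source does not work for Excel")
    ignored = [field for field in LEGACY_FIELDS if field in entry]
    if ignored:
        notes.append("disabled because custom_relations is set: " + ", ".join(ignored))
    return notes


entry = {"custom_relations": [], "source": "data[*]", "value_location": "data.name[0]"}
for note in check_entry(entry, "excel"):
    print(note)
```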

@MBueschelberger
Member Author

> Looks good to me. Do the changes also distinguish whether the object is going to be a literal or a URIRef object? For example, if the data has an attribute hasOrganization and the value will be the IRI of a kitem.

As mentioned in the docs linked above, you can set the XSD type with the object_data_type field:

...
            {
                "object_location": "lab_no",
                "relation": "https://w3id.org/steel/ProcessOntology/hasLaboratory",
                "object_data_type": "anyUri",
            },
...

@yoavnash
Member

> > Seems to make sense for JSON but would that also work for CSV or Excel files? Is the old format still supported?
>
> It is also supported for Excel. However, the wildcard via source does not work there, since JSONPath cannot be applied to Excel files.
>
> Implementing it for CSV is more complicated because the overall parser works differently. Hence, CSV is currently not supported.
>
> The old schema is still supported. The only difference is that, if custom_relations is set, the other fields such as value_location, value_relation, unit_location, and unit_relation are disabled.

Is it then the case that data2rdf throws an error or a warning when a user tries it in a way that is not supported?

@MBueschelberger
Member Author

> > > Seems to make sense for JSON but would that also work for CSV or Excel files? Is the old format still supported?
>
> > It is also supported for Excel. However, the wildcard via source does not work there, since JSONPath cannot be applied to Excel files.
> > Implementing it for CSV is more complicated because the overall parser works differently. Hence, CSV is currently not supported.
> > The old schema is still supported. The only difference is that, if custom_relations is set, the other fields such as value_location, value_relation, unit_location, and unit_relation are disabled.
>
> Is it then the case that data2rdf throws an error or a warning when a user tries it in a way that is not supported?

Yes it does!

def validate_model(cls, self: "ABoxBaseMapping") -> "ABoxBaseMapping":

Member

@Kirankumaraswamy Kirankumaraswamy left a comment

Although I couldn't test it, the solution looks good to me.

@MBueschelberger MBueschelberger merged commit 56d3bc7 into main Oct 28, 2024
15 checks passed
@MBueschelberger MBueschelberger deleted the enh/mapping-for-multiple-individuals branch October 28, 2024 09:47
3 participants