
Mismatch of dct:format property range's type with the provided vocabulary #403

Closed
Joonas-M-S opened this issue Nov 27, 2024 · 4 comments

@Joonas-M-S

The property dct:format has a range of dct:MediaTypeOrExtent, and a controlled vocabulary is used for it. The following partial example of a dcat:Distribution instance shows the usage of the vocabulary:

[
     a dcat:Distribution ;
     dct:format <http://publications.europa.eu/resource/authority/file-type/XML>
]

but the SHACL file expects the value of dct:format to be of type dct:MediaTypeOrExtent, which http://publications.europa.eu/resource/authority/file-type/XML is not. Its types are skos:Concept and euvoc:FileType. Is there a problem with the vocabulary or have I missed something?

@Daham-Mustaf

Hi @Joonas-M-S

The issue stems from a discrepancy between the DCAT-AP specification and real-world usage of the EU Publications Office File Type vocabulary.

According to the DCAT-AP 2.1.1 specification, dct:format should have a range of dct:MediaTypeOrExtent. However, the EU Publications Office File Type vocabulary (http://publications.europa.eu/resource/authority/file-type) uses skos:Concept as the type for its terms, which is a common practice for controlled vocabularies.

The SHACL shape should be updated to accommodate both types using shacl:or:

# line 2064 
<https://semiceu.github.io//DCAT-AP/releases/3.0.0#DistributionShape/737cd5f1c9f0e13c35eb82b7f0d2b2f76a9a82c7> 
    rdfs:seeAlso "https://semiceu.github.io//DCAT-AP/releases/3.0.0#Distribution.format";
    shacl:or (
        [ shacl:class dc:MediaTypeOrExtent ]
        [ shacl:class skos:Concept ]
    );
    shacl:description "The file format of the Distribution."@en;
    shacl:name "format"@en;
    shacl:path dc:format;
    shacl:message "The format must be either a MediaTypeOrExtent or a SKOS concept"@en .

This change allows validation of both:

  1. Traditional MIME types as instances of dct:MediaTypeOrExtent
  2. Terms from controlled vocabularies like the EU Publications Office File Type vocabulary as instances of skos:Concept

Here's a valid example using the EU Publications Office File Type vocabulary:

@prefix dct: <http://purl.org/dc/terms/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dc: <http://purl.org/dc/terms/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

<http://publications.europa.eu/resource/authority/file-type/XML> 
    a skos:Concept ;
    skos:prefLabel "XML"@en .

<http://example.org/distribution/2>
    a dcat:Distribution ;
    dc:format <http://publications.europa.eu/resource/authority/file-type/XML> .

This solution maintains backward compatibility while supporting the widely-used EU vocabularies in DCAT-AP implementations.
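The example above covers the second case (a vocabulary term typed as skos:Concept). As a complementary sketch for the first case, the data below types the format value as dct:MediaTypeOrExtent. The distribution IRI is hypothetical, and the IANA media type IRI is only an illustrative choice. Note that a SHACL validator does not dereference the IRI, so the typing triple has to be present in the data graph being validated.

```turtle
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .

# Illustrative media type resource, typed explicitly so that the
# class check on dct:MediaTypeOrExtent can succeed.
<https://www.iana.org/assignments/media-types/application/xml>
    a dct:MediaTypeOrExtent .

# Hypothetical distribution used only for this sketch.
<http://example.org/distribution/1>
    a dcat:Distribution ;
    dct:format <https://www.iana.org/assignments/media-types/application/xml> .
```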

@init-dcat-ap-de

It's a common problem when everything from an ontology is converted into a SHACL shape.
You have a similar problem when you use e.g. foaf:homepage: the shacl shape says the object has to be a foaf:Document, but most webpages will not self-identify as foaf:Document.

For the German SHACL-Shapes for DCAT-AP.de 3.0 we deactivate 55 of those rules: https://github.com/GovDataOfficial/DCAT-AP.de-SHACL-Validation/blob/master/validator/resources/v3.0/shapes/dcat-ap-SHACL-DE.ttl
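As a minimal sketch of what deactivating such a rule can look like, SHACL provides the sh:deactivated property (written here with the shacl: prefix used earlier in this thread). The shape IRI is copied from the snippet above; whether that is the exact shape carrying the class constraint in the published file, and whether the DCAT-AP.de file uses this same mechanism, are assumptions that should be checked against the linked file.

```turtle
@prefix shacl: <http://www.w3.org/ns/shacl#> .

# Hypothetical override graph, loaded together with the published shapes.
# A deactivated shape is skipped by the validator, so the class check on
# dct:format no longer produces violations.
<https://semiceu.github.io//DCAT-AP/releases/3.0.0#DistributionShape/737cd5f1c9f0e13c35eb82b7f0d2b2f76a9a82c7>
    shacl:deactivated true .
```

The trade-off is that the property is then not checked at all, whereas the shacl:or variant still rejects values that are neither a media type nor a SKOS concept.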

@bertvannuffelen
Contributor

@Joonas-M-S. The responses of @Daham-Mustaf and @init-dcat-ap-de are correct and contain some of the actions you must perform as an implementer.
A similar question has been raised in #400.

DCAT-AP tries to reuse existing agreements from other vocabularies and application profiles as much as possible.
At some point in the modelling, the decision is taken that further detailing the referenced entity is beyond the scope of DCAT-AP. This is such a case.

In this case, we ensured some harmonisation (e.g. by selecting a controlled vocabulary) and agreed that this choice is valid, even though the last "technical bits", namely how the property is defined and how the controlled vocabulary is defined, do not fit together perfectly.

In this case one can use the magic wand of semantic modelling: inference in an open world. One derives that <http://publications.europa.eu/resource/authority/file-type/XML> is a dct:MediaTypeOrExtent. This statement is not that wrong, so we can accept this choice and the technical mismatch.
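For a validator that only sees explicit triples, the derived statement has to be materialised in the data graph before validation. A minimal sketch under that assumption follows. The first option asserts the type directly. The second relies on SHACL class checks following rdfs:subClassOf triples found in the data graph. The euvoc namespace IRI is an assumption, and the subclass axiom is a local implementer decision, not something published by DCAT-AP or the vocabulary.

```turtle
@prefix dct:   <http://purl.org/dc/terms/> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
# Assumed namespace for the euvoc prefix mentioned in the question.
@prefix euvoc: <http://publications.europa.eu/ontology/euvoc#> .

# Option 1: add the derived type statement for each term that is used.
<http://publications.europa.eu/resource/authority/file-type/XML>
    a dct:MediaTypeOrExtent .

# Option 2: state once that every file type counts as a
# dct:MediaTypeOrExtent; terms typed euvoc:FileType in the data graph
# then satisfy the class constraint.
euvoc:FileType rdfs:subClassOf dct:MediaTypeOrExtent .
```

Either option leaves the published shapes untouched, at the cost of maintaining the extra triples alongside the data.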

Of course we work towards making the gap created in the DCAT-AP specification as small as possible.
But we have to realise that we reuse information from others here, and thus it is hard to impose our DCAT-AP rules on external communities. We can only ask them to make the bridge as easy as possible.

@Joonas-M-S
Author

Thank you for the answers. These clarified the issue for me.
