Property Shapes duplicates in SHACL Shapes in 2.0 release #19
Comments
Well, the validator accepts the spec and validates something.
|
The shapes are generated automatically by the publication system that SEMIC uses. This still needs to be checked overall, but I will keep an eye on it to see if it improves; thanks for reporting. Splitting the constraints for one property across multiple property shapes is by design, to keep them modular. Thanks again; do you have any feedback on the model? |
What kind of feedback are you interested in, and what types of issues are you willing to address at this lifecycle stage of the spec? At this first stage of consuming the spec we are dealing with a bunch of minor issues (i.e. the first type of issues). For example, another issue is "Different prefixes for the same dc/terms namespace in DCAT 3 and in MLDCAT-AP profile, also conflicting with the established common practice":
# in DCAT 3 ontology
@prefix dcterms: <http://purl.org/dc/terms/> .
# in MLDCAT-AP SHACL
@prefix dc: <http://purl.org/dc/terms/> .
The issue looks similar to issue #7 by @VladimirAlexiev. According to the prefix lookup service https://prefix.cc
To summarize: The "dc" prefix should be changed to "dcterms" in MLDCAT-AP for the following reasons:
A better (but apparently far more disruptive) option would be to switch the whole DCAT stack (the DCAT ontology and all AP profiles) from the "dc" and "dcterms" prefixes to "dct". Should we report this kind of issue here, or is it better to raise only bigger ones (missing properties/classes) here? |
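For illustration, aligning MLDCAT-AP with DCAT 3 as proposed would only change the prefix label in the SHACL file; the namespace IRI, and therefore every parsed triple, stays the same. A minimal sketch, using the publisher path from the shapes quoted further down in this thread:

```turtle
# Proposed: the same prefix label as in the DCAT 3 ontology
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix shacl: <http://www.w3.org/ns/shacl#> .

# Parses to exactly the same triple as "shacl:path dc:publisher"
# when dc: is bound to the same namespace IRI
<#CatalogShape/publisher> shacl:path dcterms:publisher .
```

Because prefixes are only a serialization convenience, this is a readability and consistency fix rather than a change in validation behaviour.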
In our fork repository https://github.com/agentlab/MLDCAT-AP we "un-modularized" the property shapes, made the SHACL shapes more "human-friendly" (at least we hope so :) ) and fixed some of our issues/struggles, while trying to stay as compatible with the spec as possible. |
@amivanoff your comment and your resolution show one of the main challenges for SHACL artefacts. The formulation you created raises a number of considerations:
But the most challenging aspect is maintenance and completeness. Our generators can generate a variant of your suggestion, but because of 1 we switched. For validation purposes (using the file as-is in a SHACL engine), whether the condensed or the split version is used makes no difference. DCAT-AP devotes a whole section to validation (https://semiceu.github.io/DCAT-AP/releases/3.0.0/#validation-of-dcat-ap). You will see that a human-managed collection of shapes is added there. Those are maintained manually because, as explained there, they target various validation situations. In principle, a large part of them could be expressed by referring to the generated ones (a first approach for that use can be found at https://semiceu.github.io/DCAT-AP/releases/3.0.0-hvd/#validation). This brings us to the main advantage for the DCAT-AP ecosystem: a collection of named individual constraints makes it possible to interlink requirements (and, in this case, the SHACL formulation of each requirement). The DCAT-AP profile of the Swedish Geocatalogue can refer directly to a requirement in the SHACL. That makes comparisons easier and decisions more transparent. Towards interlinked specifications. Of course improvements can be made; some of your and others' comments on the SHACL point to issues, and these will be resolved over time. But in this ecosystem, where the same data is used against different sets of requirements, we can take advantage of the power of linked data to offer additional services. As a last note: the SHACL of the SEMIC specifications is a consequence of what is written in the HTML, not vice versa. It only reflects the constraints that can easily be written in SHACL. I hope this answer gives you some insight into why this approach was taken. |
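As an illustration of that interlinking (a sketch, not taken from the DCAT-AP documents or the Swedish profile), a downstream profile could reuse one generated, named constraint simply by pointing to its IRI with shacl:property. The profile IRI below is hypothetical; the referenced shape IRI is the minCount shape quoted further down in this thread:

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix shacl: <http://www.w3.org/ns/shacl#> .

# Hypothetical downstream profile shape (example.org IRI is an assumption)
<https://example.org/my-profile#CatalogShape>
    a shacl:NodeShape ;
    shacl:targetClass dcat:Catalog ;
    # Reuse a single named MLDCAT-AP constraint by its IRI
    shacl:property <https://semiceu.github.io/MLDCAT-AP/releases/2.0.0#CatalogShape/a0ccdf3bd7f5d161d07f375a26e68c18ca91dc19> .
```

Note that a validator generally does not fetch the referenced shape by dereferencing its IRI; the MLDCAT-AP shapes file would have to be loaded (or imported) into the same shapes graph for the reference to resolve.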
Well, today different shape representations with different goals in mind are technically allowed and even recommended by some people. |
@bertvannuffelen your reason 1 is false: the examples given in the description do NOT describe the individual error. Two more defects:
|
@VladimirAlexiev, if you divide "one-propertyshape-for-one-property" into several "smaller-propertyshapes" with only one constraint in each of these "smaller-propertyshapes", then the
I am seeing such an approach to SHACL validation for the first time in years. But it's working 😊 Example:
1. One "one-propertyshape-for-one-property" for the publisher:
<#CatalogShape/publisher> rdfs:seeAlso "https://semiceu.github.io/MLDCAT-AP/releases/2.0.0#Catalogue.publisher";
shacl:name "publisher"@en;
shacl:description "An entity (organisation) responsible for making the Catalogue available."@en;
shacl:path dc:publisher;
shacl:nodeKind shacl:BlankNodeOrIRI; #not working here
shacl:class foaf:Agent;
shacl:minCount 1;
shacl:maxCount 1;
shacl:message "All publisher's constraints are wrong"@en .
2. Several "smaller-propertyshapes" for the publisher:
<#CatalogShape/93f73e69bb03d2928fcf758a253ef316becdf9b9> rdfs:seeAlso "https://semiceu.github.io/MLDCAT-AP/releases/2.0.0#Catalogue.publisher";
shacl:name "publisher"@en;
shacl:description "An entity (organisation) responsible for making the Catalogue available."@en;
shacl:path dc:publisher;
shacl:nodeKind shacl:BlankNodeOrIRI; #will work only if you disable the shacl:class rule b3ec0655204c62a2531244aaeab12f1a2c5e5b5d
shacl:message "Only publisher's nodeKind constraint is wrong"@en .
<#CatalogShape/b3ec0655204c62a2531244aaeab12f1a2c5e5b5d> rdfs:seeAlso "https://semiceu.github.io/MLDCAT-AP/releases/2.0.0#Catalogue.publisher";
shacl:name "publisher";
shacl:description "An entity (organisation) responsible for making the Catalogue available."@en;
shacl:path dc:publisher;
shacl:class foaf:Agent;
shacl:message "Only publisher's class constraint is wrong"@en .
<https://semiceu.github.io/MLDCAT-AP/releases/2.0.0#CatalogShape/a0ccdf3bd7f5d161d07f375a26e68c18ca91dc19> rdfs:seeAlso "https://semiceu.github.io/MLDCAT-AP/releases/2.0.0#Catalogue.publisher";
shacl:name "publisher"@en;
shacl:description "An entity (organisation) responsible for making the Catalogue available."@en;
shacl:path dc:publisher;
shacl:minCount 1;
shacl:message "Only publisher's minCount constraint is wrong"@en .
<#CatalogShape/67dcdb36167ca7969c0532898e11a98e9c2a80f5> rdfs:seeAlso "https://semiceu.github.io/MLDCAT-AP/releases/2.0.0#Catalogue.publisher";
shacl:name "publisher"@en;
shacl:description "An entity (organisation) responsible for making the Catalogue available."@en;
shacl:path dc:publisher;
shacl:maxCount 1;
shacl:message "Only publisher's maxCount constraint is wrong"@en .
Yes, seeAlso as string -- this is definitely a bug. |
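For comparison, a sketch of the rdfs:seeAlso link from the shapes above written as an IRI rather than a string literal, so that RDF tools (and a validation UI) can treat it as a link to a resource. Only the object of the triple changes:

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# As published: a plain string literal, so RDF tools see text, not a resource
<#CatalogShape/publisher> rdfs:seeAlso "https://semiceu.github.io/MLDCAT-AP/releases/2.0.0#Catalogue.publisher" .

# Sketch of the fix: the same URL written as an IRI node
<#CatalogShape/publisher> rdfs:seeAlso <https://semiceu.github.io/MLDCAT-AP/releases/2.0.0#Catalogue.publisher> .
```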
To me it looks like "technology abuse", but yes, "it's working"... And I cannot see any other way to produce granular, internationalizable error messages for each constraint of a property shape in every EU language... other than deeply internationalizing the internals of the Jena SHACL validator (or RDF4J, or any other open-source SHACL validator). But I presume Jena's maintainers would not be happy to make Jena speak another 23 languages besides English. I do not know what @HolgerKnublauch thinks about this "language attack" on shapes... |
It seems the SEMIC guys take localization/internationalization of validator error reports VERY seriously. They want to deliver as much of the validation report as possible to the user (i.e. an integration specialist?) in a local language.
|
Maybe it is better to call it "a bunch of property constraints" rather than a "property shape", because in this case the "property shape" is not specified explicitly in the spec in its complete form. It is not reifiable/addressable (no IRI). The property shape is constructed by the validator at runtime as a conjunction of a class shape and a |
In the current SHACL version it is indeed required to define a separate shape whenever you want to specify a different message. With the upcoming 1.2 I hope we can generalize this so that reification can be used to attach a message (and severity, and possibly more) to each constraint triple. That should help here. |
This is not abuse. This is a way to make individual checks more atomic, thus easier to generate.
The spec says that:
@amivanoff Do you see any spec change required for multilingual translations?
Yes, but only if they say different things. Multiple shapes are not needed to accommodate multiple translations. |
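A minimal sketch of that point, assuming a hypothetical shape IRI and illustrative message texts (not taken from MLDCAT-AP): one property shape carries one constraint and several language-tagged shacl:message values, since SHACL allows message values to be plain strings or language-tagged literals. Which translation reaches the user is left to the engine and the client application.

```turtle
@prefix dc: <http://purl.org/dc/terms/> .
@prefix shacl: <http://www.w3.org/ns/shacl#> .

# Hypothetical shape IRI: one property shape holding a single constraint (minCount)
<#CatalogShape/publisher-minCount>
    a shacl:PropertyShape ;
    shacl:path dc:publisher ;
    shacl:minCount 1 ;
    # One message, several translations: each literal carries its own language tag
    shacl:message "The catalogue must have a publisher."@en ,
        "De catalogus moet een uitgever hebben."@nl ,
        "Der Katalog muss einen Herausgeber haben."@de .
```

In a full shapes file this property shape would still need to be attached to the CatalogShape node shape via shacl:property to take effect.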
@VladimirAlexiev, yes: one shape per constraint, and this shape contains multiple translations for the
This issue enables a capability to preserve property shape reification (if authors are willing to use it). But one aspect still remains unhandled in this issue. It is related to the "message substitution" semantics. In the "message substitution/redefinition" example from the issue above, it could be only one of two cases:
In case 2, we lose detailed information from the validator, such as "but found 2" (i.e., how many constraint violations have been found for this object-propertyshape). So with "error message substitution" we could translate only general messages, which do not take the specific data situation into account. In the DCAT-AP SHACL profile, colleagues tried to dump
In the released DCAT 3.0 version they did not use any of
All of it raises a question of
I could not work out whether this issue could help with all of the above. |
exactly, but also
All of these relate to the use case of designing a business UI for a validation service, where the result guides the user to the most important issues to resolve. Today, validators like https://www.itb.ec.europa.eu/shacl/dcat-ap/upload produce a technical table (Error - message - relatedValue), and then the hunt is on. One has to be an RDF expert to find the source (which in most cases is trivial for an RDF expert), but resolving it is harder. To illustrate: the following 3 values are licences found in an open data portal (values of dct:license). The first is acceptable, but the second, and probably also the third, are not.
Being able to cross-reference https://semiceu.github.io/DCAT-AP/releases/3.0.0-hvd/#c3 in the case of HVD compliance is a valuable motivation for publishers to get rid of at least the second, and likely also the third. That is different from "the validator does not like it". With such a cross-reference, the RDF expert can more easily convince the dataset owner (some publisher in some agency) to adapt its source metadata. In our context, SHACL is also a means of providing a service to staff who are not RDF specialists. |
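A minimal sketch of how such a cross-reference could travel with a constraint. The shape IRIs and the simple "licence should be an IRI" check are hypothetical illustrations; only the link target is the one quoted in the comment, and what requirement #c3 actually states is not reproduced here. Validation results point back to the violated shape through sh:sourceShape, so a UI can look up the shape's rdfs:seeAlso and render it as a link next to the message.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix shacl: <http://www.w3.org/ns/shacl#> .

# Hypothetical node shape attaching the named constraint to distributions
<#DistributionShape>
    a shacl:NodeShape ;
    shacl:targetClass dcat:Distribution ;
    shacl:property <#DistributionShape/license-iri> .

# Hypothetical named constraint: the licence value should be an IRI
<#DistributionShape/license-iri>
    a shacl:PropertyShape ;
    shacl:path dct:license ;
    shacl:nodeKind shacl:IRI ;
    shacl:severity shacl:Warning ;
    # Cross-reference to the requirement, written as an IRI so a UI can link to it
    rdfs:seeAlso <https://semiceu.github.io/DCAT-AP/releases/3.0.0-hvd/#c3> ;
    shacl:message "The licence should be identified by an IRI."@en .
```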
@amivanoff I assume a standard validator produced "but found 2", and that it speaks only "en". To be able to parameterize this for "nl", I added the following to the issue above:
|
Property shapes with the same shacl:path and different generated IRIs are repeated twice, or sometimes even 3-4 times, in the Turtle spec. The JSON-LD is affected as well. For example, several property shapes are repeated just for the CatalogShape class shape. Just one concrete example, for the foaf:homepage property shape (lang tag stripped):
Sometimes some property shape variants are missing the cardinality restriction. Sometimes they differ in shacl:nodeKind shacl:BlankNodeOrIRI or shacl:class. If variability in value restrictions is needed (BlankNodeOrIRI or a concrete class), the correct way is to use sh:or, I think.
If the property shapes' IRIs weren't random, this would be a minor problem 😊 But as it is, it seems to be an error, and an "adoption blocker" at that.
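A sketch of the single-shape alternative suggested above, assuming the intent is "exactly one publisher, which is either an IRI or a node typed foaf:Agent" (the intended constraint is not stated in the thread, and the shape IRIs are hypothetical). Cardinality stays outside the disjunction so it can no longer go missing from one of the variants, while sh:or is checked against each value node of the path.

```turtle
@prefix dc: <http://purl.org/dc/terms/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix shacl: <http://www.w3.org/ns/shacl#> .

<#CatalogShape>
    a shacl:NodeShape ;
    shacl:targetClass dcat:Catalog ;
    shacl:property <#CatalogShape/publisher> .

# One property shape per path: the cardinality applies to the property as a whole...
<#CatalogShape/publisher>
    a shacl:PropertyShape ;
    shacl:path dc:publisher ;
    shacl:minCount 1 ;
    shacl:maxCount 1 ;
    # ...and each value must satisfy at least one of the alternatives
    shacl:or (
        [ shacl:nodeKind shacl:IRI ]
        [ shacl:class foaf:Agent ]
    ) .
```

Note that pairing shacl:BlankNodeOrIRI with shacl:class inside sh:or would be redundant, since any instance of a class is already a blank node or IRI; the disjunction only adds something when the alternatives genuinely differ, as with shacl:IRI above.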