which SHACL validators to try? #95

Open
VladimirAlexiev opened this issue Sep 26, 2024 · 17 comments
Labels
shacl Pertains to SHACL shapes

Comments

@VladimirAlexiev
Collaborator

VladimirAlexiev commented Sep 26, 2024

Requirements:

Let's define which validation engines to try.
We now have a pretty complete list at https://github.com/w3c-cg/awesome-semantic-shapes#readme and several contributors add to it often.

Here's a proposal:

  • TopQuadrant API (Java & Jena), because it is the one we have tested to date
    • ValiMate (part of the DNV suite) uses the TopQuadrant API, so it doesn't need to be tested separately
  • Jena SHACL (Java), because it seems to be more actively developed than TQ
  • rdf4j ShaclSail (Java), because it can do incremental validation and is used by GraphDB: see supported SHACL features
  • pySHACL (Python), because many TSOs/power engineers use Python; it claims to have the most complete coverage of SHACL features and is used by ModShape
  • DataTreehouse/maplib (Rust), because it's fast for in-memory validation. However, it's not open source, and DataTreehouse didn't agree to provide it for evaluation (Oct 2024)
  • https://rudof-project.github.io/rudof/: a library and CLI that implements Shape Expressions, SHACL, DCTAP and other technologies in the RDF ecosystem (by the same guy who made SHACLEX, the first SHACL+SHEX implementation). It is implemented in Rust and it also provides Python bindings.

How about:

  • JavaScript implementations?
  • Shaclex, which implements SHACL and ShEx validation (not sure whether that's relevant)

After we agree on the list, we need to research and list the limitations of every implementation.
This may eliminate some candidates.

@HarisVranaj please attach the presentation you showed two days ago (I hope it's not confidential).
@griddigit-ci and @Sveino please comment on the proposal above, and I'll update the list.

@VladimirAlexiev VladimirAlexiev added the shacl Pertains to SHACL shapes label Sep 26, 2024
@Sveino
Owner

Sveino commented Sep 27, 2024

I agree with the requirement. Ideally, we would capture everything in the UML/information model, so that we only need UML restrictions. However, the world is more complicated. To avoid a very technical UML/information model, we will use a logical description of the constraints. This description does not need to be processed as-is; it can be converted into whatever form is relevant for execution. This is the primary motivation for not including SPARQL. The secondary motivation is that we want engines that are optimised to execute well-known constraint patterns.
So the primary purpose of testing the SHACL validation engines is to check that our SHACL rules are not based on a misunderstanding and are not biased toward a particular implementation.
I agree with the priorities and the arguments for picking them. If we add anything, I would consider pySHACL. The reason is that many TSOs are starting to use Python for power engineering. In addition, Nick Car is a core developer. They have also boasted that they have the most complete coverage of the SHACL rules.

@HarisVranaj
Collaborator

HarisVranaj commented Oct 2, 2024

I have some suggestions from Erik for benchmarks of SHACL/SPARQL validators.

https://github.com/oxigraph/oxigraph/blob/main/bench/README.md
https://github.com/ad-freiburg/qlever/wiki/QLever-performance-evaluation-and-comparison-to-other-SPARQL-engines
Oxigraph is now optimised for memory usage (it no longer uses the RocksDB engine when running in memory), which makes it 4 times faster than earlier versions on Erik's machine (since this measurement includes unzipping the file, the real performance will be even better).

@VladimirAlexiev
Collaborator Author

@HarisVranaj but do Oxigraph and QLever have SHACL implementations?
Please post links so I can include them in https://github.com/VladimirAlexiev/awesome-semantic-shapes#shacl-validators and from there in https://github.com/w3c-cg/awesome-semantic-shapes

@VladimirAlexiev
Collaborator Author

Note to self: https://mail.google.com/mail/u/0/#sent/QgrcJHsTgsbXhdCJwNqzTbwQHVhdRXDHtBB asked Treehouse for access to maplib SHACL.

@VladimirAlexiev
Collaborator Author

VladimirAlexiev commented Oct 25, 2024

@Sveino points out that rdf4j 5.0.0 and 5.0.3 have some SHACL improvements:

And more are planned to be completed by the time 5.0.3 is released.

GraphDB will upgrade to rdf4j 5.0 at the end of the year.

@griddigit-ci
Collaborator

When I tried pySHACL back in January and tried to package ModShape, I had trouble: I ran into performance issues. I was in touch with Nick at the time, and there might be solutions, but I didn't have time to clean that up.

@HarisVranaj
Collaborator

@VladimirAlexiev
Copy link
Collaborator Author

@HarisVranaj Do QLever and Oxigraph support SHACL? Please post links to documentation.

@hmottestad

I'm also working on supporting the last of the SHACL path expressions, and this should be included in RDF4J 5.1.0 or 5.2.0: eclipse-rdf4j/rdf4j#5131

I can also advertise that the RDF4J SHACL implementation supports incremental validation. If you have a large database and want to make a small change to your data, then the RDF4J SHACL engine will analyse your changes and only validate the affected target nodes.
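To make "only validate the affected target nodes" concrete, here is a toy sketch in plain Python. It is not RDF4J's ShaclSail algorithm or API: triples are plain tuples, there is one hypothetical shape (a minimum count on a single predicate), class targets ignore rdfs:subClassOf, and the CIM-style names are illustrative only.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

RDF_TYPE = "rdf:type"
# Hypothetical shape, for illustration only: every cim:ACLineSegment
# must have at least MIN_COUNT values for cim:Conductor.length.
TARGET_CLASS = "cim:ACLineSegment"
PATH = "cim:Conductor.length"
MIN_COUNT = 1


def is_target(graph: Set[Triple], node: str) -> bool:
    """Toy class target: a direct rdf:type only (no subclass inference)."""
    return (node, RDF_TYPE, TARGET_CLASS) in graph


def check_nodes(graph: Set[Triple], nodes: Iterable[str]) -> Set[str]:
    """Validate only the given nodes; return those that violate the shape."""
    bad = set()
    for node in nodes:
        if not is_target(graph, node):
            continue  # not (or no longer) a focus node: nothing to report
        # A real engine would use an index here instead of scanning the graph.
        count = sum(1 for (s, p, _) in graph if s == node and p == PATH)
        if count < MIN_COUNT:
            bad.add(node)
    return bad


def affected_nodes(added: Set[Triple], removed: Set[Triple]) -> Set[str]:
    """Only subjects whose type or PATH values changed can change their result."""
    return {s for (s, p, _) in added | removed if p in (RDF_TYPE, PATH)}


def validate_change(graph: Set[Triple], added: Set[Triple], removed: Set[Triple]):
    """Apply a change set, then re-validate just the affected focus nodes."""
    new_graph = (graph - removed) | added
    return new_graph, check_nodes(new_graph, affected_nodes(added, removed))
```

Nodes untouched by the change keep their previous result, so a full report would be the previous report minus the re-checked nodes, plus the newly returned violations.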

@Sveino
Owner

Sveino commented Nov 7, 2024

@hmottestad Very good. Incremental or difference validation is extremely relevant, since we have a lot of SHACL rules that span multiple objects. The full graph is getting very big, while the changes are very limited. We have supported exchanging differences since 2005 using CIMXML/RDFXML. We are now looking into how we can use JSON-LD to exchange them. See #53
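The set arithmetic behind exchanging differences can be sketched in plain Python. This is a hypothetical illustration, not the CIMXML/RDFXML difference-model serialization: graphs are modelled as sets of triples, and the names "forward" and "reverse" only loosely follow the forward/reverse differences of CIM difference models.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def diff(old: Set[Triple], new: Set[Triple]) -> Tuple[Set[Triple], Set[Triple]]:
    """Return (forward, reverse): statements added and removed from old to new."""
    forward = new - old  # statements the receiver must add
    reverse = old - new  # statements the receiver must remove
    return forward, reverse


def apply_diff(graph: Set[Triple], forward: Set[Triple], reverse: Set[Triple]) -> Set[Triple]:
    """Apply a difference: drop the reverse statements, then add the forward ones."""
    return (graph - reverse) | forward
```

Swapping the two sets undoes a change: `apply_diff(new, reverse, forward)` gives back `old`. Only the (small) forward and reverse sets need to be exchanged, which is also what an incremental validator can work from.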

@hmottestad

RDF4J 5 has support for JSON-LD 1.1 with a customised version of Titanium JSON-LD that is considerably faster than the stock implementation that Jena is using.

I saw you were talking about DCAT; is your project related to Datakatalogen by any chance?

@Sveino
Owner

Sveino commented Nov 8, 2024

I like fast code :-)
The use of DCAT has two purposes. The first is to provide the header information for the dataset/named graph. We expect the same information to be linked to a Catalog so that the dataset/named graph can be found. So the second purpose is to support the data catalog (Datakatalog).
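A minimal sketch of both purposes, assuming JSON-LD as the serialization: a header for one named graph described as a dcat:Dataset, linked from a dcat:Catalog. The IRIs and titles are hypothetical, and identifying the dataset by the named graph's own IRI is an assumption, not an agreed convention.

```python
import json

# Prefixes for the JSON-LD @context (standard DCAT, Dublin Core and XSD namespaces).
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def dataset_header(graph_iri: str, title: str, issued: str) -> dict:
    """Header for one named graph, described as a dcat:Dataset.

    Assumption: the dataset is identified by the named graph's own IRI.
    """
    return {
        "@id": graph_iri,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:issued": {"@value": issued, "@type": "xsd:dateTime"},
    }


def catalog(catalog_iri: str, datasets: list) -> dict:
    """A dcat:Catalog that links the dataset headers so they can be found."""
    return {
        "@context": CONTEXT,
        "@id": catalog_iri,
        "@type": "dcat:Catalog",
        "dcat:dataset": datasets,
    }


if __name__ == "__main__":
    doc = catalog(
        "http://example.org/catalog",  # hypothetical IRIs
        [dataset_header("http://example.org/graph/grid-model-1",
                        "Grid model 1", "2024-11-08T00:00:00Z")],
    )
    print(json.dumps(doc, indent=2))
```

The same dataset node serves as the graph's header and, through `dcat:dataset`, as the catalog entry, so the two purposes share one description.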

@HarisVranaj
Collaborator

Reply from Erik: "They do not support it out of the box, only SPARQL; for them, SHACL needs to be translated into SPARQL.
This one does:
https://github.com/DataTreehouse/maplib
They say Python, but it's actually written in Rust with a Python API, and it can be used as the basis for a full application in Rust."
@Sveino, can you give him access?
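To illustrate what "SHACL needs to be translated into SPARQL" means, here is a hedged Python sketch that turns one simple constraint (sh:targetClass plus sh:minCount on a single-predicate path) into a SPARQL 1.1 query returning the violating focus nodes. The function name is hypothetical and this is not how any particular engine does it; a full translation must also cover complex paths, the other constraint components, and validation-report generation.

```python
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
RDFS_SUBCLASSOF = "<http://www.w3.org/2000/01/rdf-schema#subClassOf>"


def min_count_to_sparql(target_class: str, path: str, min_count: int) -> str:
    """Translate sh:targetClass + sh:minCount (single-predicate path) to SPARQL 1.1.

    The query returns each focus node that has fewer than `min_count`
    distinct values for `path`, together with its actual value count.
    """
    if min_count < 1:
        raise ValueError("sh:minCount 0 can never be violated")
    return (
        "SELECT ?focus (COUNT(DISTINCT ?value) AS ?valueCount) WHERE {\n"
        # Class targets include instances of subclasses (rdf:type/rdfs:subClassOf*).
        f"  ?focus {RDF_TYPE}/{RDFS_SUBCLASSOF}* <{target_class}> .\n"
        # OPTIONAL keeps focus nodes with no values, so they count as 0.
        f"  OPTIONAL {{ ?focus <{path}> ?value }}\n"
        "}\n"
        "GROUP BY ?focus\n"
        f"HAVING (COUNT(DISTINCT ?value) < {min_count})\n"
    )


if __name__ == "__main__":
    # Hypothetical IRIs, for illustration only.
    print(min_count_to_sparql("http://example.org/ACLineSegment",
                              "http://example.org/length", 1))
```

Every constraint pattern needs its own template like this, which is one reason to prefer engines that execute well-known constraint patterns natively.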

@Sveino
Owner

Sveino commented Nov 8, 2024

@HarisVranaj I am not able to give access to DataTreehouse Github, if that is what you wanted.

@HarisVranaj
Collaborator

No, no, I mean access to this repository.

@VladimirAlexiev
Collaborator Author

The repo is public, so Erik can post and comment

@VladimirAlexiev
Collaborator Author


5 participants