Add RDF Dataset Canonicalization support #71

Closed
filip26 opened this issue Jan 30, 2024 · 6 comments
Labels
enhancement New feature or request

Comments

@filip26
Owner

filip26 commented Jan 30, 2024

e.g. a new command canonicalize taking RDF as an input and producing canonicalized RDF as an output

@filip26 filip26 added the enhancement New feature or request label Mar 9, 2024
@vorburger

Just FYI, I've built something a bit like this (see https://docs.enola.dev/use/canonicalize/), using an algorithm inspired by, but currently not fully compliant with, the RFC 8785 JSON Canonicalization Scheme (JCS).

@filip26
Owner Author

filip26 commented Feb 20, 2025

@vorburger Please, out of curiosity, what’s the motivation for the canonicalization on top of the expanded JSON-LD form?

I’m asking because I’m experimenting with something similar to explore whether there could be a simpler and safer way to canonicalize JSON-LD without needing to go down to the RDF level or staying too high with JCS.

@filip26
Owner Author

filip26 commented Feb 20, 2025

Let's take this example:

{
  "@context": "http://schema.org/",
  "ref1": {
    "@id": "http://example.com/doe",
    "@type": "Person",
    "name": "Jane Doe"
  },
  "ref2": {
    "@id": "http://example.com/doe",
    "jobTitle": "Professor"
  }
}

If you apply plain JCS to the expanded form, the result won't match the result for this document:

{
  "@context": "http://schema.org/",
  "ref1": {
    "@id": "http://example.com/doe",
    "@type": "Person",
    "name": "Jane Doe",
    "jobTitle": "Professor"
  },
  "ref2": {
    "@id": "http://example.com/doe"
  }
}

But from an RDF perspective, they are the same. RDFC will produce the same result for both examples.

<http://example.com/doe> <http://schema.org/jobTitle> "Professor" .
<http://example.com/doe> <http://schema.org/name> "Jane Doe" .
<http://example.com/doe> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Person> .
_:c14n0 <http://schema.org/ref1> <http://example.com/doe> .
_:c14n0 <http://schema.org/ref2> <http://example.com/doe> .
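
To make the difference concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the expanded forms are written by hand rather than produced by a JSON-LD processor, `jcs_like` is only a sorted-key stand-in for RFC 8785, and `to_triples` is a toy converter, not RDFC. The naive blank node counter is enough here only because each document contains a single blank node.

```python
import json

SCHEMA = "http://schema.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DOE = "http://example.com/doe"

# Hand-written expanded forms of the two documents (assumed shapes).
expanded_a = [{
    SCHEMA + "ref1": [{
        "@id": DOE,
        "@type": [SCHEMA + "Person"],
        SCHEMA + "name": [{"@value": "Jane Doe"}],
    }],
    SCHEMA + "ref2": [{
        "@id": DOE,
        SCHEMA + "jobTitle": [{"@value": "Professor"}],
    }],
}]
expanded_b = [{
    SCHEMA + "ref1": [{
        "@id": DOE,
        "@type": [SCHEMA + "Person"],
        SCHEMA + "name": [{"@value": "Jane Doe"}],
        SCHEMA + "jobTitle": [{"@value": "Professor"}],
    }],
    SCHEMA + "ref2": [{"@id": DOE}],
}]


def jcs_like(value):
    # Stand-in for JCS: sorted keys, no insignificant whitespace.
    return json.dumps(value, sort_keys=True, separators=(",", ":"))


def collect(node, out, counter):
    """Add the node's triples to `out` and return its subject term."""
    if "@id" in node:
        subject = "<%s>" % node["@id"]
    else:
        subject = "_:b%d" % counter[0]  # naive label, fine for one blank node
        counter[0] += 1
    for key, values in node.items():
        if key == "@id":
            continue
        if key == "@type":
            for t in values:
                out.add((subject, "<%s>" % RDF_TYPE, "<%s>" % t))
            continue
        for v in values:
            obj = json.dumps(v["@value"]) if "@value" in v else collect(v, out, counter)
            out.add((subject, "<%s>" % key, obj))
    return subject


def to_triples(expanded):
    out, counter = set(), [0]
    for node in expanded:
        collect(node, out, counter)
    return out


print(jcs_like(expanded_a) == jcs_like(expanded_b))      # syntactic comparison
print(to_triples(expanded_a) == to_triples(expanded_b))  # statement-level comparison
```

The syntactic comparison sees two different trees, while the statement-level comparison sees the same five statements, which is the gap described above.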

This makes me think there might be another JSON-LD form - something between expanded and flattened. One that extracts and merges identifiable nodes while keeping blank nodes embedded as they are, preserving the tree-like structure in both cases without needing to disintegrate the entire tree to the statement level.
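
One possible reading of that idea, as a rough Python sketch. The function name `merge_identified` and the output layout (`tree` plus `nodes`) are hypothetical, invented here for illustration; they are not part of JSON-LD or of any library. The input is a hand-written expanded form, and the sketch does not normalize the order of multi-valued properties.

```python
SCHEMA = "http://schema.org/"
DOE = "http://example.com/doe"

# Hand-written expanded forms of the two example documents (assumed shapes).
expanded_a = [{
    SCHEMA + "ref1": [{
        "@id": DOE,
        "@type": [SCHEMA + "Person"],
        SCHEMA + "name": [{"@value": "Jane Doe"}],
    }],
    SCHEMA + "ref2": [{
        "@id": DOE,
        SCHEMA + "jobTitle": [{"@value": "Professor"}],
    }],
}]
expanded_b = [{
    SCHEMA + "ref1": [{
        "@id": DOE,
        "@type": [SCHEMA + "Person"],
        SCHEMA + "name": [{"@value": "Jane Doe"}],
        SCHEMA + "jobTitle": [{"@value": "Professor"}],
    }],
    SCHEMA + "ref2": [{"@id": DOE}],
}]


def merge_identified(expanded):
    """Hypothetical form: merge nodes sharing an @id, keep blank nodes embedded."""
    identified = {}  # @id -> merged node

    def visit(node):
        if "@value" in node:
            return node  # literals are left untouched
        result = {}
        for key, values in node.items():
            if key == "@id":
                continue
            if key == "@type":
                result[key] = list(values)
            else:
                result[key] = [visit(v) for v in values]
        node_id = node.get("@id")
        if node_id is None:
            return result  # blank node: stays embedded where it was
        merged = identified.setdefault(node_id, {"@id": node_id})
        for key, values in result.items():
            target = merged.setdefault(key, [])
            for v in values:
                if v not in target:
                    target.append(v)
        return {"@id": node_id}  # identified node: replaced by a reference

    tree = [visit(n) for n in expanded]
    return {"tree": tree, "nodes": identified}


form_a = merge_identified(expanded_a)
form_b = merge_identified(expanded_b)
print(form_a == form_b)
```

Under these assumptions both documents reduce to the same structure: the blank root node keeps its tree shape with two references, and all properties of `http://example.com/doe` end up in one merged node.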

I wonder if anyone would be interested in something like that.

@vorburger

vorburger commented Feb 22, 2025

> @vorburger Please, out of curiosity, what’s the motivation for the canonicalization on top of the expanded JSON-LD form?

Hi! To be totally honest, my initial motivation for "hacking" my RdfCanonicalizer (which, now that I've looked at it again for this, seems to have had an obvious bug, fixed with https://github.com/enola-dev/enola/pull/1104) was simply that I needed something like it in ModelSubject, which is a sort of "Matcher" for unit testing. The enola canonicalize CLI was just a "side effect" to expose this helper externally, for fun. In the future it might also be used for (H)MAC hashing for "security"-related ideas.
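
As an illustration of the (H)MAC idea, here is a small Python sketch using only the standard library. It is not how enola does or will do it; `canonical_mac` is a hypothetical helper, and it assumes the input lines are already canonical N-Quads (for example, RDFC output), so that sorting the lines is enough to make the byte sequence stable.

```python
import hashlib
import hmac


def canonical_mac(nquads_lines, key):
    """HMAC-SHA256 over sorted, newline-terminated N-Quads lines."""
    payload = "".join(sorted(line.strip() + "\n" for line in nquads_lines))
    return hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()


lines = [
    '<http://example.com/doe> <http://schema.org/name> "Jane Doe" .',
    '<http://example.com/doe> <http://schema.org/jobTitle> "Professor" .',
]
key = b"demo-key"  # placeholder secret, for illustration only

mac_1 = canonical_mac(lines, key)
mac_2 = canonical_mac(list(reversed(lines)), key)
print(hmac.compare_digest(mac_1, mac_2))
```

The same statements in a different order yield the same MAC, which is the property a canonical form is meant to provide before hashing or signing.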

> This makes me think there might be another JSON-LD form - something between expanded and flattened. One that extracts and merges identifiable nodes while keeping blank nodes embedded as they are, preserving the tree-like structure in both cases without needing to disintegrate the entire tree to the statement level.

I only vaguely understand what you mean, but it sounds interesting. If I were you and wanted to pursue this, I would probably start a discussion about it; https://github.com/w3c/rdf-canon/issues might be a good place, according to https://w3c.github.io/rch-wg-charter/#communication?

> I wonder if anyone would be interested in something like that.

I don't think I currently would have a need for it.

PS: TBD just FYI enola-dev/enola#1103.

@filip26
Owner Author

filip26 commented Feb 22, 2025

> I would probably start a discussion about it... on https://github.com/w3c/rdf-canon/issues, might be a good place, according to https://w3c.github.io/rch-wg-charter/#communication?

Thank you for your answer. Canonicalizing at the JSON-LD level, while maintaining the same level of granularity, could help mitigate key issues such as:

  • Blank node assignment – Challenging when using RDFC, but intrinsic to a tree structure.
  • Graph isomorphism – No polynomial-time algorithm is known for the general case.

These issues can lead to potentially expensive computations and risks such as graph poisoning.
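
The point about blank nodes being intrinsic to a tree can be sketched as follows. In a tree, an unlabeled node is identified by its position and content, so a canonical encoding needs no label assignment and no search over labelings. The function `canon` below is a hypothetical illustration, not a proposed algorithm, and it assumes every array is an unordered set (JSON-LD `@list` values, which are ordered, would need separate handling).

```python
import json


def canon(value):
    """Canonical string for a JSON tree whose arrays are treated as unordered sets."""
    if isinstance(value, dict):
        # Encode children first, then order members by key.
        items = sorted((json.dumps(k), canon(v)) for k, v in value.items())
        return "{" + ",".join(k + ":" + v for k, v in items) + "}"
    if isinstance(value, list):
        # Order set members by their own canonical encoding.
        return "[" + ",".join(sorted(canon(v) for v in value)) + "]"
    return json.dumps(value)


# Two trees that differ only in member and key order; the inner nodes have no @id.
tree_1 = {"knows": [{"name": "A", "knows": [{"name": "C"}]}, {"name": "B"}]}
tree_2 = {"knows": [{"name": "B"}, {"knows": [{"name": "C"}], "name": "A"}]}
tree_3 = {"knows": [{"name": "B"}, {"name": "A"}]}

print(canon(tree_1) == canon(tree_2))
print(canon(tree_1) == canon(tree_3))
```

Each node is encoded once, bottom-up, with a sort per level. The trade-off is that this only applies while the data stays tree-shaped.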

The intermediate form I envision sits somewhere between JCS and RDFC: faster and safer than RDFC while remaining at the semantic level like RDFC, rather than purely syntactic like JCS. It is designed specifically for JSON-LD and tree-like structures, which is a limitation of its own and the reason for zero interest from the RDF community.

From the feedback I’ve received, there seems to be interest in this approach (a canonical form is crucial for signing, verifiable credentials, etc.), but only if someone is willing to put in the effort to make it a standard. 😉

filip26 added a commit that referenced this issue Mar 10, 2025
@filip26
Owner Author

filip26 commented Mar 10, 2025

Released! Check out v0.10.0.

@filip26 filip26 closed this as completed Mar 10, 2025