Implement RDF Dataset Canonicalization (RDFC-1.0) with Canonical N-Quads Output #3461

@kishorebanala

Version

jena-5.5.0

Feature

Description:

Implement the W3C RDF Dataset Canonicalization (RDFC-1.0) algorithm in Apache Jena, with output in canonical N-Quads format. This enables deterministic serialization of RDF datasets: blank nodes are assigned canonical identifiers, so two isomorphic datasets serialize to the same output.
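To make the blank node relabelling concrete, here is a minimal sketch of the identifier issuing step, assuming the "c14n" prefix and zero-based counter used by RDFC-1.0. The class name `IdentifierIssuerSketch` and its methods are hypothetical and not part of Jena; the hashing steps that decide the order in which labels are passed to the issuer are not shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of an identifier issuer; not an existing Jena class.
final class IdentifierIssuerSketch {
    private final String prefix;
    private int counter = 0;
    // LinkedHashMap keeps issuance order, which later algorithm steps rely on.
    private final Map<String, String> issued = new LinkedHashMap<>();

    IdentifierIssuerSketch(String prefix) {
        this.prefix = prefix;
    }

    /** Returns the identifier already issued for this label, or issues the next one. */
    String issue(String existingLabel) {
        String id = issued.get(existingLabel);
        if (id == null) {
            id = prefix + counter;
            counter++;
            issued.put(existingLabel, id);
        }
        return id;
    }

    /** True if an identifier has already been issued for this label. */
    boolean hasIssued(String existingLabel) {
        return issued.containsKey(existingLabel);
    }

    public static void main(String[] args) {
        IdentifierIssuerSketch issuer = new IdentifierIssuerSketch("c14n");
        System.out.println("_:" + issuer.issue("b7")); // _:c14n0
        System.out.println("_:" + issuer.issue("b3")); // _:c14n1
        System.out.println("_:" + issuer.issue("b7")); // _:c14n0 (same label, same identifier)
    }
}
```

Issuing is idempotent per input label, which is why the original parser-assigned labels (`b7`, `b3` here) have no influence on the output once the issuing order is fixed.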

References:

Tasks:

  • Create NQuadsCanonicalWriter class extending WriterDatasetRIOTBase
  • Add NQUADS_CANONICAL format constant to RDFFormat
  • Register canonical writer factory in RDFWriterRegistry
  • Implement RDFC10Canonicalizer with complete RDFC-1.0 algorithm
    • Create HashUtils for SHA-256 hash computations and lexicographic sorting
    • Implement CanonicalIssuer for canonical blank node identifier assignment (_:c14nN, i.e. _:c14n0, _:c14n1, ...)
    • Add DatasetProcessor for blank node extraction and dataset processing
  • Download and integrate W3C canonicalization test suite to jena-arq/testing/rdf12-wg/rdf-n-quads-c14n/
  • Update Scripts_RIOT_c14n.java test factory following existing RIOT patterns
  • Implement RDFCanonicalizationTest for algorithm validation leveraging https://w3c.github.io/rdf-canon/tests/
  • Add writeCanonical() and canonicalizeDataset() methods to RDFDataMgr
  • Add --canonical flag support to riot command line tool
  • Update documentation and create usage examples
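As a rough illustration of what the proposed HashUtils could cover, the sketch below uses only the JDK. The class and method names (`HashUtilsSketch`, `sha256Hex`, `compareCodePoints`) are assumptions for discussion, not an agreed design. One thing worth noting: `String.compareTo` orders by UTF-16 code unit, which differs from code point order for characters outside the Basic Multilingual Plane, so the sketch compares code points explicitly.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper sketch; names are illustrative, not part of Jena.
final class HashUtilsSketch {
    private HashUtilsSketch() {}

    /** SHA-256 of the UTF-8 bytes of the input, as lowercase hex. */
    static String sha256Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(Character.forDigit((b >> 4) & 0xF, 16));
                sb.append(Character.forDigit(b & 0xF, 16));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    /** Compares two strings by Unicode code point rather than UTF-16 code unit. */
    static int compareCodePoints(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(j);
            if (ca != cb) {
                return Integer.compare(ca, cb);
            }
            i += Character.charCount(ca);
            j += Character.charCount(cb);
        }
        // Whichever string has characters left over sorts last.
        return Integer.compare(a.length() - i, b.length() - j);
    }

    public static void main(String[] args) {
        System.out.println(sha256Hex("abc"));
        // U+FF5E sorts before U+1F600 by code point, but after it by UTF-16 code unit.
        System.out.println(compareCodePoints("\uFF5E", "\uD83D\uDE00") < 0);
    }
}
```

Sorting serialized quads or hashes would then be `list.sort(HashUtilsSketch::compareCodePoints)`. For hex digests plain `String.compareTo` gives the same order, since they are ASCII only.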

Are you interested in contributing a solution yourself?

Yes

Labels: enhancement (Incrementally add new feature)
