Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ruuts-c04-metadata #57

Merged
merged 8 commits into from
Aug 14, 2023
Merged

ruuts-c04-metadata #57

merged 8 commits into from
Aug 14, 2023

Conversation

S4mmyb
Copy link
Member

@S4mmyb S4mmyb commented Jun 15, 2023

Description

Author Checklist

All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.

I have...

  • provided a link to the relevant issue or specification
  • reviewed "Files changed" and left comments if necessary

Reviewers Checklist

All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items
.

I have...

  • confirmed all author checklist items have been addressed
  • reviewed code correctness and readability
  • reviewed documentation is accurate
  • manually tested (if applicable)

@S4mmyb S4mmyb requested review from clevinson, blushi and wgwz June 15, 2023 19:46
@S4mmyb
Copy link
Member Author

S4mmyb commented Jun 19, 2023

@wgwz @blushi this PR contains (or will contain) all the metadata for the Ruuts credit class, pilot project, and credit batch. So far I've only finished the Credit Class metadata, so a JSON-LD template, a JSON-LD ops file filled out with the appropriate info, and a new SHACL schema. If you can review what I have so far that would be great!

In creating all of this I had a few questions / comments:

  • In the JSON-LD files we use the @context section to define certain fields explicitly as lists and I'm wondering if this is actually needed or if it's redundant. As far as I understand defining expected data types/structures, like regen:ecosystemType, is something we do in the SHACL schema, meaning that in the SHACL schema we would say that regen:ecosystemType needs to be a list with certain requirements (expected data type, minCount/maxCount, etc...). With that in mind it seems redundant to me to do this again at the top of JSON-LD files, especially when we use brackets father down to indicate that it is a list-type and the validation script works just fine without including the "@container": "@list" definitions in the @context. Do you guys think it's ok to just remove that? Is there a reason it's also included there?
  • Building on the above, I'm noticing that for pretty much every field in the JSON-LD files we explicitly define the expected datatype which we have already defined in the SHACL graph. For example to define monitoringFrequency and verificationMethod which are of schema:frequency and list(xsd:string) types respectively, we do:
  "regen:monitoringFrequency": {
    "@type": "schema:frequency",
    "@value": ""
  },  
  "regen:verificationMethod": [
    {
      "@type:": "xsd:string",
      "@value": ""
    }
  ],

rather than just doing something like this:

  "regen:monitoringFrequency": "",  
  "regen:verificationMethod": [ ],

Is there a reason for this? Is it just because we just want to know what type of element to put there?

  • In terms of overall architecture of the SHACL sub-folder, I found it a bit confusing how we were structuring and storing re-usable graph components. For instance, in the shacl/common.ttl file we store offsetGenerationMethod, ToucanURI, & ProjectActivity shape elements, but we store methodology shapes independently in the shacl/methodologies/methodology.ttl. Similarly, I know we use the NameURL shape in Credit Class, Project, and Batch level SHACL schemas and JSON-LD files, but the actual NameURL shape lives in the shacl/projects/common.ttl file and not in something more universal. With that said, I'm wondering if we want to do a bit of restructuring / rearchitecting of this repo and break things out into more composable elements. Thoughts?

cc @clevinson @ryanchristo

@blushi
Copy link
Member

blushi commented Jun 20, 2023

@wgwz @blushi this PR contains (or will contain) all the metadata for the Ruuts credit class, pilot project, and credit batch. So far I've only finished the Credit Class metadata, so a JSON-LD template, a JSON-LD ops file filled out with the appropriate info, and a new SHACL schema. If you can review what I have so far that would be great!

In creating all of this I had a few questions / comments:

  • In the JSON-LD files we use the @context section to define certain fields explicitly as lists and I'm wondering if this is actually needed or if it's redundant. As far as I understand defining expected data types/structures, like regen:ecosystemType, is something we do in the SHACL schema, meaning that in the SHACL schema we would say that regen:ecosystemType needs to be a list with certain requirements (expected data type, minCount/maxCount, etc...). With that in mind it seems redundant to me to do this again at the top of JSON-LD files, especially when we use brackets father down to indicate that it is a list-type and the validation script works just fine without including the "@container": "@list" definitions in the @context. Do you guys think it's ok to just remove that? Is there a reason it's also included there?

This is needed so we don't need to write in the actual JSON-LD data "regen:ecosystemType": { "@list": [ ... ] }. Having it just defined in the context instead makes it more readable and we decided that this should be the standard. This is compacted JSON-LD, please read more about it here: https://www.w3.org/TR/json-ld11-api/#compaction
Also since in the SHACL schema, we say it should be a list, if you don't have the field defined as a list in the actual JSON-LD data (whether it's done through the context or directly in the rest of the data), the validation would fail.
Please note that having it in both SHACL and the JSON-LD data is not redundant, they are two separate things, SHACL is meant for validating data so the data should conform to what's defined there in order to be considered valid.

  • Building on the above, I'm noticing that for pretty much every field in the JSON-LD files we explicitly define the expected datatype which we have already defined in the SHACL graph. For example to define monitoringFrequency and verificationMethod which are of schema:frequency and list(xsd:string) types respectively, we do:
  "regen:monitoringFrequency": {
    "@type": "schema:frequency",
    "@value": ""
  },  
  "regen:verificationMethod": [
    {
      "@type:": "xsd:string",
      "@value": ""
    }
  ],

rather than just doing something like this:

  "regen:monitoringFrequency": "",  
  "regen:verificationMethod": [ ],

Is there a reason for this? Is it just because we just want to know what type of element to put there?

If you put in the "@context": "regen:monitoringFrequency": { "@type": "schema:frequency" } then in the data, you can just use "regen:monitoringFrequency": ""
This is the whole point of defining that in the context instead of directly in the data. But it has to stay in the JSON-LD in some way (ie here in the context) because otherwise how would you know it's a schema:frequency? Having the data doesn't mean you have access to the SHACL schema, and again SHACL is meant for validating data, not for defining the data itself.

  • In terms of overall architecture of the SHACL sub-folder, I found it a bit confusing how we were structuring and storing re-usable graph components. For instance, in the shacl/common.ttl file we store offsetGenerationMethod, ToucanURI, & ProjectActivity shape elements, but we store methodology shapes independently in the shacl/methodologies/methodology.ttl. Similarly, I know we use the NameURL shape in Credit Class, Project, and Batch level SHACL schemas and JSON-LD files, but the actual NameURL shape lives in the shacl/projects/common.ttl file and not in something more universal. With that said, I'm wondering if we want to do a bit of restructuring / rearchitecting of this repo and break things out into more composable elements. Thoughts?

cc @clevinson @ryanchristo

Yeah I guess we could make sure that the shapes that are being used in different classes are being put in shacl/common.ttl instead

jsonld/credit-classes/C04-ruuts-credit-class.jsonld Outdated Show resolved Hide resolved
jsonld/credit-classes/C04-ruuts-credit-class.jsonld Outdated Show resolved Hide resolved
jsonld/credit-classes/C04-ruuts-credit-class.jsonld Outdated Show resolved Hide resolved
jsonld/credit-classes/C04-ruuts-credit-class.jsonld Outdated Show resolved Hide resolved
jsonld/credit-classes/C04-ruuts-credit-class.jsonld Outdated Show resolved Hide resolved
shacl/common.ttl Outdated Show resolved Hide resolved
@S4mmyb
Copy link
Member Author

S4mmyb commented Jun 20, 2023

Awesome, thanks so much @blushi ! That's super helpful context on why we are including things and makes sense to me now. I was just a bit confused earlier and wanted to understand why we used the @context to pre-define datatypes. I just updated with your comments and it looks a lot cleaner already! Hopefully will be moving onto the project and batch metadata soon

@wgwz
Copy link
Contributor

wgwz commented Jun 21, 2023

  • In terms of overall architecture of the SHACL sub-folder, I found it a bit confusing how we were structuring and storing re-usable graph components. For instance, in the shacl/common.ttl file we store offsetGenerationMethod, ToucanURI, & ProjectActivity shape elements, but we store methodology shapes independently in the shacl/methodologies/methodology.ttl. Similarly, I know we use the NameURL shape in Credit Class, Project, and Batch level SHACL schemas and JSON-LD files, but the actual NameURL shape lives in the shacl/projects/common.ttl file and not in something more universal. With that said, I'm wondering if we want to do a bit of restructuring / rearchitecting of this repo and break things out into more composable elements. Thoughts?

Yeah I guess we could make sure that the shapes that are being used in different classes are being put in shacl/common.ttl instead

I'm open to ideas about how to restructure things because it is easy to do but I wouldn't want to do too much of this. I think the suggestion of moving NameURL out of shacl/projects/common.ttl into shacl/common.ttl makes good sense though.

@clevinson
Copy link
Member

clevinson commented Jun 22, 2023

Is it worth reconsidering our use of SHACL schemas for credit class metadata?

As @blushi indicated in her earlier comment, SHACL schemas are primarily about dataset validation. I don't think we ever actually will need to validate that a given dataset is a valid "C04-CreditClass" shape, as there will only ever be one instance of such a dataset (the one dictated here in this PR). And if we ever are making upgrades to the C04-CreditClass JSON-LD, then it will likely be in a way that our previous SHACL schema for C04-CreditClass isn't accounting for (e.g. adding a new field to the JSON-LD).

I think a more appropriate use of SHACL in the context of registry standards would be:

  • we have a singular SHACL schema for credit class metadata. All credit class metadata JSON-LD's must be valid with respect to this SHACL schema
  • we have generic SHACL schemas defined for the the following:
    • project (regen:ProjectShape)
    • credit batch (regen:CreditBatchShape)
  • Additionally, we would have credit-class specific SHACL schemas ccdefined for each of:
    • project (inheriting from the generic regen:ProjectShape)
    • credit batch (inheriting from the generic regen:CreditBatchShape)

Some docs / references for what I'm talking about w/ inheritance:

If we do want to make this change across all credit classes, then that probably warrants a separate PR. But if we are aligned with this approach, then I think we can probably strip the SHACL schema from this Ruuts specific PR (unless @S4mmyb you want to draft a proposed set of updates to the regen:CreditClassVersionShape SHACL schema (as a sidenote- I'd be curious to understand why we call this regen:CreditClassVersion instead of regen:CreditClass)

@blushi
Copy link
Member

blushi commented Jun 22, 2023

Is it worth reconsidering our use of SHACL schemas for credit class metadata?

As @blushi indicated in her earlier comment, SHACL schemas are primarily about dataset validation. I don't think we ever actually will need to validate that a given dataset is a valid "C04-CreditClass" shape, as there will only ever be one instance of such a dataset (the one dictated here in this PR). And if we ever are making upgrades to the C04-CreditClass JSON-LD, then it will likely be in a way that our previous SHACL schema for C04-CreditClass isn't accounting for (e.g. adding a new field to the JSON-LD).

I think a more appropriate use of SHACL in the context of registry standards would be:

  • we have a singular SHACL schema for credit class metadata. All credit class metadata JSON-LD's must be valid with respect to this SHACL schema

Indeed this is one thing that I was proposing in my PR: #56 (comment) because right now, we are just repeating more or less the same constraints in all credit classes SHACL schemas.
But in addition to a generic credit class SHACL schema, we might still need credit class specific schemas to document any additional fields that might only appear in certain credit class metadata (eg regen:carbonOffsetStandard in C02 or regen:tokenizationSource in C03), although agreed this might not be really used, only for documentation purposes...

  • we have generic SHACL schemas defined for the the following:

    • project (regen:ProjectShape)
    • credit batch (regen:CreditBatchShape)
  • Additionally, we would have credit-class specific SHACL schemas ccdefined for each of:

    • project (inheriting from the generic regen:ProjectShape)
    • credit batch (inheriting from the generic regen:CreditBatchShape)

👍

Some docs / references for what I'm talking about w/ inheritance:

If we do want to make this change across all credit classes, then that probably warrants a separate PR. But if we are aligned with this approach, then I think we can probably strip the SHACL schema from this Ruuts specific PR (unless @S4mmyb you want to draft a proposed set of updates to the regen:CreditClassVersionShape SHACL schema (as a sidenote- I'd be curious to understand why we call this regen:CreditClassVersion instead of regen:CreditClass)

The regen:CreditClassVersion naming is referencing our registry-server postgres table credit_class_version that was meant to support multiple versions of a credit class initially but this should be considered legacy, this is basically what I did in my currently open PR: https://github.com/regen-network/regen-registry-standards/pull/56/files#diff-aee3c11f2f6cabf346b956fc928f729c4c984b5f18832b583f1af9fee5c825ed

"Biodiversity"
],
"regen:geographicApplicability": "Global",
"regen:eligibleActivities": [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@S4mmyb why not using regen:projectActivities as in existing credit classes metadata?
or should we use regen:eligibleActivities everywhere instead ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to separate out from the set of activities a credit class allows for and the specific activities a project actually engages in (which could be a subset). Right now we use regen:projectActivity in the project metadata to indicate the list of activities a project is implementing. I was thinking we could use regen:eligibleActivities to indicate the list of activities approved by that credit class a project could choose from. Open to other terms, but I think distinguishing the could be nice?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we do already use regen:projectActivities too in existing credit class metadata: C02 and C03
so I believe we should use the same naming in all credit classes

jsonld/credit-classes/C04-ruuts-credit-class.jsonld Outdated Show resolved Hide resolved
Comment on lines +42 to +50
"regen:creditingTerm": {
"@type": "schema:activityDuration"
},
"regen:lookbackPeriod": {
"@type": "schema:activityDuration"
},
"regen:permanencePeriod": {
"@type": "schema:activityDuration"
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

those should be of type schema:Duration https://schema.org/Duration (and not schema:activityDuration which is not a type but a property that such a schema:Duration can take: https://schema.org/activityDuration)

Suggested change
"regen:creditingTerm": {
"@type": "schema:activityDuration"
},
"regen:lookbackPeriod": {
"@type": "schema:activityDuration"
},
"regen:permanencePeriod": {
"@type": "schema:activityDuration"
},
"regen:creditingTerm": {
"@type": "schema:Duration"
},
"regen:lookbackPeriod": {
"@type": "schema:Duration"
},
"regen:permanencePeriod": {
"@type": "schema:Duration"
},

Comment on lines +51 to +56
"regen:monitoringFrequency": {
"@type": "schema:frequency"
},
"regen:verificationFrequency": {
"@type": "schema:frequency"
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

schema:frequency is not a type but a property: https://schema.org/frequency
there's no such thing as a frequency type in schema.org it seems...

Suggested change
"regen:monitoringFrequency": {
"@type": "schema:frequency"
},
"regen:verificationFrequency": {
"@type": "schema:frequency"
}

@blushi
Copy link
Member

blushi commented Jul 25, 2023

While working on regen-network/rnd-dev-team#1756, I've realized there are a few issues in the C04 metadata, see my comments above

Comment on lines +117 to +126
"regen:bufferPoolAccounts": {
"@type": "schema:ItemList",
"schema:itemListElement": [
{
"schema:name": "Credit Class (Pooled) Buffer Pool",
"regen:walletAddress": "regen17pmq7hp4upvmmveqexzuhzu64v36re3w3447n7dt46uwp594wtpsuuh7f6",
"regen:poolAllocation": "5%"
}
]
},
Copy link
Member

@blushi blushi Jul 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just using a @list here? are we expecting regen:bufferPoolAccounts to have other properties than schema:itemListElement like schema:url or schema:name for instance?

"schema:itemListElement": [
{
"schema:name": "Credit Class (Pooled) Buffer Pool",
"regen:walletAddress": "regen17pmq7hp4upvmmveqexzuhzu64v36re3w3447n7dt46uwp594wtpsuuh7f6",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is what we've been using so far

Suggested change
"regen:walletAddress": "regen17pmq7hp4upvmmveqexzuhzu64v36re3w3447n7dt46uwp594wtpsuuh7f6",
"regen:address": "regen17pmq7hp4upvmmveqexzuhzu64v36re3w3447n7dt46uwp594wtpsuuh7f6",

@blushi
Copy link
Member

blushi commented Aug 9, 2023

Making this PR ready for review.
@clevinson and I agreed to just refactor this for now to fit the latest repo structure (no jsonld example for classes and only class specific constraint in the related class SHACL schema), since the current metadata has already been submitted on-chain as is.
I've also verified that the C04 on-chain metadata IRI regen:13toVgBisQqEmauHntQsW6mwpz71RSTwsjzhn9gxCQ16tdRjHPHTRoK.rdf matches the C04 jsonld in this PR.
The other suggestions/fixes will be submitted as part of another PR (new issue: #65)

@blushi blushi marked this pull request as ready for review August 9, 2023 11:39
@blushi blushi self-requested a review August 9, 2023 11:39
@clevinson clevinson merged commit 43d2b8a into main Aug 14, 2023
1 check passed
@clevinson clevinson deleted the sam/C04-metadata branch August 14, 2023 23:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants