-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wrroc schema #31
base: main
Are you sure you want to change the base?
Wrroc schema #31
Conversation
Reviewer's Guide by SourceryThis pull request introduces a new schema for converting between the GA4GH WES (Workflow Execution Service) profile and the Workflow Run Crate (WRC) extension of RO-Crates. The schema is defined in a new file, schema.md, which provides a detailed mapping of fields and structures between the two formats. No diagrams generated as the changes look simple and do not need a visual representation. File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @SalihuDickson - I've reviewed your changes and they look great!
Here's what I looked at during the review
- 🟢 General issues: all looks good
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟢 Complexity: all looks good
- 🟡 Documentation: 6 issues found
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
|
||
## WORKFLOW RUN CRATE RO-CRATE FIELDS | ||
|
||
For a Json object to conform with the Ro-Crate specification there are certain minimal requirements. Further still the WRC estension has it's own requirements. Considering that, the following definitions aim to differentiate the diffrent types of fields and how they should be considered in each implementation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (documentation): Fix typos: 'estension' and 'diffrent'
Please correct 'estension' to 'extension' and 'diffrent' to 'different'.
schema.md
Outdated
1. constant: These fields are required and for our purposes the value of this field should always be the default value. In other words their values are constant. | ||
2. required: These fields are required but their values are not constant. | ||
3. recommended: These fields are not required but they are recommended to give a comprehensive definition for your RO-crate object. | ||
4. optional: These fields are optional. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (documentation): Inconsistent capitalization in Terminology section
Consider making the capitalization consistent across all items in the Terminology section.
1. constant: These fields are required and for our purposes the value of this field should always be the default value. In other words their values are constant. | |
2. required: These fields are required but their values are not constant. | |
3. recommended: These fields are not required but they are recommended to give a comprehensive definition for your RO-crate object. | |
4. optional: These fields are optional. | |
1. Constant: These fields are required and for our purposes the value of this field should always be the default value. In other words their values are constant. | |
2. Required: These fields are required but their values are not constant. | |
3. Recommended: These fields are not required but they are recommended to give a comprehensive definition for your RO-crate object. | |
4. Optional: These fields are optional. |
- **@context**(constant): | ||
- **WES field**: N/A | ||
- **type**: `string` | `[string]` | ||
- **description**: URI pointing to the official context page of the version of Ro-Crate and any extensions being implemented. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question (documentation): Clarify @context description
The description mentions 'any extensions', but the default value only includes one extension. Consider clarifying if multiple extensions are expected or adjusting the description.
|
||
- **WES field**: N/A | ||
- **type**: `string` | ||
- **description**: identifier for the dataset object. Must match the metadata.@id feild |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (documentation): Fix typo: 'feild'
Please correct 'feild' to 'field' in the description.
reference: | ||
|
||
1. [https://www.researchobject.org/ro-crate/specification/1.1/metadata.html](https://www.researchobject.org/ro-crate/specification/1.1/root-data-entity.html) | ||
2. https://www.researchobject.org/workflow-run-crate/profiles/workflow_run_crate/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (documentation): Inconsistent formatting of reference links
Consider making the formatting of reference links consistent. The first link is formatted as a proper link, while the second is plain text.
reference: | |
1. [https://www.researchobject.org/ro-crate/specification/1.1/metadata.html](https://www.researchobject.org/ro-crate/specification/1.1/root-data-entity.html) | |
2. https://www.researchobject.org/workflow-run-crate/profiles/workflow_run_crate/ | |
reference: | |
1. [RO-Crate Specification 1.1: Root Data Entity](https://www.researchobject.org/ro-crate/specification/1.1/root-data-entity.html) | |
2. [Workflow Run Crate Profile](https://www.researchobject.org/workflow-run-crate/profiles/workflow_run_crate/) |
- **identifier**(recommended): | ||
- **WES field**: N/A | ||
- **type**: `string` | ||
- **description**: A reference to a relevant and comprehensive description of the license. May be a URI to the official webpage describing the license. Should match the `license.@id` field. Should be added as some algorithms may look for this instead. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (documentation): Clarify usage of 'identifier' field
The description mentions that 'some algorithms may look for this instead.' Consider providing more specific information about when and why this field might be used.
- **description**: A reference to a relevant and comprehensive description of the license. May be a URI to the official webpage describing the license. Should match the `license.@id` field. Should be added as some algorithms may look for this instead. | |
- **description**: A reference to a relevant and comprehensive description of the license. Typically a URI to the official webpage describing the license. Should match the `license.@id` field. This field is important for interoperability, as some algorithms and tools specifically search for an 'identifier' field when processing license information, rather than using `license.@id`. |
@uniqueg, this PR contains my proposed schema to convert GA4GH WES profiles to Ro-Crates. It has not been completed, but all required fields have been added, can you please take a look. The document is pretty long though 😅 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is good work, thanks @SalihuDickson - I think it would be good to have a concluding overview/summary table that links the different WES properties to the different WRROC properties as a quick reference.
And also one that goes in the other direction and lists all required and recommended WRROC properties and checks if we cover them somehow just from info from a WES run.
Also, maybe for the WES properties that are not yet accounted for, could you indicate for each what problems you are facing?
schema.md
Outdated
- **@type**(required): | ||
|
||
- **WES field**: N/A | ||
- **type**: `string` | ||
- **description**: The type of the content stored in the Ro-Crate. | ||
- **default**: | ||
|
||
```json | ||
"CreativeWork" | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way I read the docs, @type
MUST be CreativeWork
- so I guess this is also "constant" (as per your definition), not "required".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yh, I'll fix that.
schema.md
Outdated
"CreativeWork" | ||
``` | ||
|
||
- **conformsTo**(constant): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unindent
schema.md
Outdated
"CreativeWork" | ||
``` | ||
|
||
- **conformsTo**(constant): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it's only partly "constant": the URL to the RO-Crate specification is always there, but possibly there may be a different number of extensions.
Also, it's of note that the URLs are versioned so may change over time. I suppose for a given version of our library, they will still be constant - just wanted to point out that it may need updates over time (so perhaps worth noting).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I can add this to the description. I just thought adding caveats like this would make the document much longer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another thing is that the versions stated here are only meant to conform to the profile detailed in this documentation. So if the documentation is updated to reflect a new version then the default value for this should be updated as well.
This is the base Ro-Crate object. | ||
|
||
- **@context**(constant): | ||
- **WES field**: N/A |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps you could distinguish between "N/A" meaning "not applicable" (i.e., we don't need WES here, it doesn't apply - so no need to think about the mapping; which is the case here) and "not available" (i.e., we would need this value from somewhere, according to whether it's required, recommended, or optional - but WES doesn't supply it).
These "WES field: not available" properties would be the most tricky ones!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess a clarification could be beneficial, but what N/A currently means throughout this document is "not applicable". Because all the required fields that WES does not provide usually fall into 1 of 3 categories; either they're constant, their values are descriptive, i.e a name or description, or the values can easily be generated like an ID. So I didn't really see a need to distinguish or which of these 3 scenarios would fall into the "not available" category.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps after we better distinguish which of these scenarios best fits the "not applicable" description then I can adjust the documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that for every value, we should try to come up with a possible source (WES, TES or otherwise), or multiple (see below) - and that should be our focus. Of course, within those, required properties MUST, recommended SHOULD and optional MAY have a source (or more).
So I guess what I'm trying to say is that maybe we don't need to think too much about what it means with respect to WES, but rather what the best source will be. Sometimes it's obvious (that could be the constants you are talking about, or a really good field from WES, or something that can be generated on the fly), sometimes it's not so obvious. In some cases, it might be a compound, or something that may or may not be present, depending on the context (e.g., what exactly the workflow run was like). For these, we should list all the possible sources that could help, and possibly define a priority of sources (e.g., if we have this, then that'd be the best, but if we don't, maybe this other thing is the next best option).
In the end we really need this summary table that outlines exactly that - an actionable mapping of sources for each WRROC property, i.e., precise enough that it can be implemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically what we already have, but with more details (we did a much better job for TES).
Also: Not sure if you are on the official RO-Crate Slack. There were some issues discussed that other people had with mapping WES to WRROC a few months ago - and I don't think we ever picked them up or solved them. I'll share a link when I find the time next week.
schema.md
Outdated
1. constant: These fields are required and for our purposes the value of this field should always be the default value. In other words their values are constant. | ||
2. required: These fields are required but their values are not constant. | ||
3. recommended: These fields are not required but they are recommended to give a comprehensive definition for your RO-crate object. | ||
4. optional: These fields are optional. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that these are technically two different classifications that are, at least in principle, independent of one another:
- constant or not constant (bool)
- required, recommended or optional
Perhaps in reality, all "constants" are required, but perhaps it need not be the case. So for clarity, perhaps it's best to explicitly state both classifications for each property, e.g., @context(required, constant)
.
schema.md
Outdated
|
||
- **WES field**: N/A | ||
- **type**: `{"@id": string}` | ||
- **description**: contains an `@id` field pointing to the dataset entity of the RO-Crate. Must match the dataset.@id field. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarification: Must match the @id
field of the root dataset.
|
||
## WORKFLOW RUN CRATE RO-CRATE FIELDS | ||
|
||
For a Json object to conform with the Ro-Crate specification there are certain minimal requirements. Further still the WRC estension has it's own requirements. Considering that, the following definitions aim to differentiate the diffrent types of fields and how they should be considered in each implementation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should only focus on what's required for WRROC, not for RO-Crates in general. In other words, the library would take an existing valid RO-Crate and add the entitites to it that result from the workflow (or tool) run - but NOT generate an RO-Crate from scratch. Because if we support the latter, it's easy to lose focus and we will end up implementing an entire RO-Crate generating library. We'd need to allow users to specify that, and then that requires some frontend etc. That will be done elsewhere. Rather, we want to be able to build the WRROC-specific parts just from the WES or TES logs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand completely, does this mean the CrateGen tool won't be stand alone? And it'll need to be one part of a library that generates the RO-Crate fields?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's say I would like to avoid for this tool to do things that require user intervention. But anything that isn't either, as you say, a constant, or can be fetched from TES or some other API call, likely needs to be provided by the user. And some of the stuff that you described that relates to RO-Crates in general (i.e., not the WRROC-specific stuff) may fall within that category.
Sure, we could avoid a CLI with options for the user to fill in RO-Crate descriptions and the like by instead defining some dummy defaults so that we can create self-contained RO-Crates - with the expectation that users would change these later on. But that's a bit risky - it might lead to bad quality RO-Crates! Also, that's not the core business of this tool, so shouldn't be high priority.
The more likely (and safer) use case in the real world is that there's already an RO-Crate in the system, maybe containing project, author, and funding information. So for now I think we should just focus on the the use case where the client passes an existing RO-Crate, and we put the WRROC-related entities inside and fill up from the generic stuff what absolutely needs filling up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, that makes sense, i guess I can isolate those fields so we can fill only those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we can add both options, allow the user to insert the descriptive fields (or whatever other fields they would like to fill in as well), then provide the partially created RO-Crate to the CLI tool to fill in the remaining fields. Like this they can also fill in fields that can be generated then we can ask which fields they would like to overwrite or keep as is.
We can also add the option to generate the RO-Crate and then fill in the descriptive fields later. Personally, I feel this would provide a better user experience as the outstanding fields will pretty much already be underlined for them. Also I imagine this tool will not likely be used among people with no technical knowledge, as people without some technical knowledge they are more likely to stick to web based tools with interactive UIs. So we can probably expect the core user base to understand that they need to fill in the required fields after the CLI tool generates the RO-Crate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the table, sure I'lll get on that. And I'll detail the reason for each outstanding field.
Great! I guess it's best to just extend the existing one. |
I've written the table in the schema document. Due to the differences btw both documents, I can't extend the existing one with this, as they don't match. |
This PR aims to detail a conversion between the GA4GH WES profile to the Workflow Run Crate (WRC) extension of Ro-Crates.
This is a rewrite of the schema that is currently being used, to ensure conformity with the official RO-Crate specification.
Summary by Sourcery
Add a new schema document to facilitate the conversion of GA4GH WES profiles to Workflow Run Crate extensions, ensuring conformity with the RO-Crate specification.
New Features:
Documentation: