Flattened graph and added other changes from the consortium meeting, Nov. 2024 #95

Open
wants to merge 9 commits into
base: master

SteffenBrinckmann
Collaborator

No description provided.

@SteffenBrinckmann SteffenBrinckmann self-assigned this Nov 18, 2024
@FlorianRhiem
Contributor

Hey @SteffenBrinckmann,
I can import the .eln file with SampleDB, and get one object:
image

Looking into the data, there are Dataset entries in the graph that include variableMeasured entries, such as ./PastasExampleProject/001_ThisIsAnotherExampleTask and its subtask, which are not listed as parts of the root data entity. As a result, my importing code does not consider them to be importable objects, but unsupported supplementary information for ./PastasExampleProject/. I strongly suspect this is not the intention behind these objects, as they seem structured as directories and have the folder custom genre.

How do others interpret, parse, and handle Dataset entries that are not a direct part of ./?
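To make the situation concrete, here is a minimal sketch of how an importer might spot such entries. The example graph is a hypothetical stand-in (only the two @id values come from the comment above), and the sketch assumes that @type is a plain string and that hasPart holds {"@id": ...} references.

```python
def as_ids(value):
    """Normalize a JSON-LD reference (dict, list of dicts, or None) to @id strings."""
    if value is None:
        return []
    if isinstance(value, dict):
        value = [value]
    return [ref["@id"] for ref in value if isinstance(ref, dict) and "@id" in ref]


def unlisted_datasets(graph, root_id="./"):
    """Return @ids of Dataset nodes that carry variableMeasured but are not in hasPart of ./."""
    nodes = {node["@id"]: node for node in graph}
    root_parts = set(as_ids(nodes[root_id].get("hasPart")))
    return [
        node_id
        for node_id, node in nodes.items()
        if node_id != root_id
        and node.get("@type") == "Dataset"  # assumption: @type is a plain string
        and "variableMeasured" in node
        and node_id not in root_parts
    ]


# Hypothetical, heavily shortened graph for illustration only.
graph = [
    {"@id": "./", "@type": "Dataset",
     "hasPart": [{"@id": "./PastasExampleProject/"}]},
    {"@id": "./PastasExampleProject/", "@type": "Dataset",
     "hasPart": [{"@id": "./PastasExampleProject/001_ThisIsAnotherExampleTask"}]},
    {"@id": "./PastasExampleProject/001_ThisIsAnotherExampleTask", "@type": "Dataset",
     "variableMeasured": [{"@id": "#hypothetical-property"}]},
]

print(unlisted_datasets(graph))
```

An importer that only walks the hasPart of ./ would treat every @id returned here as supplementary information rather than as an object of its own.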

@SteffenBrinckmann
Collaborator Author

As far as I understand, every entry lists only its direct children in hasPart. As such, the root data entity also lists only its direct children in hasPart.

@FlorianRhiem
Contributor

Currently our spec states:

Subsequently, all the remaining nodes are assigned a @type of either Dataset for directories or File for individual files. And the @id corresponds to something in the hasPart of ./.

If a Dataset node has additional files, they should be listed in its hasPart property and can be referenced through their @id.

This is also how I've handled it so far, with Dataset nodes that are not part of ./, but of another Dataset node, providing supplementary information (e.g. version info in case of SampleDB) for that Dataset node.

@SteffenBrinckmann
Collaborator Author

That statement in the spec necessarily results in a flat graph: "./" is the top layer, and all other nodes are siblings on the second layer.

I was not aware of this limitation and would vote to allow deep graphs. The RO-Crate spec has an example with 3 layers (https://www.researchobject.org/ro-crate/specification/1.1/data-entities.html#referencing-files-and-folders-from-the-root-data-entity), but it does not go into detail about supplemental information.
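To distinguish a flat graph from a deep one, one could count the layers reachable through hasPart. This is only a sketch under assumptions: the function name and the example graphs are hypothetical, and hasPart is assumed to hold {"@id": ...} references.

```python
def as_ids(value):
    """Normalize a JSON-LD reference (dict, list of dicts, or None) to @id strings."""
    if value is None:
        return []
    if isinstance(value, dict):
        value = [value]
    return [ref["@id"] for ref in value if isinstance(ref, dict) and "@id" in ref]


def count_layers(graph, root_id="./"):
    """Count hasPart layers, with the root itself counting as layer 1."""
    nodes = {node["@id"]: node for node in graph}

    def layers(node_id, seen):
        if node_id in seen:
            return 0  # guard against cyclic hasPart references
        children = as_ids(nodes.get(node_id, {}).get("hasPart"))
        deepest = max((layers(child, seen | {node_id}) for child in children), default=0)
        return 1 + deepest

    return layers(root_id, frozenset())


# Hypothetical graphs: one flat (2 layers), one nested (3 layers).
flat_graph = [
    {"@id": "./", "hasPart": [{"@id": "./a/"}, {"@id": "./b.txt"}]},
    {"@id": "./a/"},
    {"@id": "./b.txt"},
]
deep_graph = [
    {"@id": "./", "hasPart": [{"@id": "./a/"}]},
    {"@id": "./a/", "hasPart": [{"@id": "./a/b.txt"}]},
    {"@id": "./a/b.txt"},
]

print(count_layers(flat_graph), count_layers(deep_graph))
```

Under the reading of the spec discussed here, only graphs with at most two layers would be produced.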

Alternative path: if we decide to keep the current spec, I can flatten the graph that Pasta produces and add an additional key-value pair that contains the full hierarchy, for those ELNs that can handle the full graph.
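A rough sketch of what such a flattening step could look like follows. It is not how Pasta does it: the function name and the key "x-parent" are hypothetical placeholders, since no key name has been agreed on, and the sketch assumes that nested hasPart entries are dropped once every descendant is listed under ./.

```python
import copy


def as_ids(value):
    """Normalize a JSON-LD reference (dict, list of dicts, or None) to @id strings."""
    if value is None:
        return []
    if isinstance(value, dict):
        value = [value]
    return [ref["@id"] for ref in value if isinstance(ref, dict) and "@id" in ref]


def flatten(graph, root_id="./", parent_key="x-parent"):
    """Return a copy of the graph in which ./ lists every descendant directly.

    The original parent of each moved node is kept under parent_key
    (a hypothetical key name), so the hierarchy can be rebuilt.
    """
    nodes = {node["@id"]: copy.deepcopy(node) for node in graph}
    order, seen = [], {root_id}
    stack = [(child, root_id) for child in reversed(as_ids(nodes[root_id].get("hasPart")))]
    while stack:
        node_id, parent_id = stack.pop()
        if node_id in seen:
            continue
        seen.add(node_id)
        order.append(node_id)
        node = nodes.get(node_id)
        if node is None:
            continue  # referenced but not described in the graph
        if parent_id != root_id:
            node[parent_key] = {"@id": parent_id}
        children = as_ids(node.pop("hasPart", None))
        stack.extend((child, node_id) for child in reversed(children))
    nodes[root_id]["hasPart"] = [{"@id": node_id} for node_id in order]
    return list(nodes.values())


# Hypothetical nested graph for illustration.
nested = [
    {"@id": "./", "hasPart": [{"@id": "./a/"}]},
    {"@id": "./a/", "hasPart": [{"@id": "./a/b/"}]},
    {"@id": "./a/b/"},
]
for node in flatten(nested):
    print(node)
```

ELNs that only understand the flat layout can ignore the extra key, while those that handle the full graph can use it to restore the nesting.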

@nicobrandt
Contributor

I think we never fully discussed whether we should restrict the potential directory structure in our spec. The RO-Crate spec itself seems to be pretty lax (see the URL @SteffenBrinckmann posted), but most ELNs probably won't be able to import arbitrarily nested structures, or to handle the possibility of having either directories or files on the "top level", etc. Probably worth moving this into a separate issue for further discussion?

@SteffenBrinckmann
Collaborator Author

Let's pause this merge until #98 is settled.
(Didn't we agree a few days ago that we were almost aligned? ;-) )

@SteffenBrinckmann
Collaborator Author

Since everybody was leaning toward a flatter structure, I adapted to that.
Please review and give me feedback.

I also fixed one test that gave a false positive.

@FlorianRhiem
Contributor

  • Is https:/upload.wikimedia.org/wikipedia/commons/thumb/a/a4/Misc_pollen.jpg/315px-Misc_pollen.jpg a typo, or is it intentionally both an invalid URL and an invalid path?
  • There are objects with variableMeasured which have the File type rather than Dataset
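As an illustration of the first point, here is a minimal sketch of how an importer might tell valid URLs, malformed URLs, and local paths apart using Python's standard urllib.parse. The function name and the category labels are hypothetical, and the local path in the usage lines is made up.

```python
from urllib.parse import urlparse


def classify_file_id(file_id):
    """Classify a File @id as 'url', 'invalid', 'other', or 'path'."""
    parsed = urlparse(file_id)
    if parsed.scheme in ("http", "https"):
        # "https:/host/..." parses with an empty netloc, so it is not a usable URL.
        return "url" if parsed.netloc else "invalid"
    if parsed.scheme:
        return "other"  # some non-HTTP scheme, left to the importer to decide
    return "path"


broken = "https:/upload.wikimedia.org/wikipedia/commons/thumb/a/a4/Misc_pollen.jpg/315px-Misc_pollen.jpg"
print(classify_file_id(broken))
print(classify_file_id(broken.replace("https:/", "https://", 1)))
print(classify_file_id("./PastasExampleProject/example.txt"))  # hypothetical local path
```

Browsers tend to repair the single-slash form silently, which is why it can still load when pasted into an address bar even though a strict parser finds no host in it.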

@SteffenBrinckmann
Collaborator Author

SteffenBrinckmann commented Dec 4, 2024

@FlorianRhiem

  1. The malformed URL is a bug, which I have to hunt down and fix. Funny: when I copy-paste the URL from the comment into my browser, I get a nice image.
  2. Is that forbidden? Can a File not have variableMeasured? Or should that be a Dataset, with the File being only the plain file?
  3. Is additionalType in
    {
      "@id": "ro-crate-metadata.json",
      "@type": "CreativeWork",
      "about": {
        "@id": "./"
      },
      "conformsTo": {
        "@id": "https://w3id.org/ro/crate/1.1"
      },
      "version": "1.0",
      "additionalType": "https://purl.archive.org/purl/elnconsortium/eln-spec/1.1"
    },

to everybody's liking?
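For context, one way a consumer could use this field is to read the ELN spec version from the metadata descriptor. This is only a sketch: the function name is hypothetical, and treating the last part of the PURL as the version is an assumption rather than something the spec states.

```python
ELN_SPEC_PREFIX = "https://purl.archive.org/purl/elnconsortium/eln-spec/"


def eln_spec_version(graph):
    """Return the version suffix of additionalType on the metadata descriptor, or None."""
    for node in graph:
        if node.get("@id") != "ro-crate-metadata.json":
            continue
        additional_type = node.get("additionalType")
        if isinstance(additional_type, str) and additional_type.startswith(ELN_SPEC_PREFIX):
            # Assumption: everything after the prefix is the spec version.
            return additional_type[len(ELN_SPEC_PREFIX):] or None
    return None


descriptor = {
    "@id": "ro-crate-metadata.json",
    "@type": "CreativeWork",
    "about": {"@id": "./"},
    "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
    "version": "1.0",
    "additionalType": "https://purl.archive.org/purl/elnconsortium/eln-spec/1.1",
}
print(eln_spec_version([descriptor]))
```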

@FlorianRhiem
Contributor

  • Okay, if it's a bug, then the fixed version should be good. I need to extend the file handling logic in SampleDB for missing files and for valid and invalid URLs, but those should just be issues on the SampleDB side. I would suggest, though, that you include the url property for files like that, even if it is identical to the @id, just to make things a bit clearer.
  • In my understanding, a dataset can have variableMeasured and a file is just a file, but I don't know for certain, as File is not a type with information listed on schema.org.

@SteffenBrinckmann
Collaborator Author

  • variableMeasured on File would be something to discuss at the next meeting.
  • Any takes on additionalType?

@FlorianRhiem
Contributor

I just checked the RO Crate spec for something else and noticed that this section explains that File is an alias for https://schema.org/MediaObject, which does not have variableMeasured among its properties.
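To see which entries are affected, a small check like the following could list File nodes that carry variableMeasured. It is a sketch with a hypothetical function name and made-up example nodes; it accepts @type as either a string or a list.

```python
def has_type(node, type_name):
    """Check @type, which in JSON-LD may be a single string or a list of strings."""
    types = node.get("@type", [])
    if isinstance(types, str):
        types = [types]
    return type_name in types


def files_with_variable_measured(graph):
    """Return @ids of nodes typed File that also have variableMeasured."""
    return [
        node["@id"]
        for node in graph
        if has_type(node, "File") and "variableMeasured" in node
    ]


# Hypothetical nodes for illustration only.
graph = [
    {"@id": "./task/", "@type": "Dataset", "variableMeasured": [{"@id": "#pv1"}]},
    {"@id": "./task/data.csv", "@type": "File", "variableMeasured": [{"@id": "#pv2"}]},
    {"@id": "./task/notes.md", "@type": "File"},
]
print(files_with_variable_measured(graph))
```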

@SteffenBrinckmann
Collaborator Author

@FlorianRhiem I would argue that if it has a description, a comment, and so on, then it should also be able to have variableMeasured. In our context, variableMeasured is more loosely defined.
Perhaps we should put it on the agenda for the next meeting.

4 participants