Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature]: Create NWBDatasetIncSpec and NWBGroupIncSpec #884

Open
3 tasks done
rly opened this issue Jun 28, 2023 · 4 comments
Open
3 tasks done

[Feature]: Create NWBDatasetIncSpec and NWBGroupIncSpec #884

rly opened this issue Jun 28, 2023 · 4 comments
Labels
category: proposal proposed enhancements or new features priority: medium non-critical problem and/or affecting only a small set of users topic: docs Issues related to documentation topic: extension issues related to extensions or dynamic class generation

Comments

@rly
Copy link
Contributor

rly commented Jun 28, 2023

What would you like to see added to HDMF?

When building extensions, users are often confused over how to define groups and datasets within a parent group. For example, one could define:

Parcellations = NWBGroupSpec(
        default_name='parcellations',
        neurodata_type_def='Parcellations',
        neurodata_type_inc='NWBDataInterface',
        doc='parcellations of this surface',
        datasets=[
            NWBDatasetSpec(
                neurodata_type_def='Parcellation',
                doc='a parcellation of the surface',
                quantity='+'
            )
        ]
    )

While technically legal, this nested type definition is difficult to parse. This code also hides the fact that the Parcellation is now defined and can be reused in other types. As a result, this goes against our schema best practices hdmf-dev/hdmf-schema-language#14

This is the preferred code:

Parcellation = NWBDatasetSpec(
        neurodata_type_def='Parcellation',
        doc='a parcellation of the surface',
    )

Parcellations = NWBGroupSpec(
        default_name='parcellations',
        neurodata_type_def='Parcellations',
        neurodata_type_inc='NWBDataInterface',
        doc='parcellations of this surface',
        datasets=[
            NWBDatasetSpec(
                neurodata_type_inc='Parcellation',
                doc='a parcellation of this surface',
                quantity='+'
            )
        ]
    )

However, NWBDatasetSpec serves two different functions here. The first defines the new type. The second says only instances of this type can be used and at least one must be provided. In the first usage where the type is defined, quantity makes no sense, and name really shouldn't be set. In the second usage, neurodata_type_def should not be used, and default_name doesn't really make sense. Same thing for groups.

This is explained in the language, but honestly, I think this is confusing and the third row of the table is distinctly different from the other three and should be separated out as I describe below.
https://hdmf-schema-language.readthedocs.io/en/latest/description.html#data-type-def-and-data-type-inc

Is your feature request related to a problem?

No response

What solution would you like?

To make this distinction easier for users, I suggest adding new classes NWBDatasetIncSpec and NWBGroupIncSpec that are the only types allowed in a NWBGroupSpec definition under groups and datasets. neurodata_type_def is not allowed here. I would also change the keyword to just neurodata_type. We would also need to update the tutorials.

We can also warn when quantity and name are used in a NWBDatasetSpec that is not a NWBDatasetIncSpec (we cannot restrict it because that would break backwards compatibility).

I suggest we also add these new spec types to the hdmf-schema-language.

Do you have any interest in helping implement the feature?

Yes.

Code of Conduct

@oruebel
Copy link
Contributor

oruebel commented Jun 28, 2023

I suggest adding new classes NWBDatasetIncSpec and NWBGroupIncSpec

That's an interesting proposal. I think from an interface perspective this could work. I think the name IncSpec is not very intuitive, maybe IncludeDatasetSpec? How does this affect definition of datasets/groups that do not have a neurodata_type_def but only have a fixed name? Would those also be defined outsite and then included?

@oruebel oruebel added category: proposal proposed enhancements or new features priority: medium non-critical problem and/or affecting only a small set of users topic: extension issues related to extensions or dynamic class generation labels Jun 28, 2023
@rly
Copy link
Contributor Author

rly commented Jun 28, 2023

I like IncludeDatasetSpec or maybe InnerDatasetSpec. Good point. I forgot that use case. No, those should be allowed in this new spec type.

@oruebel
Copy link
Contributor

oruebel commented Jun 28, 2023

I forgot that use case. No, those should be allowed in this new spec type.

Just to clarify. The primary difference then is that neurodata_type_def is not allowed in IncludeDatasetSpec. In some sense, this is the difference then between defining a class and instantiating an instance of it (only the term instantiating a type would be confusing here).

For name, I think describing the "operation" (e.g., include) rather than location (e.g., inner) seems more intuitive to me. The name IncludeDatasetSpec is not bad, but might be confused with the include term in a program language sense of an include in C/C++ (which may be confusing if the class allows creating datasets specs with fixed name and no type). How about maybe InsertDatasetSpec?

@rly
Copy link
Contributor Author

rly commented Nov 28, 2023

Just noting another instance where this IncludeDatasetSpec (or whatever we want to call it) class, automatic checks for nested type definitions, and appropriate documentation around these would have prevented copied and nested spec definitions: nwb-extensions/staged-extensions#44

I will bump this issue up in priority because we will only see more of these instances occur.

@rly rly added topic: docs Issues related to documentation priority: high impacts proper operation or use of feature important to most users and removed priority: medium non-critical problem and/or affecting only a small set of users labels Nov 28, 2023
@rly rly added priority: medium non-critical problem and/or affecting only a small set of users and removed priority: high impacts proper operation or use of feature important to most users labels Nov 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: proposal proposed enhancements or new features priority: medium non-critical problem and/or affecting only a small set of users topic: docs Issues related to documentation topic: extension issues related to extensions or dynamic class generation
Projects
None yet
Development

No branches or pull requests

2 participants