Query Dataverse for mandatory metadata fields via API #6978
Just to add: the customer has said that once they have defined their mandatory metadata for a dataverse or subdataverse, it seldom or never changes. So we (RSpace) could just use some static-list-lookup mechanism to handle this particular use case. But being able to retrieve the properties from an API call would be superior, as it would always be up to date.
Thanks @richarda23, this is a good idea and makes sense.
Hmm, judging from pull request #7942, something else to consider is that templates can require additional fields, but I don't know if the API respects this or not. It should if it doesn't. From the original report above, it sounds like templates may be in use and enforced via the API, because Producer Name is not one of the five fields that is usually required. This issue is related in the sense that it would be nice if the API could return more information about what it needs or allows:
There is an "admin" API that can give some detail on metadata fields, but as noted at https://guides.dataverse.org/en/5.8/admin/metadatacustomization.html#exploring-metadata-blocks, the output is ugly and could stand to be cleaned up before it's ready for public consumption. There you can see that "title" is required.
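A minimal sketch of such a lookup in Python, assuming a local installation with the admin API reachable; the endpoint path follows the guide linked above, but the response keys printed here are an assumption and may vary between versions:

```python
import requests

# Sketch: query the admin API for a single dataset field, per the
# "Exploring Metadata Blocks" section of the guides linked above.
# Assumption: a local installation with the admin API unblocked.
BASE = "http://localhost:8080"

resp = requests.get(f"{BASE}/api/admin/datasetfield/title")
resp.raise_for_status()
field = resp.json()["data"]

# Among the many properties in the (admittedly ugly) output, one
# indicates whether the field is required -- e.g. "required": true
# for "title". Exact key names may differ between versions.
print(field.get("name"), "required:", field.get("required"))
```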
Got it - thanks @pdurbin. I was hopeful :)
This issue is also relevant to @hermes-hmc, as we might want to validate metadata before depositing instead of relying on trial and error. I'm going to add the Hermes label for easier tracking of what might be in scope for our project.
At KU Leuven we are interested in this as well for future integrations with our other systems. One additional piece of information that would be required to generate valid submissions is the allowed values for fields with a controlled vocabulary. External vocabularies may make that much more complex, but I hope those would be supported too.
In a future version of Dataverse where issue #6885 has hopefully been solved, recommended metadata fields should also pop up in the integrated system. I guess in some/many cases the list of metadata fields can become quite long, as we want our depositors to provide as much metadata as possible. I'm wondering whether users in most (all?) cases would have to navigate to the actual dataset draft in Dataverse anyway and add additional metadata. So the question is how an integration involving metadata registration can be designed in a way that makes the researcher's work as easy as possible.
I have just had a discussion with @shlake about Dataverse integrations with tools like OSF and RSpace.
The university and their researchers have made it very clear that they expect to be able to create the datasets from the institutional repository (Symplectic Elements). We hope to go live in January next year, and that will be without that feature, but we will have to implement it somehow in the next year or so. There is also a request to migrate datasets from iRODS to Dataverse. Agreed, we may be able to work around it to some extent, but solving this GitHub issue would surely make it easier to get our integration scenarios working.
From the RSpace perspective, the idea of researchers being able to make deposits from an ELN is solely to lower the barrier to entry for getting data, files, and associated metadata (like ORCID iDs and author information) into a repository, and to be able to do so from familiar software.
I don't think the counter-proposal of pulling from an ELN into Dataverse is an either/or scenario; both would work, depending on what the user prefers. But the 'pull from ELN' approach requires Dataverse to develop UI to browse and configure exports for each and every ELN or data source it wants to support. A two-step procedure would make it easy for researchers to get started making a deposit in draft form, yet still require full verification in order to publish.
@richarda23, interesting thought. Please see also this issue:
We already have a concept of an N/A value that has to be replaced with a real value before the dataset is published via the GUI. To see this in action:
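A minimal sketch of a native-API create payload using those placeholders, assuming a local installation, a valid API token, and a target collection alias (all assumptions for illustration). Free-text required fields carry the literal "N/A", while the subject (a controlled vocabulary) and the contact email (validated) still need real values:

```python
import requests

BASE = "http://localhost:8080"    # assumption: local installation
API_TOKEN = "xxxxxxxx-xxxx-xxxx"  # assumption: a valid API token
COLLECTION = "root"               # assumption: target collection alias

def primitive(name, value, multiple=False):
    """Build a primitive field entry in the native API's JSON shape."""
    return {"typeName": name, "multiple": multiple,
            "typeClass": "primitive", "value": value}

# Free-text required fields get the "N/A" placeholder; the GUI forces the
# depositor to replace them before the dataset can be published.
dataset = {"datasetVersion": {"metadataBlocks": {"citation": {"fields": [
    primitive("title", "N/A"),
    {"typeName": "author", "multiple": True, "typeClass": "compound",
     "value": [{"authorName": primitive("authorName", "N/A")}]},
    {"typeName": "datasetContact", "multiple": True, "typeClass": "compound",
     "value": [{"datasetContactEmail":
                primitive("datasetContactEmail", "na@example.com")}]},
    {"typeName": "dsDescription", "multiple": True, "typeClass": "compound",
     "value": [{"dsDescriptionValue": primitive("dsDescriptionValue", "N/A")}]},
    {"typeName": "subject", "multiple": True,
     "typeClass": "controlledVocabulary", "value": ["Other"]},
]}}}}

resp = requests.post(f"{BASE}/api/dataverses/{COLLECTION}/datasets",
                     headers={"X-Dataverse-key": API_TOKEN}, json=dataset)
print(resp.status_code, resp.json())
```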
This is a great conversation, and it provides specific examples that relate to several issues. Discovery work on #7376 points to possible benefits of a two-step process like the one @richarda23 outlines. #7376 originated from a few questions: "How might we help users add metadata without making it too laborious to publish a dataset?", "How might we make adding and editing metadata clearer?", and "How might we more clearly define what is considered metadata?" We reviewed features and considered deposit/edit workflows. The next step is to mock up UI changes that present the metadata required to create a draft more clearly as step 1, and additional, configurable metadata (which could be required, recommended, or optional, as suggested in #6885) as step 2, prior to publishing. While querying for mandatory fields may still be necessary, I wonder if it is possible to instead agree on the metadata needed to create a draft, and build out/improve how datasets are "enriched".
I'm in favour of the two-step approach and the idea of being able to save a draft dataset with incomplete metadata. Like @richarda23, our aim is to be able to create a dataset from another application and to transfer as much as possible of the data and metadata already known in the external application. If that metadata does not have to be complete, that would make the integration process much easier. We agree that finalizing the dataset and publishing it should be done within Dataverse. Still, being able to query Dataverse for the details of the metadata is a plus. It would be helpful in mapping metadata between applications. I assume that the API call should operate on a given Dataverse collection.
Yes, I agree totally. Knowing what metadata is required would also help the external app know what metadata to send; there might be some metadata fields that require computation or straightforward input from the user, and it would be good to know about these at the time of deposition. It would give the external app the best chance of making a valid submission, which would be good for the user. After submission, Dataverse could respond with a boolean indicator of success or a list of required fields that are missing; this could be shown to the user. It would be nice for the user to have immediate feedback that their submission is accepted, valid, and ready to be published.
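A hypothetical sketch of that flow in Python. The required-fields endpoint shown here does not exist in Dataverse; it merely stands in for whatever API this issue would add, and the URL, names, and response shape are all assumed for illustration:

```python
import requests

# Hypothetical: this endpoint does NOT exist in Dataverse today; it stands
# in for whatever "list mandatory fields" API this issue would add.
BASE = "http://localhost:8080"  # assumption: local installation

def missing_required_fields(collection_alias: str, payload_fields: dict) -> list:
    """Return the names of required fields that the payload leaves empty."""
    resp = requests.get(f"{BASE}/api/dataverses/{collection_alias}/requiredFields")
    resp.raise_for_status()
    required = resp.json()["data"]  # assumed shape: ["title", "producerName", ...]
    return [name for name in required if not payload_fields.get(name)]

# The client checks before depositing, so the user gets immediate feedback
# instead of a failed submission after the fact.
fields = {"title": "My dataset", "producerName": ""}
missing = missing_required_fields("root", fields)
if missing:
    print("Cannot submit yet; missing required fields:", missing)
```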
I guess for "standard" dataverses/collections the current approach may work fine. For more customized dataverses, where e.g. metadata templates with pre-filled fields are used to make deposits as easy as possible, an external tool >> Dataverse integration might be more cumbersome for the depositor. In these cases, I'd prefer to create the dataset within Dataverse, and then, if there is no Dataverse >> external tool integration that allows you to upload the data from the external tool, I'd go back to the external tool and push the data into the created dataset. If I remember correctly, this is how the OSF >> Dataverse integration works: you have to choose a specific dataset when you want to push your files into Dataverse.
Right, integrations like OSF, RSpace, OJS, and Renku all assume "standard" collections and only send the five required fields (title, author, subject, description, contact). They don't have any way to query the Dataverse installation to ask whether any other fields are required for a given collection. Subject is a fixed controlled vocabulary, and this old issue is about how you can't query that either. Even though the list is fixed, it should also be queryable so that apps like RSpace don't have to hard-code it (see the sketch below): Finally, this older issue is very similar: 2023-01-20 update... related:
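For reference, the static-list workaround looks something like the following. The values are the subject terms from the standard citation block as I know them, but a site can customize its citation.tsv, so any hard-coded copy should be verified against the target installation:

```python
# The "static-list-lookup" workaround: because the subject vocabulary cannot
# be queried via the API, integrations hard-code it. These are the subject
# terms from the standard citation block; verify against the target
# installation's citation.tsv, which can be customized per site.
DATAVERSE_SUBJECTS = [
    "Agricultural Sciences",
    "Arts and Humanities",
    "Astronomy and Astrophysics",
    "Business and Management",
    "Chemistry",
    "Computer and Information Science",
    "Earth and Environmental Sciences",
    "Engineering",
    "Law",
    "Mathematical Sciences",
    "Medicine, Health and Life Sciences",
    "Physics",
    "Social Sciences",
    "Other",
]

def is_valid_subject(value: str) -> bool:
    """Check a candidate subject against the hard-coded list."""
    return value in DATAVERSE_SUBJECTS
```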
From @Kris-LIBIS: "we have a dependency on issue #6978 to know which metadata fields are available in dataverse, which are mandatory and what controlled vocabulary valid field values are. in absence of a solution for the issue above, we submitted PR #8940 and that is merged now and ready for 5.14. The PR will allow the RDM integration tool to create datasets with no metadata at all." -- https://groups.google.com/g/dataverse-community/c/aGt1ILi1Hf4/m/fnGO-Io_AQAJ
Hello, all! @richarda23 and everyone, does the following PR resolve this issue? Should we mark it as closing this issue (on merge)? Update: I went ahead and marked the PR to close this issue on merge.
RSpace ELN uses the Dataverse API to submit research data to Dataverse. It has a minimal UI for metadata fields such as title, subject, description, authors, and contacts, and this has worked on various Dataverse installations so far.
One of our RSpace customers has their own Dataverse installation as well (version 4.19). They have configured it to require additional metadata when submitting a dataset. RSpace doesn't know these fields are mandatory, and submission fails:
Deposit failed: ERROR 2020-06-09T10:58:26Z Processing failed
Couldn't update dataset edu.harvard.iq.dataverse.engine.command.exception.IllegalCommandException: Validation Failed:
Producer Name is required. (Invalid value: edu.harvard.iq.dataverse.DatasetField[ id=null ]),
Distributor Name is required. (Invalid value: edu.harvard.iq.dataverse.DatasetField[ id=null ]),
Description Date is required. (Invalid value: edu.harvard.iq.dataverse.DatasetField[ id=null ]),
Keyword Term is required. (Invalid value: edu.harvard.iq.dataverse.DatasetField[ id=null ]),
Deposit Date is required. (Invalid value: edu.harvard.iq.dataverse.DatasetField[ id=null ]).
This corresponds exactly to the list of required properties sent to us by the Dataverse admin that are not set by RSpace.
If this list never changes, then RSpace could develop a solution where it reads a list of mandatory fields from a configuration file. But if it does change from time to time, it would be great if there were an API method in Dataverse to get a list of mandatory metadata fields. Then, a client could programmatically generate input fields for these properties so that the end-user could make a valid submission.
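A minimal sketch of that configuration-file fallback; the file name and schema here are made up for illustration:

```python
import json

# Fallback sketch: read the mandatory fields from a local configuration
# file (file name and schema are assumptions) and generate simple input
# prompts from it, so the end-user can supply every value the target
# collection demands before the deposit is attempted.
with open("mandatory_fields.json") as f:
    mandatory = json.load(f)  # e.g. [{"name": "producerName", "label": "Producer Name"}, ...]

values = {}
for field in mandatory:
    values[field["name"]] = input(f"{field['label']} (required): ")

print("Collected metadata:", values)
```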