Some Dataverse metadata fields seem not to be indexed correctly by DataCite #7072

Open
philippconzett opened this issue Jul 10, 2020 · 12 comments
Labels
Feature: Metadata GREI 3 Search and Browse Type: Feature a feature request User Role: Curator Curates and reviews datasets, manages permissions

Comments

@philippconzett
Contributor

Recently, I minted a DOI for a sub-dataverse / collection within DataverseNO using the DataCite Fabrica service (https://doi.datacite.org/). By accident, I discovered that some Dataverse metadata fields seem not to be harvested/indexed correctly by DataCite. Here is how I discovered this issue: In the DOI section of DataCite Fabrica, I selected the DataCite metadata record of an existing dataset which was published in DataverseNO. I clicked the Update DOI (Form) button to see the details of the metadata record. Scrolling through the DataCite metadata record and comparing it with the metadata record of the corresponding dataset in DataverseNO, I noted the issues below. I guess they are due to a) issues in Dataverse, or b) issues in DataCite, or c) a combination of (a) and (b). In the case of (a), I suggest opening a separate GitHub issue for each one.

REQUIRED PROPERTIES
Affiliation: According to the help text, Affiliation names and identifiers are provided by the Research Organization Registry (ROR). I suggest that affiliation fields and other Dataverse metadata fields (potentially) containing the name of a research organization also fetch their values from ROR.

Resource Type General: The default Resource Type General for resources published in a Dataverse repository is Dataset. I suggest introducing two more types. (1) The first one is Collection, which may be applied to (sub-)dataverses. Currently, it is possible to mint a DOI for a sub-dataverse, but only manually in DataCite Fabrica. I suggest that this feature should also be a built-in option when publishing a dataverse. (2) The second Resource Type we need is File (or Part of Dataset); see existing GitHub issue #5086.

RECOMMENDED PROPERTIES
Subjects: No values are registered in this field. In a recent blog post, DataCite writes that they are using the OECD Fields of Science classification, which according to them is the most widely used generic classification scheme. The Dataverse community has previously discussed other vocabularies, including FAST (see this Dataverse Google Group post). Given the DataCite recommendations, I suggest that Dataverse go for the OECD classification. I also suggest that once the OECD classification is adopted, a script be created that replaces the Subject values in existing datasets with corresponding OECD values.

Contributors: Here, I'd expect to find the values from the Dataverse Contributor field, but I only see two values: Contact person and Producer, whereas in the DataverseNO metadata record of the corresponding dataset there are two Contributor entries: Data Collector and Data Curator. Also, DataCite supports a Name Identifier, which "uniquely identifies an individual or legal entity, according to various schemas, e.g. ORCID, ROR or ISNI". I suggest that Dataverse also introduce this support. See my comment above about ROR.

Geolocation: No values are registered in this field, whereas in the dataset in DataverseNO, both Geographic Coverage (Country = Norway) and Geographic Bounding Box (coordinates for Norway) are provided.

OPTIONAL PROPERTIES
Language: No values are registered in this field, whereas in the dataset in DataverseNO, the field Language contains the value English.

Rights: No values are registered in this field, whereas in the dataset in DataverseNO, default CC0 is selected / left unchanged.

Version: No values are registered in this field, whereas the current version of the dataset in DataverseNO is V2.

Funding References: No values are registered in this field, whereas the corresponding dataset in DataverseNO has two entries in the field Grant Information.
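To make the gap concrete, here is a minimal sketch of what the missing Geolocation, Language, Rights, Version, and Funding References entries could look like as DataCite-style JSON attributes. The attribute names (`language`, `rightsList`, `version`, `geoLocations`, `fundingReferences`) follow the DataCite metadata schema; the function name, its parameters, and the shape of the inputs are hypothetical and do not reflect how Dataverse actually builds its DataCite records.

```python
def build_datacite_attributes(language=None, rights=None, rights_uri=None,
                              version=None, place=None, bbox=None, grants=()):
    """Assemble DataCite-style attributes from already-extracted dataset values.

    Hypothetical helper for illustration only: the parameters are assumed
    inputs, not real Dataverse field names or APIs.
    bbox is a dict with the keys "west", "east", "south", "north";
    grants is an iterable of (agency, award_number) pairs.
    """
    attrs = {}
    if language:
        attrs["language"] = language  # e.g. an ISO 639-1 code such as "en"
    if rights:
        entry = {"rights": rights}
        if rights_uri:
            entry["rightsUri"] = rights_uri
        attrs["rightsList"] = [entry]
    if version:
        attrs["version"] = str(version)

    # Geographic Coverage and Geographic Bounding Box end up in one geoLocation.
    geo = {}
    if place:
        geo["geoLocationPlace"] = place
    if bbox:
        geo["geoLocationBox"] = {
            "westBoundLongitude": float(bbox["west"]),
            "eastBoundLongitude": float(bbox["east"]),
            "southBoundLatitude": float(bbox["south"]),
            "northBoundLatitude": float(bbox["north"]),
        }
    if geo:
        attrs["geoLocations"] = [geo]

    # Each Grant Information entry becomes one funding reference.
    funding = [{"funderName": agency, "awardNumber": number}
               for agency, number in grants]
    if funding:
        attrs["fundingReferences"] = funding
    return attrs
```

Fields that have no value are simply omitted rather than sent as empty entries, so a dataset without, say, grant information would produce no `fundingReferences` key at all.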

@jggautier
Contributor

Some of this is related to #5889

@mheppler
Contributor

Related? Silent publishing failure when not all fields required by Datacite are present #7551

@valentinapasquale

Hello @philippconzett, hello everybody,
do you know if there is any plan (or open issue) about adopting the OECD Fields of Science classification as controlled vocabulary in the subject field of the citation metadata block?
Thanks for the help!

@qqmyers
Member

qqmyers commented Mar 19, 2021 via email

@pdurbin pdurbin changed the title Some Dataverse metadata fields seem not to be harvested/indexed correctly by DataCite Some Dataverse metadata fields seem not to be indexed correctly by DataCite Apr 12, 2022
@pdurbin pdurbin added Type: Feature a feature request User Role: Curator Curates and reviews datasets, manages permissions labels Oct 9, 2023
@cmbz

cmbz commented Jan 30, 2024

2024/01/30
@philippconzett are you still encountering this problem?

@pdurbin
Member

pdurbin commented Jan 31, 2024

@cmbz I'll let @philippconzett speak for himself but I'd say "send more data to DataCite" has broad support across the community.

For example, Philipp mentions rights above. The DataCite Commons entry for Harvard Dataverse shows how we don't send rights/license data at all:

[Screenshot, 2024-01-30: DataCite Commons entry for Harvard Dataverse]

He also mentions funding. I'm pretty sure the NIH would like to know which datasets they have funded. If I'm reading the API output right, Dryad has told DataCite about 264 datasets funded by the NIH. Because Dataverse doesn't send any funding information to DataCite, an equivalent search for Harvard Dataverse datasets funded by the NIH returns zero results. I got these API calls from slide 5 of a presentation by Matt Buys. See also some Slack discussion.

Anyway, those are just two examples. We already have some issues going for funding. I'm not sure about rights/licensing. Like Philipp suggested above, perhaps separate issues are the way to go. I'm pretty sure it's all GREI-related, given that DataCite is a full member.

@cmbz

cmbz commented Jan 31, 2024

@pdurbin Right! I was thinking less generically "send more metadata to DataCite", which we plan to address substantively in GREI years 3 and 4 (as you mentioned), and more specific metadata fields that we could prioritize in the scope of that planned work. Since the issue is several years old, I wasn't certain if some of these elements had already been addressed.

@philippconzett
Contributor Author

Thanks, @cmbz and @pdurbin. I agree with Phil that solving these issues would be of high value for many if not all of our community members, since delivering complete and correct metadata to DataCite is at the core of making data findable.

@pdurbin
Member

pdurbin commented Jan 31, 2024

To me it would make sense to create a few issues about the planned work before closing this one. That way people who are interested in these features can follow the new issues.

@cmbz

cmbz commented Jan 31, 2024

@pdurbin and @philippconzett Right. Sorry for the confusion. I wasn't planning to close any issues without discussion. Just working to gather all outstanding to-do items on this topic into the GREI epics that are being defined so the work can be defined, planned, and worked on during Years 3 & 4.

@cmbz cmbz removed the GREI Year 3 Year 3 GREI task label May 22, 2024
@DieuwertjeBloemen DieuwertjeBloemen moved this to Medium priority in KU Leuven RDR Jul 10, 2024
@philippconzett philippconzett moved this to High priority in DataverseNO Jul 10, 2024
@DS-INRAE DS-INRAE moved this to ⚠️ Needed/Important in Recherche Data Gouv Jul 10, 2024
@jeromeroucou
Contributor

jeromeroucou commented Oct 17, 2024

We have also noticed in the Recherche Data Gouv repository that contributors are not present in the metadata viewable in DataCite, as indicated by @philippconzett for DataverseNO.
However, we have more values than Data Collector and Data Curator. We have taken all the values from DataCite's controlled list of contributorType. We also have one value that is not available in DataCite: Metadata author.

For values not supported by DataCite, would it be better to send nothing, or to replace the value with Other?

@qqmyers
Copy link
Member

qqmyers commented Oct 17, 2024

FYI: #10632 does include contributor information in what is sent to DataCite. However, the code doesn't currently check for contributor types that are not in DataCite's controlled list and I expect the code as it is now will fail when submitting records to DataCite when a contributor type that is not in the DataCite list is used. Such a check could probably be added. I'd suggest that any types not in DataCite's list get mapped to Other rather than being dropped (contributor type is required in their 4.5 schema, so dropping would mean dropping the contributor overall and not just leaving the type blank).
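The check suggested above could be sketched roughly as follows. This is not the code from #10632; the function name is hypothetical, and the sketch assumes that Dataverse contributor types arrive as display strings such as "Data Collector". The controlled list is the contributorType vocabulary from the DataCite 4.5 schema.

```python
# contributorType controlled list from the DataCite Metadata Schema 4.5.
DATACITE_CONTRIBUTOR_TYPES = (
    "ContactPerson", "DataCollector", "DataCurator", "DataManager",
    "Distributor", "Editor", "HostingInstitution", "Producer",
    "ProjectLeader", "ProjectManager", "ProjectMember",
    "RegistrationAgency", "RegistrationAuthority", "RelatedPerson",
    "Researcher", "ResearchGroup", "RightsHolder", "Sponsor",
    "Supervisor", "WorkPackageLeader", "Other",
)

# Case-insensitive lookup from a normalized name to the canonical DataCite value.
_LOOKUP = {name.lower(): name for name in DATACITE_CONTRIBUTOR_TYPES}


def to_datacite_contributor_type(dataverse_type):
    """Map a Dataverse contributor type to a DataCite contributorType.

    Hypothetical helper: whitespace is removed so that "Data Collector"
    matches "DataCollector". Anything missing or not in the controlled
    list maps to "Other" so the contributor is kept instead of dropped.
    """
    if not dataverse_type:
        return "Other"
    normalized = "".join(dataverse_type.split()).lower()
    return _LOOKUP.get(normalized, "Other")
```

With this mapping, a repository-specific value such as Metadata author would be sent as Other, which keeps the contributor in the record even though contributor type is required.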
