OAI-PMH export in Datacite metadata format #4318

Closed
LauraHuisintveld opened this issue Nov 29, 2017 · 16 comments

@LauraHuisintveld

LauraHuisintveld commented Nov 29, 2017

We (DANS - DataverseNL) would like to harvest Dataverse metadata in DataCite format (DataCite 4.1 documentation) once more metadata fields become available in that format.

We currently harvest in Dublin Core, but the mapping of fields to Dublin Core does not always meet our needs. (For example, 'depositor' is mapped to 'contributor'.)

See also the forum discussion here: https://groups.google.com/forum/#!topic/dataverse-community/55cSbjBi10o

@pdurbin
Member

pdurbin commented Nov 29, 2017

@LauraHuisintveld thanks for opening this issue! I believe #2917 is the one we're using to track the request to make more metadata fields available in the DataCite format we export, so please feel free to leave comments there.

Also, #3697 is about making DataCite available as a format under "Export Metadata" on the dataset page. At the moment I'm working on adding "Schema.org JSON-LD" to that list in #3700. The way the code is written, it's best to first make a new format available via "Export Metadata" and then make it possible to harvest that format via OAI-PMH. By "best" I mean that it's a way to deliver the software in smaller chunks rather than trying to do everything at once.

Finally, #4257 feels related, as it says "has expanded its DataCite metadata to be compliant with the European OpenAIRE guidelines". I don't have my head fully wrapped around what that issue is about, though.

@shlake
Contributor

shlake commented Nov 29, 2017

Adding a +1, yes, would like this too.

@djbrooke
Contributor

Thanks to @shlake for the +1 and to @LauraHuisintveld for creating the original issue.

@jggautier - when you have a few, let's review and consolidate (or split up further? :)) the export related issues. As @pdurbin mentions there are some similar issues.

@jggautier
Contributor

jggautier commented Jan 10, 2018

Much of the metadata mapping to DataCite 3.1 and 4.1 is done in this working copy of the Dataverse 4.8+ Metadata Crosswalk. Thanks to those in this ticket and @pameyer for very valuable input so far.

In columns H and I:

  • Metadata not mapped to DataCite fields are highlighted in red
  • The metadata mapped to DataCite fields, which is sent to DataCite, is not highlighted

Outstanding questions:

  • Can we use DataCite 4.1, instead of 3.1, for an OAI-PMH harvesting format?
  • Should we include, as part of this issue, updating the DataCite metadata we send to DataCite, or should that be tracked in another issue?

@djbrooke
Contributor

djbrooke commented Jan 10, 2018

@jggautier thanks for bringing this to backlog grooming 1/10/2018. My notes as you were speaking (please feel free to edit):

@pdurbin
Member

pdurbin commented Jan 11, 2018

While discussing this issue today, we were wondering whether we're talking about one installation of Dataverse harvesting from another installation of Dataverse. It looks like @LauraHuisintveld is talking about a non-Dataverse installation harvesting from her Dataverse installation. She said "We use the harvested records in another application" over at https://groups.google.com/d/msg/dataverse-community/55cSbjBi10o/20Llv3OHAwAJ

That is to say, export rather than export plus import should satisfy this issue, as we suspected. Export only is an easier task.

@LauraHuisintveld
Author

@pdurbin: Yes, our use case is that a non-Dataverse installation is harvesting our Dataverse installation.

@pdurbin
Member

pdurbin commented Jan 11, 2018

@LauraHuisintveld ok, thanks. I'm curious whether you've looked into harvesting using DDI. Like DataCite, it's a well-specified XML standard.

@jggautier
Contributor

jggautier commented Jan 16, 2018

Required fields for mapping

DataCite requires that "related work" metadata (e.g. related publication) includes information about how the work is related to the dataset, using terms from its relationType vocabulary. Discussion about DataCite's property relationType is in the github issue #2778.

relationType is one of a few Dataverse metadata fields that can't be mapped and included in a valid DataCite XML document without including other metadata that Dataverse either doesn't collect or doesn't require. The fields are listed below, and I'm including more details in this DataCite 4.1 XML template, which I hope will help with implementing the Dataverse-DataCite 4.1 mapping.

Dataverse fields mapped to DataCite properties that require sub-properties:

  • authorIdentifier
  • contributor
  • alternateIdentifier
  • otherID
  • relatedPublication

I recommend in the DataCite xml template that Dataverse include this metadata in the DataCite xml only when:

  • the depositor adds the required metadata (for example, map Dataverse's authorIdentifier to DataCite's nameIdentifier only if there's an authorIdentifierScheme)
  • there's an appropriate default value for a sub-property (for example, if there's a contributorName but no contributorType, use "Other" for the contributorType)
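
The two rules above could be sketched roughly as follows. This is only an illustration in Python, not Dataverse's actual exporter code: the function name `build_datacite_people` and the flat dict inputs are hypothetical, and the element and attribute names (`nameIdentifier`, `nameIdentifierScheme`, `contributorType`) reflect my reading of the DataCite 4.1 schema and should be checked against the schema documentation.

```python
import xml.etree.ElementTree as ET


def build_datacite_people(author, contributor):
    """Return a <resource> element holding one creator and one contributor.

    `author` and `contributor` are plain dicts keyed by Dataverse field
    names (a hypothetical input shape chosen for this sketch).
    """
    resource = ET.Element("resource")

    creators = ET.SubElement(resource, "creators")
    creator = ET.SubElement(creators, "creator")
    ET.SubElement(creator, "creatorName").text = author["authorName"]

    # Rule 1: emit nameIdentifier only when the depositor supplied a scheme,
    # since the identifier is not valid DataCite without one.
    identifier = author.get("authorIdentifier")
    scheme = author.get("authorIdentifierScheme")
    if identifier and scheme:
        name_id = ET.SubElement(
            creator, "nameIdentifier", nameIdentifierScheme=scheme
        )
        name_id.text = identifier

    # Rule 2: fall back to a default value for a required sub-property.
    if contributor.get("contributorName"):
        contributors = ET.SubElement(resource, "contributors")
        contrib = ET.SubElement(
            contributors,
            "contributor",
            contributorType=contributor.get("contributorType") or "Other",
        )
        ET.SubElement(contrib, "contributorName").text = contributor[
            "contributorName"
        ]

    return resource
```

Calling it with an author that has an `authorIdentifier` but no `authorIdentifierScheme` simply leaves `nameIdentifier` out, so the resulting XML stays valid rather than carrying an unlabeled identifier.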

Require a value for one field if a value for another is entered

In the future, I think some fields within groups of fields (compound fields) should be required if the groups' other fields have values. For example, if a depositor adds an authorIdentifier (like an ORCID ID number), the authorIdentifierScheme should be a required field. This will be very helpful outside of mapping to DataCite; 16 digits aren't helpful without knowing that they are an ORCID ID.
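
As a rough illustration of this "required if another field has a value" idea, here is a small Python sketch. It is not how Dataverse validates metadata; the function name `missing_required_fields` and the `RULES` table are hypothetical and exist only for this example.

```python
# Hypothetical rule table: if the key has a value, the mapped field is required.
RULES = {"authorIdentifier": "authorIdentifierScheme"}


def missing_required_fields(compound, rules=RULES):
    """Return the fields that became required but were left empty.

    `compound` is a dict holding the values of one compound field,
    keyed by Dataverse field name.
    """
    missing = []
    for trigger, required in rules.items():
        if compound.get(trigger) and not compound.get(required):
            missing.append(required)
    return missing
```

A form or API layer could call this once per compound field and reject the submission (or warn the depositor) whenever the returned list is not empty.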

There are some issues relating to Dataverse being able to require one field if another has a value, including #4072. Not sure if the use case I've described is technically different enough to get its own issue. Thoughts @scolapasta? :)

@pdurbin
Member

pdurbin commented Jan 18, 2018

According to @mfenner over at #2243 (comment) DataCite Schema 4.1 added an attribute to creator and contributor: nameType (controlled list of either personal or organizational). Sounds nice.

@mfenner

mfenner commented Jan 18, 2018

Here is the documentation for DataCite Schema 4.1: https://doi.org/10.5438/0014.

@jggautier
Contributor

jggautier commented Jan 18, 2018

Thanks @mfenner for the comments about how DataCite determines whether an author is a person or an organization when the metadata doesn't say so, for pointing out another good reason to know this (citation formatting), and for the link to the 4.1 schema.

@pdurbin I've been working under the impression that adding new metadata fields, like nameType, is out of scope for this issue. I'll include nameType in the XML template and the "working" crosswalk so it's considered for the issue about sending more metadata to DataCite (#2917).

But:

  • Should adding new Dataverse metadata fields be considered for this issue?
  • Should we consider for this issue the idea and methods DataCite uses (see mfenner's comment) for determining if the authorName value is a person or organization?

@pdurbin
Member

pdurbin commented May 17, 2018

The export side of DataCite seems to be coming in pull request #4664 by @abollini, and I just left a comment about this at #3697 (comment). This issue (#4318) seems to be about OAI-PMH, however. That's not being delivered in the pull request, but it can be added more easily once the export is in place.

@jggautier
Contributor

jggautier commented Feb 21, 2019

This was already mentioned in the OpenAIRE issue (#4257), but might be helpful to clarify here that OpenAIRE compliancy does involve OAI-PMH, and the pull request includes being able to harvest OpenAIRE's flavor of DataCite metadata. I think OpenAIRE compliancy can satisfy @LauraHuisintveld's request, but the status of that effort is still being worked out.

But in Dataverse versions after 4.9.4 (not sure which version exactly), DataCite has been listed as a metadata format that's exportable over OAI-PMH:
https://dataverse.harvard.edu/oai?verb=ListMetadataFormats
https://dataverse.scholarsportal.info/oai?verb=ListMetadataFormats
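
For anyone who wants to check this programmatically, the sketch below pulls the `metadataPrefix` values out of a `ListMetadataFormats` response using only the Python standard library. The sample XML is hand-written for illustration (it is not copied from either server), `example_datacite` is a placeholder prefix, and `metadata_prefixes` is a hypothetical helper name; the namespace URI is the standard OAI-PMH 2.0 one.

```python
import xml.etree.ElementTree as ET

# Standard OAI-PMH 2.0 response namespace.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def metadata_prefixes(list_formats_xml):
    """Extract every metadataPrefix from a ListMetadataFormats response."""
    root = ET.fromstring(list_formats_xml)
    return [
        el.text
        for el in root.findall(
            ".//oai:metadataFormat/oai:metadataPrefix", OAI_NS
        )
    ]


# Trimmed, hand-written sample response; the second prefix is a placeholder.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat><metadataPrefix>oai_dc</metadataPrefix></metadataFormat>
    <metadataFormat><metadataPrefix>example_datacite</metadataPrefix></metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""

print(metadata_prefixes(SAMPLE))
```

In practice the XML would come from fetching one of the URLs above, and the prefix reported for the DataCite format is the value a harvester would pass as `metadataPrefix` in later requests.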

So I think we're closer to being able to export DataCite metadata over OAI-PMH. The problem is that while testing, I wasn't able to harvest Scholars Portal metadata in the DataCite format. When I tried using Demo Dataverse to harvest, the admin dashboard said that all records in the SP set failed to be harvested. And trying to look up the repository's records in the SP set in the DataCite format produces this error:

[Screenshot of the error, taken 2019-02-21 at 12:05 pm]

A second, less technical, issue for resolving @LauraHuisintveld's request is how Dataverse metadata is being mapped to DataCite properties. Assuming that the DataCite metadata that you can export now from the latest Dataverse versions (for example, https://demo.dataverse.org/api/datasets/export?exporter=Datacite&persistentId=doi%3A10.5072/FK2/5DZKMW) is the same metadata made available over OAI-PMH, the depositor field isn't being mapped to any DataCite property. I think that's because what's in the export is just the metadata that Dataverse sends to DataCite when registering DOIs.
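
To compare exports across several datasets, the export URL above can be built from its parts. This is a minimal Python sketch modeled only on that example URL; `export_url` is a hypothetical helper, and the path and parameter names (`/api/datasets/export`, `exporter`, `persistentId`) are taken from the link in this comment rather than from the API documentation.

```python
from urllib.parse import quote, urlencode


def export_url(base_url, persistent_id, exporter="Datacite"):
    """Build a dataset export URL shaped like the demo.dataverse.org example."""
    query = urlencode(
        {"exporter": exporter, "persistentId": persistent_id},
        quote_via=quote,  # percent-encode instead of using "+" for spaces
        safe="/",         # keep the slashes in the DOI readable
    )
    return f"{base_url.rstrip('/')}/api/datasets/export?{query}"


print(export_url("https://demo.dataverse.org", "doi:10.5072/FK2/5DZKMW"))
```

Fetching that URL and comparing its content with what the same installation returns over OAI-PMH would be one way to test the assumption that the two carry the same metadata.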

@LauraHuisintveld, could you let us know what other Dataverse fields you need to be mapped to DataCite properties? (The metadata crosswalk you mentioned a while back should be accurate for dataset-level metadata. It doesn't indicate that file metadata - for files that get DOIs - is also sent in the dataset metadata.)

@pdurbin
Member

pdurbin commented Apr 26, 2019

@LauraHuisintveld have you had a chance to do any testing with the relatively new "Datacite" format? Here's a screenshot from https://demo.dataverse.org/oai?verb=ListMetadataFormats that shows it as one of the supported metadata formats for harvesting:

[Screenshot: "Datacite" listed among the supported metadata formats of the harvesting server]

My understanding is that you need your installation of Dataverse to act as a harvesting server (the screenshot above), but it's also possible to configure Dataverse to act as a harvesting client with the "Datacite" metadata format. Here's a screenshot @jggautier and I took yesterday when discussing #4257 (which is about an upcoming related format called OpenAIRE which is also based on DataCite):

[Screenshot: Dataverse configured as a harvesting client with the "Datacite" metadata format]

@djbrooke
Contributor

djbrooke commented May 9, 2019

@jggautier @LauraHuisintveld - now that the OpenAIRE PR is merged, which may resolve this, I'm going to close this issue. If we need to revisit this, please feel free to reopen with some specific information about what's expected.

@djbrooke djbrooke closed this as completed May 9, 2019
@djbrooke djbrooke modified the milestone: 4.15 Jun 11, 2019