OAI-PMH export in Datacite metadata format #4318
Comments
@LauraHuisintveld thanks for opening this issue! I believe #2917 is the one we're using to track the request to make more metadata fields available in the DataCite format we export, so please feel free to leave comments there. Also, #3697 is about making DataCite available as a format under "Export Metadata" on the dataset page. At the moment I'm working on adding "Schema.org JSON-LD" to that list in #3700. The way the code is written, it's best to first make a new format available via "Export Metadata" and then make it possible to harvest that format via OAI-PMH. By "best" I mean that it's a way to deliver the software in smaller chunks rather than trying to do everything at once. Finally, #4257 feels related, as it says "has expanded its DataCite metadata to be compliant with the European OpenAIRE guidelines". I don't have my head fully wrapped around what that issue is about though. |
Adding a +1, yes, would like this too. |
Thanks @shlake for the +1 and @LauraHuisintveld for creating the original issue. @jggautier - when you have a few minutes, let's review and consolidate (or split up further? :)) the export-related issues. As @pdurbin mentions, there are some similar issues. |
Much of the metadata mapping to DataCite 3.1 and 4.1 is done in this working copy of the Dataverse 4.8+ Metadata Crosswalk. Thanks to those in this ticket and @pameyer for very valuable input so far. In columns H and I:
Outstanding questions:
|
@jggautier thanks for bringing this to backlog grooming 1/10/2018. My notes as you were speaking (please feel free to edit):
|
While discussing this issue today, we were wondering whether we're talking about an installation of Dataverse harvesting from another installation of Dataverse or not. It looks like @LauraHuisintveld is talking about a non-Dataverse installation harvesting from her Dataverse installation. She said "We use the harvested records in another application" over at https://groups.google.com/d/msg/dataverse-community/55cSbjBi10o/20Llv3OHAwAJ That is to say, export alone, rather than export plus import, should satisfy this issue, as we suspected. Export only is an easier task. |
@pdurbin: Yes, our use case is that a non-Dataverse installation is harvesting our Dataverse installation. |
@LauraHuisintveld ok, thanks. I'm curious whether you've looked into harvesting using DDI. Like DataCite, it's a well-specified XML standard. |
Required fields for mapping: DataCite requires that "related work" metadata (e.g. related publication) includes information about how the work is related to the dataset, using terms from its relationType vocabulary. Discussion about DataCite's property relationType is in GitHub issue #2778. relationType is one of a few Dataverse metadata fields that can't be mapped and included in a valid DataCite XML document without including other metadata that Dataverse either doesn't collect or doesn't require. The fields are listed below, and I'm including more details in this DataCite 4.1 XML template, which I hope will help with implementing the Dataverse-DataCite 4.1 mapping. Dataverse fields mapped to DataCite properties that require sub-properties:
I recommend in the DataCite xml template that Dataverse include this metadata in the DataCite xml only when:
Require a value for one field if a value for another is entered: In the future, I think some fields within groups of fields (compound fields) should be required if the group's other fields have values. For example, if a depositor adds an authorIdentifier (like an ORCID ID number), the authorIdentifierScheme should be a required field. This would be very helpful outside of mapping to DataCite, too; a 16-digit number isn't helpful without knowing that it's an ORCID ID. There are some issues relating to Dataverse being able to require one field if another has a value, including #4072. Not sure if the use case I've described is technically different enough to get its own issue. Thoughts @scolapasta? :) |
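To make the "required if another field has a value" idea concrete, here is a minimal Python sketch. It is not how Dataverse implements validation; the rule table and the function name `missing_conditional_fields` are hypothetical, and only the field names authorIdentifier and authorIdentifierScheme come from the comment above.

```python
from typing import Dict, List

# Hypothetical rule table (an assumption for illustration): when the key
# field has a value, every field in the list becomes required.
CONDITIONAL_REQUIREMENTS: Dict[str, List[str]] = {
    "authorIdentifier": ["authorIdentifierScheme"],
}


def missing_conditional_fields(compound: Dict[str, str]) -> List[str]:
    """Return the fields that became required but were left empty."""
    missing: List[str] = []
    for trigger, required_fields in CONDITIONAL_REQUIREMENTS.items():
        # The rule only applies when the trigger field is non-blank.
        if not compound.get(trigger, "").strip():
            continue
        for name in required_fields:
            if not compound.get(name, "").strip():
                missing.append(name)
    return missing


if __name__ == "__main__":
    # Placeholder identifier, not a real ORCID record.
    author = {"authorName": "Example, Author", "authorIdentifier": "0000-0000-0000-0000"}
    print(missing_conditional_fields(author))
```

A table-driven rule like this would let the same check cover other compound fields later, if the use case turns out to be general enough.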
According to @mfenner over at #2243 (comment) DataCite Schema 4.1 added an attribute to |
Here is the documentation for DataCite Schema 4.1: https://doi.org/10.5438/0014. |
Thanks @mfenner, for the comments about how DataCite determines if the author metadata it gets is a person or organization when the metadata doesn't say so, another good reason for knowing if an author is a person or organization (citation formatting), and the link to the 4.1 schema. @pdurbin I've been working under the impression that adding new metadata fields, like nameType, is out of scope for this issue. I'll include nameType in the XML template and the "working" crosswalk so it's considered for the issue about sending more metadata to DataCite (#2917). But:
|
The export side of DataCite seems to be coming in pull request #4664 by @abollini and I just left a comment about this at #3697 (comment). This issue #4318 seems to be about OAI-PMH, however, and that's not being delivered there, but it can be more easily added once the export is in place. |
This was already mentioned in the OpenAIRE issue (#4257), but might be helpful to clarify here that OpenAIRE compliancy does involve OAI-PMH, and the pull request includes being able to harvest OpenAIRE's flavor of DataCite metadata. I think OpenAIRE compliancy can satisfy @LauraHuisintveld's request, but the status of that effort is still being worked out. But in Dataverse versions after 4.9.4 (not sure which version exactly), DataCite has been listed as a metadata format that's exportable over OAI-PMH: So I think we're closer to being able to export DataCite metadata over OAI-PMH. The problem is that while testing, I wasn't able to harvest Scholars Portal metadata in the DataCite format. When I tried using Demo Dataverse to harvest, the admin dashboard said that all records in the SP set failed to be harvested. And trying to look up the repository's records in the SP set in the DataCite format produces this error: A second, less technical, issue for resolving @LauraHuisintveld's request is how Dataverse metadata is being mapped to DataCite properties. Assuming that the DataCite metadata that you can export now from the latest Dataverse versions (for example, https://demo.dataverse.org/api/datasets/export?exporter=Datacite&persistentId=doi%3A10.5072/FK2/5DZKMW) is the same metadata made available over OAI-PMH, the depositor field isn't being mapped to any DataCite property. I think that's because what's in the export is just the metadata that Dataverse sends to DataCite when registering DOIs. @LauraHuisintveld, could you let us know what other Dataverse fields you need to be mapped to DataCite properties? (The metadata crosswalk you mentioned a while back should be accurate for dataset-level metadata. It doesn't indicate that file metadata - for files that get DOIs - is also sent in the dataset metadata.) |
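For anyone who wants to compare the two routes discussed above, here is a small Python sketch that only builds the request URLs; it does not fetch anything. The `/api/datasets/export` path, the `Datacite` exporter name and the `/oai` endpoint are taken from URLs quoted in this thread. The function names are hypothetical, `example_set` is a placeholder set name, and using `Datacite` as the OAI-PMH metadataPrefix is an assumption based on the format name mentioned here.

```python
from typing import Optional
from urllib.parse import urlencode


def export_url(base_url: str, persistent_id: str, exporter: str = "Datacite") -> str:
    """Build the single-dataset export URL in the form quoted in this thread."""
    # safe="/" keeps the slashes in the DOI unescaped, matching the quoted URL.
    query = urlencode({"exporter": exporter, "persistentId": persistent_id}, safe="/")
    return f"{base_url}/api/datasets/export?{query}"


def list_records_url(base_url: str, metadata_prefix: str = "Datacite",
                     set_spec: Optional[str] = None) -> str:
    """Build a standard OAI-PMH ListRecords request, optionally limited to a set."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}/oai?{urlencode(params)}"


if __name__ == "__main__":
    base = "https://demo.dataverse.org"
    print(export_url(base, "doi:10.5072/FK2/5DZKMW"))
    print(list_records_url(base, set_spec="example_set"))  # placeholder set name
```

Fetching both URLs for the same dataset and diffing the DataCite XML would be one way to test the assumption that the exported metadata and the harvested metadata are the same.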
@LauraHuisintveld have you had a chance to do any testing with the relatively new "Datacite" format? Here's a screenshot from https://demo.dataverse.org/oai?verb=ListMetadataFormats that shows it as one of the supported metadata formats for harvesting: My understanding is that you need your installation of Dataverse to act as a harvesting server (the screenshot above), but it's also possible to configure Dataverse to act as a harvesting client with the "Datacite" metadata format. Here's a screenshot @jggautier and I took yesterday when discussing #4257 (which is about an upcoming related format called OpenAIRE which is also based on DataCite): |
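The same check can be scripted instead of read off a screenshot. The Python sketch below parses a ListMetadataFormats response and lists the advertised metadataPrefix values. The namespace is the standard OAI-PMH 2.0 one; `SAMPLE_RESPONSE` is a trimmed, hand-written stand-in (an assumption, not actual output from demo.dataverse.org), and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

# Standard OAI-PMH 2.0 namespace.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written, trimmed stand-in for a real response (assumption).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat><metadataPrefix>oai_dc</metadataPrefix></metadataFormat>
    <metadataFormat><metadataPrefix>Datacite</metadataPrefix></metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""


def metadata_prefixes(response_xml: str) -> List[str]:
    """Return every metadataPrefix advertised in a ListMetadataFormats response."""
    root = ET.fromstring(response_xml)
    elements = root.findall(".//oai:metadataFormat/oai:metadataPrefix", OAI_NS)
    return [el.text.strip() for el in elements if el.text]


if __name__ == "__main__":
    prefixes = metadata_prefixes(SAMPLE_RESPONSE)
    print("Datacite" in prefixes)
```

Against a live installation, the body returned by the `/oai?verb=ListMetadataFormats` request would be passed to `metadata_prefixes` in place of the sample string.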
@jggautier @LauraHuisintveld - now that the OpenAIRE PR is merged, which may resolve this, I'm going to close this issue. If we need to revisit this, please feel free to reopen with some specific information about what's expected. |
We (DANS - DataverseNL) would like to harvest the Dataverse metadata in DataCite format (DataCite 4.1 documentation) once more metadata fields become available in that format.
We are currently harvesting in Dublin Core, but we find that the mapping of the fields to Dublin Core does not always meet our needs. (For example, 'depositor' is mapped to 'contributor'.)
See also the forum discussion here: https://groups.google.com/forum/#!topic/dataverse-community/55cSbjBi10o