Streaming exports and OAI_ORE and datacite exporters #5043
Comments
Hi @qqmyers. Is there any documentation I could look at for how Dataverse metadata is being mapped to the json-ld OAI-ORE map file?
@jggautier - sorry for being slow. An updated citation.tsv file is not yet in the PR, and I need to do some work first since QDR has added/deleted fields relative to the base version. As is, the code does generate custom URIs that follow the tsv file/field name hierarchy, so there's always a default. The PR also includes the update that allows extra columns to be added to specify a namespace for the whole tsv and/or to assign individual URIs to fields (I can send an example if you want to test this part). I'll get an updated citation.tsv to you when I can, but I don't have a strong preference for what the URIs are, so if you want to specify some/all before then, feel free.
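The defaulting described above can be sketched as follows, in Python for brevity (Dataverse itself is Java). This is a hypothetical illustration, not the PR's code: the function and argument names are made up, the URI shape (base/block/parent/field) is an assumption, and so is the precedence (an explicit field URI first, then a block-wide namespace, then the dataverse.org default).

```python
from typing import Optional

# Assumed default base; the thread refers to "dataverse.org/schema" terms.
DEFAULT_BASE = "https://dataverse.org/schema/"


def resolve_term_uri(
    block_name: str,
    field_name: str,
    parent_name: Optional[str] = None,
    field_uri: Optional[str] = None,
    block_namespace: Optional[str] = None,
) -> str:
    """Return a URI for a metadata field (hypothetical precedence).

    1. An explicit per-field URI (from the extra field column) wins.
    2. Otherwise a block-wide namespace, if given, is used as the base.
    3. Otherwise the base is DEFAULT_BASE plus the block name.
    The path follows the block/field-name hierarchy, so every field gets a URI.
    """
    if field_uri:
        return field_uri
    base = block_namespace or f"{DEFAULT_BASE}{block_name}/"
    if not base.endswith("/"):
        base += "/"
    # Child fields of a compound field are nested under their parent's name.
    path = f"{parent_name}/{field_name}" if parent_name else field_name
    return base + path
```

For example, resolve_term_uri("citation", "authorName", parent_name="author") yields a dataverse.org default, while passing field_uri overrides it entirely.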
@jggautier - Draft mappings to community vocabularies are now in citation.tsv, in the last column if they exist. Other terms in citation.tsv will be defined as "https://dataverse.org/schema/< In general, I don't have strong preferences for any of these mappings, and changing them or dropping them (so they become dataverse.org/schema terms) would be fine by me. The ones I did assign seem like fairly close matches, but often, while there were similar terms, the expected values were not necessarily the same (text versus structured), or the structure was different (e.g. Dublin Core has description, but there's no indication that multiple descriptions with value and date fields are expected). A group discussion might find consensus on some of these, but for now I'd suggest just dropping any mappings that raise concerns, using dataverse.org-specific terms and allowing instances to map via tsv file changes.
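To show how a dropped mapping falls back to a dataverse.org/schema term, here is a sketch that builds a JSON-LD @context from a field-to-URI table. The table entries are illustrative guesses, not the draft mappings in citation.tsv, and the default URI shape is an assumption.

```python
import json

# Assumed default base for fields without a community-vocabulary mapping.
DEFAULT_BASE = "https://dataverse.org/schema/"

# Hypothetical excerpt: field name -> community-vocabulary URI, or None when
# no mapping is assigned (the field then falls back to a dataverse.org term).
CITATION_MAPPINGS = {
    "title": "http://purl.org/dc/terms/title",
    "subject": "http://purl.org/dc/terms/subject",
    "dsDescription": None,  # a similar DC term exists, but the value structure differs
}


def build_context(block_name, mappings):
    """Build a JSON-LD @context mapping each field name to a URI."""
    context = {}
    for field, uri in mappings.items():
        context[field] = uri or f"{DEFAULT_BASE}{block_name}/{field}"
    return {"@context": context}


print(json.dumps(build_context("citation", CITATION_MAPPINGS), indent=2))
```

Dropping a mapping is then just a matter of setting its entry to None (in the real setup, clearing the tsv column).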
@qqmyers. Thanks for the info! I took a look at the mappings in the citation.tsv file and @scolapasta is looking at it, too. Creating a dataverse.org namespace for Dataverse fields, and mapping those fields to fields in other vocabularies (like Dublin Core and schema.org), seems like a requirement for the json-ld OAI-ORE map file, but it's not the focus of your work to archive published datasets in DPN. So I don't want to hold up this PR with mapping suggestions that could be made later. You wrote that terms in any other tsv file currently default to /schema///. Does that mean they default to dataverse.org/schema terms as well? (I haven't had a chance yet to see what the json-ld OAI-ORE map file that this PR will create looks like.) Thanks again!
@jggautier - the link with the DPN/Bagit part is that the OREmap file is submitted as part of the Bag, so it would be nice to have 'reasonable' mappings for a v1, even if they get updated later. For the other tsv files - I should have reread my comment :-) My bracketed terms got dropped, leaving /schema///, but take a look at the edited comment above now to see the form of the default term names for any/all other blocks. I did create some example files early on and posted them on a wiki page - I'll see if I can update them so they match the latest code and then provide a link here.
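A rough picture of how an OREmap could travel inside a Bag: the sketch below assembles a minimal BagIt bag in memory, with payload files under data/, SHA-256 manifests, and the map serialized as a tag file. The metadata/oai-ore.jsonld path and the function name are hypothetical, and the actual layout of the RDA-compliant Bag sent to DPN may differ.

```python
import hashlib
import json


def build_bag(payload, ore_map):
    """Assemble a minimal BagIt bag in memory as {path: bytes}.

    payload: {relative file name: bytes}, placed under data/.
    ore_map: dict serialized as a tag file (hypothetical path below).
    """
    files = {
        "bagit.txt": b"BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
    }
    manifest_lines = []
    for name, content in sorted(payload.items()):
        path = f"data/{name}"
        files[path] = content
        manifest_lines.append(f"{hashlib.sha256(content).hexdigest()}  {path}")
    files["manifest-sha256.txt"] = ("\n".join(manifest_lines) + "\n").encode("utf-8")

    # Hypothetical location for the OREmap; the real bag layout may differ.
    files["metadata/oai-ore.jsonld"] = json.dumps(ore_map, indent=2).encode("utf-8")

    # The tag manifest covers every non-payload file (it cannot list itself).
    tag_lines = [
        f"{hashlib.sha256(content).hexdigest()}  {path}"
        for path, content in sorted(files.items())
        if not path.startswith("data/")
    ]
    files["tagmanifest-sha256.txt"] = ("\n".join(tag_lines) + "\n").encode("utf-8")
    return files
```

Because the OREmap is listed in the tag manifest, a receiver can verify its integrity the same way it verifies the payload files.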
Ah, I understand the comment now. Thanks! So in the citation.tsv, does a metadata field have its own URI in the dataverse.org schema if there is also a URI in either of the last two columns?
The order of precedence is: a field's URI is
Note - I updated the citation.tsv file in the PR since I didn't have the block name in the MDblock URI.
Ah, okay. Everything makes sense to me so far. I'll take myself off review. Thanks again!
I committed a couple of bug fixes and did some refactoring for the Bagit generation/DPN PR. I also updated https://github.com/QualitativeDataRepository/dataverse/wiki/Data-and-Metadata-Packaging-for-Archiving
Thanks a lot for all of this great documentation!
Hi, @qqmyers. @scolapasta and I had a question about the last two columns in the metadata block TSVs. It looks like the first of those two columns has the URI, unless the field is a child field of a compound field; for those fields, like author name or author affiliation, the URI is in the second column. Would it be possible to use just one column? It doesn't seem like there would be cases where both columns have URIs for one field.
@jggautier @scolapasta - sorry, just typos. There should only be one additional column for the fields. I updated the citation.tsv file in the commit (and added headers for the column added to the metadatablock section and for the single column for specific fields).
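With a header on the single added column, a loader can look URIs up by column name instead of position, so top-level fields and compound child fields (such as authorName under author) read the same column. Below is a sketch with a heavily simplified, hypothetical TSV layout; the column names name, parent, termURI, and blockURI are assumptions, and the real metadata block TSVs have many more columns.

```python
import csv
import io

# Hypothetical, heavily simplified TSV: a block section and a field section,
# each introduced by a '#'-prefixed header row. Real files have more columns.
SAMPLE = (
    "#metadataBlock\tname\tblockURI\n"
    "\tcitation\thttps://dataverse.org/schema/citation/\n"
    "#datasetField\tname\tparent\ttermURI\n"
    "\ttitle\t\thttp://purl.org/dc/terms/title\n"
    "\tauthor\t\t\n"
    "\tauthorName\tauthor\t\n"
)


def read_sections(text):
    """Yield (section, row_dict) pairs, keyed by the current section's header.

    Assumes the text starts with a header row.
    """
    header = None
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        if row[0].startswith("#"):
            header = row
            continue
        yield header[0], dict(zip(header[1:], row[1:]))


def field_uris(text):
    """Map field name -> explicit URI ('' when none), read from the one termURI column."""
    return {
        row["name"]: row.get("termURI", "")
        for section, row in read_sections(text)
        if section == "#datasetField"
    }
```

An empty termURI then signals that the field should fall back to its default dataverse.org URI, whether or not it is a child field.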
@qqmyers Can you update this branch from develop? Version numbers have changed and I can't build and deploy. Thanks!
@kcondon - merged...
@qqmyers Thanks. I have applied the db update and tsv file. When I click on an existing dataset or try to publish a new one, I see 500 errors and associated server log errors:
[2018-10-10T18:09:36.650-0400] [glassfish 4.1] [WARNING] [] [edu.harvard.iq.dataverse.util.BundleUtil] [tid: _ThreadID=53 _ThreadName=jk-connector(4)] [timeMillis: 1539209376650] [levelValue: 900] [[
[2018-10-10T18:09:36.651-0400] [glassfish 4.1] [SEVERE] [] [javax.enterprise.resource.webcontainer.jsf.application] [tid: _ThreadID=53 _ThreadName=jk-connector(4)] [timeMillis: 1539209376651] [levelValue: 1000] [[
and:
[2018-10-10T18:12:00.192-0400] [glassfish 4.1] [SEVERE] [] [edu.harvard.iq.dataverse.export.OAI_OREExporter] [tid: _ThreadID=50 _ThreadName=jk-connector(1)] [timeMillis: 1539209520192] [levelValue: 1000] [[
[2018-10-10T18:12:00.194-0400] [glassfish 4.1] [SEVERE] [] [] [tid: _ThreadID=50 _ThreadName=Thread-9] [timeMillis: 1539209520194] [levelValue: 1000] [[
@qqmyers I am out tomorrow but will be in Friday to continue testing your 3 PRs.
@kcondon - added the missing Bundle properties.
As part of #4706, I created a class to create a json-ld OAI-ORE map file and added it as one of the metadata export options. I also created a minimal datacite.xml exporter that simply replicates what was being sent to DataCite. While doing this, since OAI_ORE maps can get large, I switched the export mechanism to stream responses.
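The switch to streaming can be illustrated with a small sketch in which the exporter writes to a caller-supplied binary stream rather than returning one large string. This is Python, not the Java exporter API from the PR; build_ore_map, export_ore, and the /oremap identifier suffix are hypothetical. Only the ORE terms (ResourceMap, describes, Aggregation, aggregates) come from the OAI-ORE vocabulary. The map itself is still held in memory as a dict; what is streamed is the serialized output.

```python
import io
import json


def build_ore_map(dataset_uri, title, file_uris):
    """Build a minimal OAI-ORE resource map as a JSON-LD dict (illustrative only)."""
    return {
        "@context": {
            "ore": "http://www.openarchives.org/ore/terms/",
            "dcterms": "http://purl.org/dc/terms/",
        },
        "@id": dataset_uri + "/oremap",  # hypothetical map identifier
        "@type": "ore:ResourceMap",
        "ore:describes": {
            "@id": dataset_uri,
            "@type": "ore:Aggregation",
            "dcterms:title": title,
            "ore:aggregates": [{"@id": uri} for uri in file_uris],
        },
    }


def export_ore(ore_map, out):
    """Write the map to a binary output stream instead of building one big string.

    json.dump writes the encoded JSON to the stream chunk by chunk.
    """
    text_out = io.TextIOWrapper(out, encoding="utf-8")
    json.dump(ore_map, text_out)
    text_out.flush()
    text_out.detach()  # release the wrapper so `out` stays open for the caller
```

In a web context, `out` would be the response's output stream, so a large map never has to exist as a single serialized string.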
(I'm aware of #3697, #4257, etc. but am not quite sure of their status. My guess is that the export from that effort could replace this one (as an export, and possibly w.r.t. what is sent to DataCite and what's included in the RDA-compliant Bag sent to DPN) once it's merged.)
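The minimal datacite.xml exporter only replicates what was already being sent to DataCite. A record with the mandatory DataCite properties (identifier, creator, title, publisher, publication year, resource type) might be built as in this sketch. The kernel-4 namespace, the function names, and the sample values are assumptions, and the fields Dataverse actually sent may differ.

```python
import xml.etree.ElementTree as ET

# Assumed DataCite kernel-4 namespace; the PR's exporter replicates whatever
# Dataverse already sent, which may use a different kernel version.
NS = "http://datacite.org/schema/kernel-4"


def q(tag):
    """Qualify a tag name with the DataCite namespace."""
    return f"{{{NS}}}{tag}"


def datacite_xml(doi, creators, title, publisher, year):
    """Return a minimal DataCite XML record as a string (mandatory properties only)."""
    ET.register_namespace("", NS)  # serialize NS as the default namespace
    resource = ET.Element(q("resource"))
    ET.SubElement(resource, q("identifier"), identifierType="DOI").text = doi
    creators_el = ET.SubElement(resource, q("creators"))
    for name in creators:
        creator = ET.SubElement(creators_el, q("creator"))
        ET.SubElement(creator, q("creatorName")).text = name
    titles = ET.SubElement(resource, q("titles"))
    ET.SubElement(titles, q("title")).text = title
    ET.SubElement(resource, q("publisher")).text = publisher
    ET.SubElement(resource, q("publicationYear")).text = str(year)
    ET.SubElement(resource, q("resourceType"), resourceTypeGeneral="Dataset")
    return ET.tostring(resource, encoding="unicode")
```

Keeping the record this small mirrors the goal stated for the exporter: reproduce the existing DataCite payload rather than add new metadata.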