DataciteXML changes Plus RelationType field #10632
trying to avoid a separate tx boundary
More thoughts on the docs.
Additional metadata, including metadata about Related Publications is now being sent to DataCite when DOIs are registered and published and is available in the DataCite XML export. For existing datasets where no "Relation Type" has been specified, "IsSupplementTo" is assumed. The additions are in rough alignment with the OpenAIRE XML export, but there are some minor differences in addition to the Relation Type addition, including an update to the DataCite 4.5 schema.

For details see https://github.com/IQSS/dataverse/pull/10632 and https://github.com/IQSS/dataverse/pull/10615 and the [design document](https://docs.google.com/document/d/1JzDo9UOIy9dVvaHvtIbOI8tFU6bWdfDfuQvWWpC0tkA/edit?usp=sharing) referenced there.
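For readers skimming the release note, the fallback it describes can be pictured with a minimal sketch. This is not the PR's code: the class name `RelationTypeDefaults` and the method `effectiveRelationType` are hypothetical, and the only thing taken from the note is that "IsSupplementTo" is assumed when no Relation Type was specified.

```java
// Hypothetical helper for illustration only; these names do not come from the Dataverse code base.
final class RelationTypeDefaults {

    // Default named in the release note for datasets with no stored Relation Type.
    static final String DEFAULT_RELATION_TYPE = "IsSupplementTo";

    private RelationTypeDefaults() {
    }

    // Returns the stored relation type, or the default when it is null or blank.
    static String effectiveRelationType(String storedRelationType) {
        if (storedRelationType == null || storedRelationType.trim().isEmpty()) {
            return DEFAULT_RELATION_TYPE;
        }
        return storedRelationType.trim();
    }
}
```

Under this sketch, an older dataset whose related publication has no Relation Type would export as "IsSupplementTo", while an explicitly chosen value is passed through unchanged.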
The real meat of what's changing is squirreled away in a Google doc. This doesn't quite sit right with me. 🤔 This is the stuff users might like to know. And the stuff QA will test against.
However, our release notes tend to get long and I'm not sure the details should be here either.
The more I think about it... I'd prefer to have the Google doc copied and pasted here into the release notes. Git is a much better way to preserve this information. And it keeps the info with the pull request.
I'm open to other ideas, of course. Perhaps a new changelog in the guides? Or throw it in the API changelog? A separate text file linked from the release notes and/or the guides?
I'm hesitant to focus on the 40+ changes, many of which are only relevant if you're comparing old, new, and OpenAIRE closely (and are really closer to per-commit changes we usually make). I've added some additional detail to the release note to try and give more of a sense of the scope of the change (v4.5 schema, files, license/terms info, PIDs, ...).
I still think this information should be in git but I give up.
Thanks for the additional information in the release note. It does help.
A couple more comments on code.
src/main/java/edu/harvard/iq/dataverse/pidproviders/PidProviderFactoryBean.java
src/main/java/edu/harvard/iq/dataverse/pidproviders/doi/XmlMetadataTemplate.java
…itativeDataRepository/dataverse.git into datacite_plus_relPubRelType
API tests are passing. This is a lot to review and QA but I'm happy enough with how the code and docs look. This will be a great feature. People have been asking us to send more metadata to DataCite for years. Approved.
I re-tested some more today, since the last changes were made this am. I am satisfied with the PR and ready to merge. But please stop me if you're thinking of making more changes.
@scolapasta and @jggautier please review and update/close linked issues in PR body. Thanks!
Hi @cmbz. I haven't been able to review and update/close linked issues in PR body since you asked last month because we've been planning for the UX WG to work on this. So I think it's best if I help review and update/close linked issues in PR body as part of the UX WG's design sprint for improving descriptions of resources related to deposits, planned for after the current design sprint that's about improving descriptions of people and organizations related to deposits.
Sounds good to me @jggautier. Pls ping me in Slack on these. They're easy to miss in the onslaught of GitHub notifications. Thanks!
Thank you
What this PR does / why we need it: This PR adds a RelationType child field to the related publication parent field and uses it to provide a RelationType in the OpenAIRE and DataCite XML exports and in the DataCite XML sent to DataCite (the JSON and OAI_ORE exports, which include all fields, also carry it). It builds upon #10615 and should be reviewed/QA'd after that (or we can create a PR against that branch to make the changes that just add a RelationType easier to see).
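To make the target concrete: in the DataCite schema, a related publication is expressed as a `relatedIdentifier` element carrying `relatedIdentifierType` and `relationType` attributes. The sketch below writes one such element with the JDK's `XMLStreamWriter`. It is an illustration under assumptions, not the code in `XmlMetadataTemplate`: the class `RelatedIdentifierXmlSketch` and its `toXml` method are hypothetical, and the DataCite namespace and the rest of the record are left out.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical sketch; it writes only the relatedIdentifiers fragment, without namespace or document header.
final class RelatedIdentifierXmlSketch {

    private RelatedIdentifierXmlSketch() {
    }

    // Serializes one related identifier with its identifier type and relation type as attributes.
    static String toXml(String identifier, String identifierType, String relationType) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartElement("relatedIdentifiers");
            xml.writeStartElement("relatedIdentifier");
            xml.writeAttribute("relatedIdentifierType", identifierType);
            xml.writeAttribute("relationType", relationType);
            xml.writeCharacters(identifier);
            xml.writeEndElement(); // relatedIdentifier
            xml.writeEndElement(); // relatedIdentifiers
            xml.flush();
            xml.close();
        } catch (XMLStreamException e) {
            // Rethrown unchecked so callers of this sketch need no checked-exception handling.
            throw new IllegalStateException("Could not write relatedIdentifier", e);
        }
        return out.toString();
    }
}
```

Calling `RelatedIdentifierXmlSketch.toXml("10.1234/example", "DOI", "IsSupplementTo")` yields a `relatedIdentifiers` fragment whose single child holds the DOI as text, with the Relation Type chosen in the UI as the `relationType` attribute. The DOI used here is a made-up placeholder.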
Which issue(s) this PR closes:
Relates to:
Special notes for your reviewer:
Suggestions on how to test this: Nominally the new XMLTemplateTest (and all others) should pass, and it should be possible to publish datasets with any/all metadata using a DataCite test account. The log shouldn't contain any cases where DataCite responds with a 422 and indicates that the XML doesn't comply with their 4.5 schema. There should be lots of additional metadata for related publications; author entries should include ORCID info if provided; and affiliations and GrantNumberAgency should have ROR info if a ROR rather than plain text was entered. Typos, such as a related publication with id type doi and either no entry or a non-DOI entry for the identifier and url, should result in a log message and that particular related publication being left out of the XML, but otherwise should not cause a failure to update the XML. Etc.
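For QA, the "skip and log" behaviour for typos described above might look roughly like the following. This is one reading of the test notes, not the PR's implementation: `RelatedPublicationFilter`, `RelatedPublication`, `extractDoi`, and the loose DOI pattern are all hypothetical, and the rule applied is that an entry typed as doi is dropped only when neither its identifier nor its url contains something DOI-shaped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.logging.Logger;
import java.util.regex.Pattern;

// Hypothetical sketch of dropping malformed related publications instead of failing the export.
final class RelatedPublicationFilter {

    private static final Logger logger = Logger.getLogger(RelatedPublicationFilter.class.getName());

    // Loose DOI shape check (an assumption): "10.", 4 to 9 digits, "/", then a non-empty suffix.
    private static final Pattern DOI_SHAPE = Pattern.compile("^10\\.\\d{4,9}/\\S+$");

    private static final String[] DOI_PREFIXES = {
        "https://doi.org/", "http://doi.org/", "https://dx.doi.org/", "http://dx.doi.org/", "doi:"
    };

    // Hypothetical value type standing in for one Related Publication entry.
    static final class RelatedPublication {
        final String idType;
        final String idNumber;
        final String url;

        RelatedPublication(String idType, String idNumber, String url) {
            this.idType = idType;
            this.idNumber = idNumber;
            this.url = url;
        }
    }

    private RelatedPublicationFilter() {
    }

    // Returns the bare DOI found in the value, or null when the value is missing or not DOI-shaped.
    static String extractDoi(String value) {
        if (value == null) {
            return null;
        }
        String candidate = value.trim();
        String lower = candidate.toLowerCase(Locale.ROOT);
        for (String prefix : DOI_PREFIXES) {
            if (lower.startsWith(prefix)) {
                candidate = candidate.substring(prefix.length());
                break;
            }
        }
        return DOI_SHAPE.matcher(candidate).matches() ? candidate : null;
    }

    // Keeps every entry except doi-typed ones with no usable DOI; those are logged and skipped.
    static List<RelatedPublication> exportable(List<RelatedPublication> publications) {
        List<RelatedPublication> kept = new ArrayList<>();
        for (RelatedPublication pub : publications) {
            boolean typedAsDoi = pub.idType != null && pub.idType.trim().equalsIgnoreCase("doi");
            if (typedAsDoi && extractDoi(pub.idNumber) == null && extractDoi(pub.url) == null) {
                logger.warning("Skipping related publication: type is doi but no DOI found in identifier or url");
                continue;
            }
            kept.add(pub);
        }
        return kept;
    }
}
```

With this sketch, a doi-typed entry whose identifier is free text and whose url points at a non-DOI page would be logged and omitted, while the remaining entries, including non-DOI types such as isbn, would still be exported.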
FWIW: I have been able to run this on all the QDR production data and have everything update OK (though we have a few typos in the metadata to fix).
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
Yes, it adds "Relation Type" to "Related Publication":
Is there a release notes update needed for this change?: included.
Additional documentation: As noted in the release note, there's a long doc listing ~all of the intended changes from the previous version - see https://docs.google.com/document/d/1JzDo9UOIy9dVvaHvtIbOI8tFU6bWdfDfuQvWWpC0tkA/edit?usp=sharing.
Changes to the guides can be previewed at https://dataverse-guide--10632.org.readthedocs.build/en/10632/admin/dataverses-datasets.html#send-metadata-to-pid-provider