Suggestion: Permalinks in metadata exporters #10768

Open
vera opened this issue Aug 12, 2024 · 5 comments · May be fixed by #10790

@vera
Contributor

vera commented Aug 12, 2024

Overview of the Suggestion/What inspired this idea?

Looking at the metadata export for datasets with Permalink PIDs, I found that permalinks are currently treated inconsistently. Some export formats use the permalink itself as the dataset identifier and in the citation string, while others use a link of the form $DATAVERSE_HOST/citation?persistentId=perma:$PERMALINK.

Here are relevant snippets from each export format:

DataCite:

    <identifier identifierType="DOI">$PERMALINK</identifier>

(The wrong identifierType is already being discussed in #10759.)

OpenAire:

<identifier>$DATAVERSE_HOST/citation?persistentId=perma:$PERMALINK</identifier>

Schema.org:

"@id":"$DATAVERSE_HOST/citation?persistentId=perma:$PERMALINK",
"identifier":"$DATAVERSE_HOST/citation?persistentId=perma:$PERMALINK"

DDI:

<IDNo agency="perma">perma:$PERMALINK</IDNo>
...
<biblCit>$AUTHORS, $YEARS, "$TITLE", $DATAVERSE_HOST/citation?persistentId=perma:$PERMALINK, Root, V1</biblCit>
...
<holdings URI="$DATAVERSE_HOST/citation?persistentId=perma:$PERMALINK"/>

DDI HTML codebook: same as DDI

Dublin Core:

<dcterms:identifier>$DATAVERSE_HOST/citation?persistentId=perma:$PERMALINK</dcterms:identifier>

JSON:

"persistentUrl":"$DATAVERSE_HOST/citation?persistentId=perma:$PERMALINK"
...
"datasetPersistentId":"perma:$PERMALINK"
...
"citation":"$AUTHORS, $YEAR, \"$TITLE\", $DATAVERSE_HOST/citation?persistentId=perma:$PERMALINK, Root, V1"

OAI_ORE:

"@id": "$DATAVERSE_HOST/api/datasets/export?exporter=OAI_ORE&persistentId=$DATAVERSE_HOST/citation?persistentId=perma:$PERMALINK",
...
"ore:describes": {
  "@id": "http://localhost:8080/citation?persistentId=perma:https://clinicaltrials.gov/study/NCT00080262",
  ...
}

What existing behavior do you want changed?

We would like to suggest that

  1. the treatment of Permalink PIDs should be consistent across all exporters
  2. preferably, the dataset's persistent identifier should simply be the permalink, as in the DataCite and JSON exports
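A minimal sketch of point 2, assuming the stored PID string looks like `perma:<permalink>` as in the exports above. `export_identifier` and `citation_url` are hypothetical helper names for illustration, not existing Dataverse APIs (Dataverse itself is written in Java); the sketch only puts the two string forms side by side.

```python
PERMA_PREFIX = "perma:"


def citation_url(host: str, pid: str) -> str:
    """Hypothetical: the landing-page form several exporters emit today.
    Mirrors the reported output, which is not URL-encoded."""
    return f"{host}/citation?persistentId={pid}"


def export_identifier(pid: str) -> str:
    """Hypothetical shared helper: for Permalink PIDs, return the bare
    permalink; any other PID is returned unchanged (out of scope here)."""
    if pid.startswith(PERMA_PREFIX):
        return pid[len(PERMA_PREFIX):]
    return pid
```

With `pid = "perma:https://clinicaltrials.gov/study/NCT00080262"` and `host = "http://localhost:8080"`, `citation_url(host, pid)` reproduces the `ore:describes` value from the OAI_ORE snippet, while `export_identifier(pid)` yields `https://clinicaltrials.gov/study/NCT00080262`. Routing every exporter through one such helper is one way to get the consistency asked for in point 1.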

Any brand new behavior do you want to add to Dataverse?

none

Any open or closed issues related to this suggestion?

Maybe #10615 (refactoring of the DataCite export code); not aware of any others.

Are you thinking about creating a pull request for this issue?

yes, we would be interested in creating a PR :)

cc @johannes-darms

@vera added the "Type: Suggestion" (an idea) label on Aug 12, 2024
@vera
Contributor Author

vera commented Aug 12, 2024

This suggestion could also apply to the citation exports:

BibTeX:

@data{$PERMALINK_$YEAR,
author = {$AUTHORS},
publisher = {Root},
title = {{$TITLE}},
year = {$YEAR},
version = {V1},
doi = {$PERMALINK},
url = {$DATAVERSE_HOST/citation?persistentId=perma:$PERMALINK}
}

EndNote XML:

<urls><related-urls><url>$DATAVERSE_HOST/citation?persistentId=perma:$PERMALINK</url></related-urls></urls><electronic-resource-num>perma/$PERMALINK</electronic-resource-num>

(By the way, there are two additional issues in these two exports; I've opened #10769 for them.)

RIS:

Provider: Root
Content: text/plain; charset="utf-8"
TY  - DATA
T1  - $TITLE
AU  - $AUTHORS
DO  - perma:$PERMALINK
ET  - V1
PY  - 2024
SE  - 2024-08-08 14:38:10.884
UR  - $DATAVERSE_HOST/citation?persistentId=perma:$PERMALINK
PB  - Root
ER  - 

@pdurbin
Member

pdurbin commented Aug 12, 2024

As of this writing, there are three more exporters at https://github.com/gdcc/dataverse-exporters that you might want to check:

  • Croissant
  • DDI-PDF
  • RO-Crate

@vera
Contributor Author

vera commented Aug 13, 2024

Yes, good idea. I couldn't get the DDI-PDF exporter to work (maybe it's user error, I'm not sure; I opened an issue: gdcc/exporter-ddipdf#3), but here are the other two:

Croissant:

"url": "$DATAVERSE_HOST/citation?persistentId=perma:$PERMALINK",
...,
"citeAs": "@data{NCT00080262_2024,author = {$AUTHORS},publisher = {Root},title = {$TITLE},year = {2024},url = {$DATAVERSE_HOST/citation?persistentId=perma:$PERMALINK}}"

RO-Crate:

{
  "@id": "./",
  "@type": "Dataset",
  "identifier": "$DATAVERSE_HOST/citation?persistentId=perma:$PERMALINK",
  ...
}

(To get this running, I had to disable the failing test, but I see you've already opened an issue for that: gdcc/exporter-ro-crate#1.)

I also just added OAI_ORE above, which was missing as well.

@pdurbin
Member

pdurbin commented Aug 20, 2024

@vera thanks for doing all that checking.

"url": "$DATAVERSE_HOST/citation?persistentId=perma:$PERMALINK",

Sorry, I'm a little confused. Is the URL above OK? Does it work for you? If not, since this example comes from the Croissant exporter, please open an issue at https://github.com/gdcc/exporter-croissant

@vera
Contributor Author

vera commented Aug 21, 2024

Yes, I think this URL will work for us once #10775 is merged 👍 I'll need to revisit this issue then.


2 participants