
Add "Relation Type" to related publication metadata fields to send DataCite related publication metadata #2778

Closed
posixeleni opened this issue Nov 30, 2015 · 51 comments
Labels: Feature: Metadata; Type: Suggestion (an idea); User Role: Curator (curates and reviews datasets, manages permissions)

@posixeleni
Contributor

comment from @bencomp in #2774

In the DataCite metadata schema v3.1, if a related identifier is included (e.g. for publications that build on the dataset), a relation type must be given.
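For illustration, here is a minimal Python sketch of what that requirement means for the XML Dataverse would send. It assumes the DataCite attribute names `relatedIdentifierType` and `relationType` on a `relatedIdentifier` element, omits the schema namespace for brevity, and uses a placeholder DOI; the helper name `related_identifier` is hypothetical.

```python
import xml.etree.ElementTree as ET


def related_identifier(identifier: str, identifier_type: str, relation_type: str) -> ET.Element:
    """Build a single <relatedIdentifier> element (namespace omitted for brevity)."""
    # DataCite treats relationType as mandatory whenever a related identifier is given.
    if not relation_type:
        raise ValueError("a relationType is required for every relatedIdentifier")
    elem = ET.Element(
        "relatedIdentifier",
        {"relatedIdentifierType": identifier_type, "relationType": relation_type},
    )
    elem.text = identifier
    return elem


# Placeholder DOI standing in for an article that cites the dataset.
el = related_identifier("10.1234/example-article", "DOI", "IsCitedBy")
print(ET.tostring(el, encoding="unicode"))
```

Refusing to build the element without a relation type mirrors the schema rule rather than emitting XML that DataCite would reject.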

@augustfly

Commenting mainly to track this thread and its opinions, but adopting DOI/DataCite relation types is something Dataverse really should do. The controlled vocabulary and the use cases for reuse are already there. I'm sure implementing it will raise UI issues downstream, though.

@jggautier
Contributor

In the article "Pre-Metadata Counseling: Putting the DataCite relationType Attribute into Action" (PDF), the team behind Illinois Data Bank write about using DataCite's relationType terms. In their longer "overly honest" article about the repository's development, they conclude that the terms are too difficult to use (hard to define, too much overlap) to expect depositors to apply them consistently, so their curators apply them after dataset publication, and they use only six of the 25.

This makes me think it would be a good idea to lessen the confusion when this is implemented by:

  • Starting with a limited number of relationType terms available in the deposit form
  • Letting depositors choose multiple relationTypes (considering the overlap in terms)

@pdurbin
Member

pdurbin commented Jan 11, 2018

they use only six of the 25

The six seem to be "Article, Code, Dataset, Presentation, Thesis, or Other" unless I misunderstand. This issue may be a candidate for adding a user story (as discussed in retrospective this afternoon) because I'm getting a little lost on who wants what and why.

@augustfly

Those are the object types, Phil. The 6 relations are IsSupplementTo, IsCitedBy, IsNewVersionOf/IsPreviousVersionOf and the IsPartOf/HasPart pair.

I think narrowing terms/relations is wise; it has always felt like the DataCite list was made to appear to cover edge cases, which leads to unnecessary confusion. I don't see much overlap in those 6 relations, though someone who hasn't thought it through for a particular object might.

I think enabling the addition of multiple relations in a single UI click by a depositor isn't solving an actual problem. Allowing a 1 click addition of 3 relations between Object A and Object B would be allowing a depositor to think they are confused by the terms and giving them a quick out from their confusion. And it adds pure noise on the resulting network graph.

Instead, defining what Dataverse thinks the relationTypes it supplies mean and giving examples of how to use them would be a better solution to the problem of depositor confusion.

@jggautier
Contributor

jggautier commented Jan 16, 2018

Thanks @augustfly. Looks like the Scholix metadata schema (pages 8-9) also limits the relationship types (IsReferencedBy, References, IsSupplementTo, IsSupplementedBy) and adds a catch-all (IsRelatedTo).

I think enabling the addition of multiple relations in a single UI click by a depositor isn't solving an actual problem. Allowing a 1 click addition of 3 relations between Object A and Object B would be allowing a depositor to think they are confused by the terms and giving them a quick out from their confusion. And it adds pure noise on the resulting network graph.

Ah, I didn't mean to suggest enabling the addition of multiple relations in a single UI click (writing "considering the overlap in terms" does make it seem like this is what I was getting at). I meant that as a depositor, I might want to say that dataset A is related to article B in two or more distinct ways (ex. the dataset supports findings in my article, and I also cite the dataset in my article). This is what the Illinois Data Bank folks decided to do:

pg. 214 (PDF)

Use two instances of <RelatedIdentifier> for the same identifier when the data set whose conclusions support the paper is also formally cited in the paper.
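A minimal Python sketch of the Illinois Data Bank practice quoted above: the same target identifier is written once per relation type. The element names assume the DataCite `relatedIdentifiers`/`relatedIdentifier` structure (namespace omitted); the helper name and DOI are placeholders.

```python
import xml.etree.ElementTree as ET
from typing import Iterable


def related_identifiers(identifier: str, identifier_type: str, relation_types: Iterable[str]) -> ET.Element:
    """Emit one <relatedIdentifier> per relation type, all pointing at the same target."""
    container = ET.Element("relatedIdentifiers")
    # dict.fromkeys drops duplicate relation types while keeping their order.
    for rel in dict.fromkeys(relation_types):
        child = ET.SubElement(
            container,
            "relatedIdentifier",
            {"relatedIdentifierType": identifier_type, "relationType": rel},
        )
        child.text = identifier
    return container


# Placeholder DOI: the dataset supports the paper's conclusions and is also formally cited in it.
block = related_identifiers("10.1234/example-paper", "DOI", ["IsSupplementTo", "IsCitedBy"])
print(ET.tostring(block, encoding="unicode"))
```

Each relation stays a separate, single-typed statement, which is different from a one-click "add three relations at once" shortcut.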

Is including these two, or multiple relationship types, in the metadata important enough for these knowledge graphs? Would it just be noise to include both isSupplementTo and isCitedBy?

I agree that defining what Dataverse thinks the relationTypes it supplies mean and giving depositors examples of how to use them is a great way to go about it.

Lastly, I've been working on mapping Dataverse metadata to DataCite 4.1 so that OAI-PMH harvesting can be done using DataCite (#4318). If we want relatedPublication metadata to be included in the DataCite XML, the DataCite schema requires relationType. Since this isn't something Dataverse lets depositors specify, I'm wondering if, until depositors are able to choose the required relationType(s) in the UI, it's okay to make one relationType the default. @juancorr, is this what https://edatos.consorciomadrono.es/ does, using isCitedBy for every related publication?

If it's safe to choose a default now for related publications that weren't assigned relationship types when published, is it also safe to make that relationship type the default among several choices (e.g. in a dropdown list in an adjusted dataset create/edit form)?
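A small Python sketch of the fallback being asked about, assuming IsCitedBy as the default (the thread raises it as a question, not a decision). `RelatedPublication` and `effective_relation_type` are hypothetical names, not Dataverse classes.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical default; the thread only asks whether IsCitedBy is a safe choice.
DEFAULT_PUBLICATION_RELATION = "IsCitedBy"


@dataclass
class RelatedPublication:
    """Simplified stand-in for a "Related Publication" entry (hypothetical model)."""
    citation: str
    identifier: Optional[str] = None
    relation_type: Optional[str] = None  # None for entries published before the field existed


def effective_relation_type(pub: RelatedPublication) -> str:
    """Prefer the depositor's explicit choice; otherwise fall back to the default."""
    return pub.relation_type or DEFAULT_PUBLICATION_RELATION


legacy = RelatedPublication("Doe, J. (2015). Example article.", "10.1234/legacy")
chosen = RelatedPublication("Roe, R. (2019). Another article.", "10.1234/new", "IsSupplementTo")
print(effective_relation_type(legacy), effective_relation_type(chosen))
```

The same constant could serve both purposes in the question: filling in legacy records at export time and pre-selecting the dropdown in the edit form.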

@RightInTwo
Contributor

RightInTwo commented Sep 26, 2018

I would love to see this, as our primary publication target is DataCite-based and mints new DOIs for new versions of publications. Without the relationType IsNewVersionOf/IsPreviousVersionOf, I can't think of a way to accurately describe this.

Also, being fully DataCite-compatible would be great, as discussed in #4318.

All the best
Jonas

@jggautier
Contributor

Thanks @RightInTwo! When you write "publications", I'm assuming you mean published datasets. (Let me know if that's inaccurate :) So it will be helpful if depositors are able to say that the dataset they're depositing IsNewVersionOf/IsPreviousVersionOf another dataset. This also makes me realize that not all of the relationships, however many we choose, will be appropriate options for describing the relationship between two research objects. For example, there's no reason to give dataset depositors the option of saying that the dataset they're depositing IsNewVersionOf an article they've published.

This work is being considered as part of Dataverse's current grant-funded commitment to publish data use and citation metrics following the Code of Practice for Research Data Usage standard (Make Data Count), since the Event Data service that's aggregating the metadata in order to generate citation counts needs the "relatedIdentifier" metadata, which Dataverse needs to send to DataCite when registering DOIs (or updating the metadata of already registered DOIs, #5144).

Could we update the dataset metadata form to let depositors say how the research object is related to the dataset they're depositing, maybe using a new metadata field? That would mean a fifth field, maybe a dropdown menu, in the "Related Publication" compound field, and a second field for "Related Datasets" (which would become a compound field).

Of the six relations that the Illinois Data Bank settled on, I'm proposing using:

For "Related Publication":

  • IsSupplementTo
  • IsCitedBy

For "Related Dataset":

  • IsSupplementTo
  • IsCitedBy
  • IsNewVersionOf
  • IsPreviousVersionOf
  • IsPartOf
  • HasPart

(I'm not recommending that Dataverse display those terms verbatim in the UI. Zenodo tries to clarify them with longer phrases, shown in the screenshot below.)

[Screenshot of Zenodo's relation-type options, 2018-11-30]
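The per-field proposal above could be enforced with a lookup like this minimal Python sketch. The field labels follow the Dataverse form; the dictionary and `validate_relation_type` are hypothetical names for illustration only.

```python
# Per-field options from the proposal above; keys follow the Dataverse form labels.
ALLOWED_RELATION_TYPES = {
    "Related Publication": {"IsSupplementTo", "IsCitedBy"},
    "Related Dataset": {
        "IsSupplementTo", "IsCitedBy",
        "IsNewVersionOf", "IsPreviousVersionOf",
        "IsPartOf", "HasPart",
    },
}


def validate_relation_type(field: str, relation_type: str) -> None:
    """Raise if relation_type is not offered for the given compound field."""
    allowed = ALLOWED_RELATION_TYPES.get(field)
    if allowed is None:
        raise KeyError(f"unknown field: {field!r}")
    if relation_type not in allowed:
        raise ValueError(
            f"{relation_type} is not offered for {field}; choose one of {sorted(allowed)}"
        )


validate_relation_type("Related Dataset", "IsNewVersionOf")  # accepted
try:
    # A dataset should not be declared a new version of an article.
    validate_relation_type("Related Publication", "IsNewVersionOf")
except ValueError as err:
    print(err)
```

Keying the options by field keeps nonsensical combinations (like a dataset being a new version of an article) out of the dropdown entirely.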

@pameyer
Contributor

pameyer commented Dec 2, 2018

@jggautier Any thoughts on IsSourceOf for "Related Dataset"?

I'm a little confused about how a dataset would cite another dataset, but that's likely because I haven't looked at the spec for a while.

@RightInTwo
Contributor

@jggautier Yes, I am talking about published datasets.

Regarding relation types, DataCite defines a whole bunch of them:

  • IsCitedBy
  • Cites
  • IsSupplementTo
  • IsSupplementedBy
  • IsContinuedBy
  • Continues
  • IsDescribedBy
  • Describes
  • HasMetadata
  • IsMetadataFor
  • HasVersion
  • IsVersionOf
  • IsNewVersionOf
  • IsPreviousVersionOf
  • IsPartOf
  • HasPart
  • IsReferencedBy
  • References
  • IsDocumentedBy
  • Documents
  • IsCompiledBy
  • Compiles
  • IsVariantFormOf
  • IsOriginalFormOf
  • IsIdenticalTo
  • IsReviewedBy
  • Reviews
  • IsDerivedFrom
  • IsSourceOf
  • IsRequiredBy
  • Requires

They are described in the appendix of the version 4.1 specification.
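Most of these 31 terms come in inverse pairs, which helps when reading a relation from the other object's side (e.g. IsSourceOf vs. IsDerivedFrom). The Python sketch below encodes that pairing; the pairs are inferred from the term names in the list and should be checked against the 4.1 appendix, and `inverse` is a hypothetical helper.

```python
# Inverse pairs inferred from the DataCite 4.1 relationType list above (verify against the spec).
PAIRS = [
    ("IsCitedBy", "Cites"),
    ("IsSupplementTo", "IsSupplementedBy"),
    ("IsContinuedBy", "Continues"),
    ("IsDescribedBy", "Describes"),
    ("HasMetadata", "IsMetadataFor"),
    ("HasVersion", "IsVersionOf"),
    ("IsNewVersionOf", "IsPreviousVersionOf"),
    ("IsPartOf", "HasPart"),
    ("IsReferencedBy", "References"),
    ("IsDocumentedBy", "Documents"),
    ("IsCompiledBy", "Compiles"),
    ("IsVariantFormOf", "IsOriginalFormOf"),
    ("IsReviewedBy", "Reviews"),
    ("IsDerivedFrom", "IsSourceOf"),
    ("IsRequiredBy", "Requires"),
]
SYMMETRIC = {"IsIdenticalTo"}  # A IsIdenticalTo B implies B IsIdenticalTo A

INVERSE = {}
for a, b in PAIRS:
    INVERSE[a] = b
    INVERSE[b] = a
for term in SYMMETRIC:
    INVERSE[term] = term


def inverse(relation_type: str) -> str:
    """Return the same relation as seen from the other object."""
    return INVERSE[relation_type]


print(len(INVERSE), inverse("IsSourceOf"))
```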

@jggautier
Contributor

Thanks @RightInTwo. I mentioned 25 in an earlier comment; didn't realize so many more had been added since version 3.1!

I hadn't considered the whole list yet, just wanted to get the conversation started (but I agree isCitedBy doesn't make sense to me right now). Maybe we could get more feedback from depositors and curators of different types of datasets.

IsSourceOf makes me think of one dataset being derived from another. "Here's a dataset that was used to create this one, but it's not a new version or a part of that first dataset."

@jggautier jggautier changed the title Update Citation Metadata with "Relation Type" to be DataCite 3.1 compliant Update Citation Metadata with "Relation Type" to be DataCite compliant Dec 3, 2018
@pameyer
Contributor

pameyer commented Dec 3, 2018

@jggautier That's the exact usage of IsSourceOf that I was thinking of - it seems to me like a better framework than mixing inputs and outputs in the same dataset.

@RightInTwo
Contributor

RightInTwo commented Dec 12, 2018

@jggautier What would happen if I import metadata from another source that actually uses IsSourceOf (for whatever reason) and it's not implemented in Dataverse?

@jggautier
Contributor

jggautier commented Dec 12, 2018

One idea would be to map the types we don't use to the types we do. (It would probably help to consider the types different repositories use now.) So if someone imports metadata with a relationType of isReferencedBy, Dataverse changes that type to isCitedBy. This conversion would be published someplace so people importing metadata know.

And later if changes are asked for by people importing metadata with the types that Dataverse converts to another type, we can reconsider adding/changing types.

@RightInTwo
Contributor

RightInTwo commented Dec 12, 2018

@jggautier So the import would fail in that case? As long as it doesn't just ignore that field or fail silently, that sounds good.
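A minimal Python sketch of the import behaviour discussed above: supported types pass through, mapped types are converted, and anything else raises an explicit error rather than being dropped. The supported set assumes the six relations proposed earlier in this thread, and the conversion table holds only the one example mentioned (isReferencedBy to isCitedBy); all names are hypothetical.

```python
# Assumed: the six relations proposed earlier in this thread.
SUPPORTED = {
    "IsSupplementTo", "IsCitedBy", "IsNewVersionOf",
    "IsPreviousVersionOf", "IsPartOf", "HasPart",
}
# Hypothetical published conversion table; only this entry was suggested in the thread.
CONVERSIONS = {"IsReferencedBy": "IsCitedBy"}


class UnsupportedRelationType(ValueError):
    """Raised instead of silently dropping an unmappable relation type."""


def normalize_on_import(relation_type: str) -> str:
    """Return a supported relation type, converting where a mapping is published."""
    if relation_type in SUPPORTED:
        return relation_type
    if relation_type in CONVERSIONS:
        return CONVERSIONS[relation_type]
    raise UnsupportedRelationType(
        f"relationType {relation_type!r} is neither supported nor mapped; import rejected"
    )


print(normalize_on_import("IsReferencedBy"))
try:
    normalize_on_import("IsSourceOf")
except UnsupportedRelationType as err:
    print(err)
```

Keeping the conversion table as plain data makes it easy to publish alongside the import documentation and to extend when importers ask for new mappings.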

@eugene-barsky

Do we have any updates on this issue? Here is what DataCite recommends we do: https://support.datacite.org/docs/contributing-citations-and-references

@jggautier
Contributor

jggautier commented Sep 27, 2022

Hey @eugene-barsky. Thanks for pointing that page out!

There's more recent discussion about this issue in the broader GitHub issue at #8108, where @KellyStathis from DataCite offered advice about using relationTypes and pointed to https://support.datacite.org/docs/connecting-to-works, which now eventually leads to https://support.datacite.org/docs/contributing-citations-and-references

Folks at Harvard that are part of the NIH's Generalist Repository Ecosystem Initiative (GREI) need this issue resolved, too. @KellyStathis, in email discussions in July with Harvard members of the GREI group, also pointed out https://support.datacite.org/docs/contributing-citations-and-references and made some general recommendations. And in meetings coming up this month, the GREI groups will be joined by one or more folks from DataCite who will be able to help with metadata questions like this one.

But I think no decision has been made and implemented partly because even the incredibly helpful guide at https://support.datacite.org/docs/contributing-citations-and-references leaves enough room for debate and the community hasn't found the time to build consensus.

I've been imagining that as the GREI work continues, the Harvard folks in the working groups, including me, can learn from more of the Dataverse community (something @qqmyers also recommended in a related pull request), and what we learn can inform the GREI work (and perhaps DataCite's recommendations).

@amberleahey and @philippconzett asked in a Google Groups thread last week if a Dataverse Metadata WG/IG meeting could be scheduled to discuss this. I'll also ping @mreekie, who is catching up on this and other related issues.

@eugene-barsky

eugene-barsky commented Sep 27, 2022 via email

@philippconzett
Contributor

@jggautier Should we schedule a Metadata WG/IG meeting on Thursday October 6? I think we could discuss several related issues:

  • probably other issues

I'll be out of office in a couple of hours and until Tuesday, but if you and @qqmyers and others are available, I could join you after 1500 CEST. Thanks!

@marcomarsella

marcomarsella commented Sep 29, 2022 via email

@poikilotherm
Contributor

I'd second that idea of having a meeting!

@jggautier
Contributor

Hey @philippconzett. Sorry for the delay in replying. For a few reasons I've been hesitant to agree that a meeting should be scheduled, but I'd be happy to help promote one for this Thursday, Oct 6, at 2pm UTC (time converter). I made a Google Doc at https://docs.google.com/document/d/1tNnvVh8jYY1g53BEwpJmMmm9w6Vgy_Q7RrmFjGnYOyA for note taking.

Looks like we need a new Slack link. The one in the calendar invite doesn't work. Was probably Danny Brooke's.

@eugene-barsky

eugene-barsky commented Oct 3, 2022 via email

@amberleahey

Count me in, let's build some consensus on DataCite mappings and put it to a community vote!

@marcomarsella

Hi! I might be unable to attend due to a change in my agenda for the 6th. However, my use case is, I believe, clearly described in #2778 (comment)
Please also consider my recommendation of the "References" operator.

@philippconzett
Contributor

Thanks, @jggautier! I have posted a message in #ig-metadata on Slack. Are you going to create a Zoom link, or do you want me to do that?

@jggautier
Contributor

Thanks for posting in Slack. I created a Zoom link and updated the event on the Dataverse Community Calendar.

@marcomarsella

marcomarsella commented Oct 4, 2022 via email

@jggautier
Contributor

Join Zoom meeting
https://harvard.zoom.us/j/9368020057
Join by telephone (use any number to dial in):

  • +1 646 931 3860
  • +1 929 436 2866
  • +1 301 715 8592
  • +1 309 205 3325
  • +1 312 626 6799
  • +1 564 217 2000
  • +1 669 444 9171
  • +1 669 900 6833
  • +1 719 359 4580
  • +1 253 215 8782
  • +1 346 248 7799
  • +1 386 347 5053

International numbers available: https://harvard.zoom.us/u/a3oPwvAlb
One tap mobile: +16469313860,,9368020057# US
Join by SIP conference room system
Meeting ID: 936 802 0057

@doigl
Contributor

doigl commented Oct 11, 2022 via email

@jggautier
Contributor

jggautier commented Oct 11, 2022

Hi @doigl. I'm guessing this was a very delayed message? And I believe you were able to make it to the meeting on Oct 6 after all, right? I thought I saw you there :)

@KellyStathis

Apologies for the delay weighing in here; I had missed a few notifications and then was out of office last week!

I've reviewed the notes from the metadata interest group call and support the idea of having a user-selectable relation type, rather than assuming all "Related Publications" are citations of the dataset (for example). A couple thoughts on this:

  • I would recommend only including a subset of relationTypes so it isn't overwhelming for users. An example of this is FRDR in Canada, which has these options (under "Relation types"). (Note these definitions/examples haven't been reviewed by DataCite, but they should be compatible.)
  • For the UI, the directionality can sometimes be confusing - one approach is to have a sentence, e.g. "This dataset is cited by _____".
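The sentence approach can be sketched in Python by fixing the dataset as the grammatical subject of every option, so the direction of each relation is always read the same way. The wording below is illustrative only (not reviewed by DataCite or taken from FRDR), and the names are hypothetical.

```python
# Hypothetical UI sentences; the dataset is always the subject, so direction stays unambiguous.
SENTENCE_TEMPLATES = {
    "IsCitedBy": "This dataset is cited by {target}",
    "IsSupplementTo": "This dataset is a supplement to {target}",
    "IsNewVersionOf": "This dataset is a new version of {target}",
    "IsPreviousVersionOf": "This dataset is a previous version of {target}",
    "IsPartOf": "This dataset is part of {target}",
    "HasPart": "This dataset includes {target}",
}


def describe(relation_type: str, target: str) -> str:
    """Render a relation type as a sentence shown next to the input field."""
    return SENTENCE_TEMPLATES[relation_type].format(target=target)


print(describe("IsCitedBy", "_____"))
```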

On DataCite's side: we're looking at producing better guidance and examples in the near term; this will involve analyzing existing usage and community consultation, and we definitely want to hear from the Dataverse community. Will keep you updated as that work gets underway!

@pdurbin
Member

pdurbin commented Mar 31, 2023

I just linked to this issue as "under development" in an agenda/notes doc from today's GREI Metrics Sub-Committee Meeting:

[Screenshot of the meeting notes, 2023-03-31]

@cmbz

cmbz commented Aug 20, 2024

To focus on the most important features and bugs, we are closing issues created before 2020 (version 5.0) that are not new feature requests with the label 'Type: Feature'.

If you created this issue and you feel the team should revisit this decision, please reopen the issue and leave a comment.

@cmbz cmbz closed this as completed Aug 20, 2024
@pdurbin
Member

pdurbin commented Sep 17, 2024

The original idea in this issue... Add "Relation Type" to related publication metadata fields to send DataCite related publication metadata is being addressed by this PR:

There's lots of other chatter in this issue that may not be addressed by it. Please feel free to open new issues for the rest! 😅

Projects
Status: Interested
Development

No branches or pull requests