Registering missing institution_id entries for obs4MIPs #41

Open
durack1 opened this issue Feb 15, 2024 · 8 comments

Comments

@durack1
Contributor

durack1 commented Feb 15, 2024

Just linking across repos following PCMDI/input4MIPs_CVs#8.

We need to register DLR-BIRA, ESPRI-IPSL, GloH2O, INCOIS-NIO-IPSL, NOAA-ESRL-PSD, UCI-CHRS, and UCSD-SIO.

This is a matched issue with PCMDI/obs4MIPs_CVs#1

@wolfiex
Collaborator

wolfiex commented Feb 25, 2024

We need to separate the consortiums from the institutions. Consortiums will not have an ROR number.

@taylor13
Collaborator

Having separate tables for consortiums and institutions may make some QC checks more complicated. If, for example, we impose a directory structure that includes "institution", but a consortium is responsible, we want the consortium to appear instead of the institution as part of the directory structure. If we maintain separate CVs for consortiums and institutions, software checking that all the elements of a directory structure are included in a CV would have to check both the institution CV and the consortium CV.
Could we include the consortia in the institutions CV with the following changes?

  • the ROR number for a consortium would be set to "NONE", "NA", 0, or some other special string?
  • two additional attributes would be added to the "institution" registry (CV):
  1. "consortium_members":[] (We might require this only if the "institution" is really a consortium; for a simple institution it could be set to "NONE".)
  2. "in_consortium":[] (We might require this only for institutions that are in one or more consortiums; it could be set to "NONE" if not in any.)
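
To make the proposal above concrete, here is a minimal Python sketch of a single institutions CV that also holds consortia. Everything in it is an assumption for illustration: the entry names, the placeholder ROR strings, the choice of "NONE" as the sentinel, and the helper function names are hypothetical and do not reflect the actual CV files.

```python
NO_ROR = "NONE"  # assumed sentinel; "NA" or 0 were also floated as options

# Hypothetical entries: names and ROR values are placeholders, not real registrations.
INSTITUTION_CV = {
    "Inst-A": {"ror": "placeholder-ror-a", "consortium_members": [],
               "in_consortium": ["Consortium-X"]},
    "Inst-B": {"ror": "placeholder-ror-b", "consortium_members": [],
               "in_consortium": ["Consortium-X"]},
    "Inst-C": {"ror": "placeholder-ror-c", "consortium_members": [],
               "in_consortium": []},
    "Consortium-X": {"ror": NO_ROR, "consortium_members": ["Inst-A", "Inst-B"],
                     "in_consortium": []},
}


def is_consortium(cv, name):
    """Treat an entry as a consortium when it carries the no-ROR sentinel."""
    return cv[name]["ror"] == NO_ROR


def drs_institution_is_registered(cv, name):
    """One lookup covers both kinds of entry, so QC needs only a single CV."""
    return name in cv


def cross_reference_errors(cv):
    """Return messages for member/consortium links that do not agree."""
    errors = []
    for name, entry in cv.items():
        for member in entry["consortium_members"]:
            if member not in cv:
                errors.append(f"{name}: member {member} is not registered")
            elif name not in cv[member]["in_consortium"]:
                errors.append(f"{member}: missing in_consortium link to {name}")
        for consortium in entry["in_consortium"]:
            if consortium not in cv:
                errors.append(f"{name}: consortium {consortium} is not registered")
            elif name not in cv[consortium]["consortium_members"]:
                errors.append(f"{consortium}: missing consortium_members link to {name}")
    return errors
```

With this layout, a directory-structure check calls only `drs_institution_is_registered`, while `cross_reference_errors` is the extra bookkeeping cost of keeping the two new attributes consistent with each other.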

@durack1 durack1 added this to the v 6.5.x.2 milestone Mar 5, 2024
@wolfiex
Collaborator

wolfiex commented Apr 21, 2024

These can now be added via an issue template as of #49

@wolfiex
Collaborator

wolfiex commented Apr 21, 2024

Just as an idea for discussion:

Two-institution consortiums are really partnerships, of which we have many. Does it make sense to allow multiple institutions to be submitted per entry instead?

What are the funding implications of doing so?

Similarly, would we want to differentiate datasets submitted before and after a new entity joins a consortium?

@durack1
Contributor Author

durack1 commented Apr 22, 2024

@wolfiex @taylor13 This is starting to get complicated; I suggest we intentionally simplify things as much as we can. I had been thinking that we need an "institution" (bricks and mortar, with a postal address) that would then be eligible for an ROR. Rather than having consortium "institutions," we'd catch these consortiums as part of the source_id. My thinking comes from PCMDI/input4MIPs_CVs#9: UExeter is the lead (Thomas' host) institution, but the dataset is a team/consortium effort, identified by a source_id that may have multiple institutions listed, with the first entry being the dataset institution_id.

@taylor13
Collaborator

Is this what you are proposing?

  1. Each source_id appearing in the CV would include a sub-entry ("institution_id") listing all the institution(s) that might be responsible for datasets produced by the source.
  2. In each file, a global attribute ("institution_id") would record only the subset of CV institutions that were actually responsible for a given dataset. Only one institution would be listed except in the case of a consortium or partnership.
  3. The DRS would use the first institution listed in the institution_id global attribute in creating a dataset I.D. and in creating directory structures.

I don't see any problems with this, but it will decrease visibility of the consortium name, which will appear nowhere. Will EC-Earth be o.k. with this?
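
The three steps listed above could be sketched roughly as follows. This is only an illustration under stated assumptions: the source and institution names are placeholders, the helper functions are hypothetical, and treating the institution_id global attribute as a space-separated string is an assumption, not an agreed convention.

```python
# Hypothetical source_id CV: each source lists every institution that might be
# responsible for its datasets (step 1). All names are placeholders.
SOURCE_CV = {
    "Example-Source-1": {"institution_id": ["Inst-A", "Inst-B", "Inst-C"]},
}


def parse_institution_attr(value):
    """Split a file's institution_id global attribute into a list.

    Space-separated values are an assumption made for this sketch.
    """
    return value.split()


def check_file_institutions(source_cv, source_id, institution_attr):
    """Step 2: the file may list only a non-empty subset of the CV institutions."""
    allowed = source_cv[source_id]["institution_id"]
    listed = parse_institution_attr(institution_attr)
    if not listed:
        raise ValueError("institution_id attribute is empty")
    unknown = [name for name in listed if name not in allowed]
    if unknown:
        raise ValueError(f"not registered for {source_id}: {unknown}")
    return listed


def drs_institution(source_cv, source_id, institution_attr):
    """Step 3: the DRS uses the first institution listed in the file attribute."""
    return check_file_institutions(source_cv, source_id, institution_attr)[0]
```

For example, `drs_institution(SOURCE_CV, "Example-Source-1", "Inst-B Inst-A")` would pick `Inst-B` for the dataset I.D. and directory path. Note that no consortium name appears anywhere in this structure, which is the visibility concern raised above.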

@durack1
Contributor Author

durack1 commented Apr 22, 2024

@taylor13 I think I need to calibrate with @wolfiex and better understand how information is partitioned between the MIP_consortiums.json and MIP_institutions.json files - airplane wifi is too spotty.

For the CMIP6 EC-Earth3* examples, every one of these had a listed institution_id = EC-Earth-Consortium, which includes ~30 institutions identified by a name/acronym and country, with a mailing address of SMHI, one of the 30 listed centres (see CMIP6_institution_id.html). This is a particularly good example of why my simplified system would not work well, whereas it could work in the case of the volcanic forcing. It might be useful to defer this issue to a discussion, as it seems there is much to calibrate on, and working through all existing examples is the only way to definitively come up with a path that maps across all existing (and hopefully future) configs.

@wolfiex
Collaborator

wolfiex commented Apr 22, 2024

The way I understand it:

  • A consortium is a collection of individual institutions presenting as one. An institution is just that.

  • A consortium is not a physical entity but behaves like one.

The work (submission, data, modelling) is still done by a single person at an institution, but it is commissioned or funded as part of something greater. It is therefore recorded as coming from a consortium.

In the same way, we can have ESPRI-IPSL, INCO?-NEO-IPSL, and direct submissions by IPSL.

(To be corrected later)


3 participants