
Create metadata blocks for CAFE's collection of climate and geospatial data #232

Closed
jggautier opened this issue Nov 1, 2023 · 62 comments
Labels: Metadata Block, NIH CAFE (issues associated with the NIH CAFE project), Size: 10 (a percentage of a sprint)
jggautier commented Nov 1, 2023

This GitHub issue tracks the creation of the metadata blocks I'm helping design for a Dataverse collection that the BUSPH-HSPH Climate Change and Health Research Coordinating Center (CAFE) will be managing on Harvard Dataverse. Their unpublished collection is at https://dataverse.harvard.edu/dataverse/cafe.

In this repo at https://github.com/IQSS/dataverse.harvard.edu/tree/master/metadatablocks, I've added the .tsv and .properties files that define the metadata fields, and I'll continue updating those files as the CAFE folks review and improve the metadata fields.

This screenshot shows the metadata block we're planning to add, as of 2023-11-07, so that depositors can describe the geospatial data:
[Screenshot: cafedatalocation]

This screenshot shows the metadata block we're planning to add, as of 2023-11-07, so that depositors can describe the source datasets of the dataset being deposited:
[Screenshot: cafedatasources]

@jggautier jggautier self-assigned this Nov 1, 2023
jggautier added a commit that referenced this issue Nov 1, 2023
@jggautier jggautier changed the title Create metadatablock for CAFE's collection of climate and geospatial data Create metadata block for CAFE's collection of climate and geospatial data Nov 7, 2023
jggautier commented Nov 7, 2023

I'd also like to use this GitHub issue to record the concerns and risks of this effort. This is similar to how, in other GitHub issues, we noted that metadata fields in metadata blocks created for other collections in HDV serve purposes that overlap with fields already available, such as fields in the Citation metadata block, and that they facilitate describing data that others in the community have expressed interest in, like the metadata block for 3D Data discussed in #144.

Metadata added in a custom metadata block won't be in most metadata exports
I spoke with the CAFE collection administrators about how the metadata added in these new metadata blocks won't be included in most metadata exports and won't be used to make the datasets more discoverable in other systems, such as search engines. This is the case with all "custom" metadata blocks we've added for collections in Harvard Dataverse.

Showing or hiding fields based on what's entered in other fields so that depositors see only relevant fields
We talked about how Dataverse has no way to show or hide fields based on what's entered in other fields, which is what they wanted to do for the first field in both metadata blocks so that depositors see only relevant fields.

Those first two fields are dropdown menus where the options are "Yes" and "No". So if a depositor chooses "No" for the "Geospatial File Type" field, depositors shouldn't enter metadata in the other fields that describe a geospatial file, since there isn't a geospatial file. Since Dataverse will always show all of the fields, the CAFE folks plan to address this with instructions in a dataset template and/or training.
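For reference, here is a sketch of how such a Yes/No gate field can be defined with the metadata block TSV format's #controlledVocabulary section; the field name geospatialFileIncluded is hypothetical, and the #datasetField row that declares the field is omitted:

```tsv
#controlledVocabulary	DatasetField	Value	identifier	displayOrder
	geospatialFileIncluded	Yes		0
	geospatialFileIncluded	No		1
```

Even with the vocabulary defined this way, Dataverse still shows the rest of the block's fields unconditionally, which is why the template/training workaround is needed.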

Letting depositors type in and enter a term in a field that uses a vocabulary
We talked about how if depositors want to enter their own term for fields that include a vocabulary, such as the "Spatial File Type" field, they'll need to choose the dropdown menu's "Other" option, and type their term in the "Other Spatial File Type" field, which is always shown whether or not the depositor chooses "Other" in the first field. We've used this pattern for a field in the Life Sciences metadata block and in other custom metadata blocks in HDV.

The external controlled vocabulary mechanism handles this in a more common and arguably better way, using a UI component that lets depositors choose a term from a vocabulary or enter their own term in the same field. But this mechanism works only for vocabularies hosted externally, not for vocabularies defined in metadata block TSV files.
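For anyone post-processing exported metadata, the two-field "Other" pattern can be collapsed into a single effective term. A minimal sketch, with hypothetical field values (the function name and rules are illustrative, not part of Dataverse):

```python
def resolve_vocab_term(selected_term, other_term):
    # Collapse the "Other" + free-text pattern into one effective value.
    # If the depositor picked "Other", the free-text field wins;
    # otherwise the controlled term is used and the free text is ignored.
    if selected_term == "Other":
        return other_term.strip() or None
    return selected_term
```

A curator script could apply this to, say, "Spatial File Type" / "Other Spatial File Type" pairs when harvesting the JSON exports.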

Custom metadata block about data location versus geospatial metadata block that ships with Dataverse
The collection's administrators wanted to add fields to the geospatial metadata block that ships with Dataverse. Because doing that would take more time than they have, we agreed to create this new metadata block for the CAFE collection instead. They're interested in joining the Dataverse community's discussions about improving how depositors describe geospatial data, and I'll need to connect them with @pdurbin and others who've worked on this.

Describing geospatial files in the dataset-level metadata
Collection administrators expect that each deposit will include either no geospatial file or only one geospatial file, which these metadata fields will describe. @cmbz has included this use case with others being collected to support the need for improving Dataverse's ability to record file-level metadata.

Overlap among fields in the "Metadata Block About Data Sources" and fields in Citation metadata block
We talked about how the fields in the "Metadata Block About Data Sources" overlap with the "Related Dataset" and "Data Source" fields. They planned to hide those "Related Dataset" and "Data Source" fields so that depositors aren't confused, and because they expect depositors to need to use only the fields in the custom metadata block to describe a source dataset that they used when producing their deposit.

I also mentioned that once Dataverse can send metadata about related resources to DataCite (IQSS/dataverse#5277), we'll need to think about whether and how to include the related datasets described in their custom metadata block.
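If that happens, the mapping could look something like this sketch, which builds DataCite relatedIdentifier entries from source-dataset DOIs taken from the custom block. The function and the choice of "IsDerivedFrom" are illustrative assumptions, not an agreed design:

```python
def to_related_identifiers(source_dataset_dois):
    # Map source-dataset DOIs from the custom block to DataCite
    # relatedIdentifier entries; "IsDerivedFrom" expresses that the
    # deposit was produced from these source datasets.
    return [
        {
            "relatedIdentifier": doi,
            "relatedIdentifierType": "DOI",
            "relationType": "IsDerivedFrom",
        }
        for doi in source_dataset_dois
    ]
```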

Automatic layout of child fields might make it hard for depositors to fill fields the way we expect
We talked about how the automatic layout of the child fields might confuse depositors. For example, depositors need to understand the relationship between the "Type" and "Other Type" fields in the "Metadata Block About Data Sources", since depositors are asked to use the "Other Type" field to add a term that isn't in the list of terms in the dropdown of the "Type" field. But in the UI, there's no visual indication that these fields rely on each other, other than the names of the fields.

[Screenshot of the compound field layout]

We've seen and talked about how this design also confuses depositors who use other compound fields like the Related Publication fields in the Citation metadata block. There's related discussion in IQSS/dataverse#5277.

Metadata in "Metadata Block About Data Sources" is hard to read when viewing metadata on dataset page
We talked about how when the metadata is displayed on the dataset page, it's hard to read. This is discussed more in IQSS/dataverse#6589.

@cmbz cmbz moved this to NIH CAFE Project in IQSS Dataverse Project Nov 13, 2023
@cmbz cmbz added this to the 6.1 milestone Nov 13, 2023
cmbz commented Nov 13, 2023

2023/11/13

  • After discussion during the prioritization meeting, I added this issue to the Global Backlog, in the new NIH CAFE Project column.
  • Will be completed during the 6.1 timeframe (note: this is a Harvard Dataverse installation improvement)
  • Possibly assign to @stevenwinship once the issue has been sized
  • Moved into Needs Sizing
  • Next step: Sprint Ready for final 6.1 sprint.

@cmbz cmbz added NIH CAFE Issues associated with the NIH CAFE project Metadata Block labels Nov 13, 2023
@cmbz cmbz moved this from NIH CAFE Project to SPRINT- NEEDS SIZING in IQSS Dataverse Project Nov 14, 2023
@cmbz cmbz added the Size: 3 A percentage of a sprint. label Nov 20, 2023
@cmbz cmbz removed this from the 6.1 milestone Nov 20, 2023
@cmbz cmbz moved this from SPRINT- NEEDS SIZING to SPRINT READY in IQSS Dataverse Project Nov 27, 2023
@scolapasta scolapasta moved this from SPRINT READY to Clear of the Backlog in IQSS Dataverse Project Dec 6, 2023
@landreev landreev self-assigned this Dec 13, 2023
landreev commented:

@jggautier Just to confirm - am I installing both customCAFEDataLocation.tsv and customCAFEDataSources.tsv in prod.?

jggautier commented:

Ah yes. The CAFE collection's managers would like both of those metadata blocks available for the collection. I'll update this issue's title.

@jggautier jggautier changed the title Create metadata block for CAFE's collection of climate and geospatial data Create metadata blocks for CAFE's collection of climate and geospatial data Dec 13, 2023
landreev commented Dec 14, 2023

I looked into this briefly, and I'm wondering if it would be better for new blocks to go through more of a QA process, like we do with everything else, before deploying them in prod.

My biggest concern was with the GeoSpatialResolution fields in these blocks, since we just had to spend so much effort addressing issues with similar fields in the Geospatial block.

There may be some similar issues with validation of the values in this block. Namely, the values are defined as floats, so it is impossible to enter anything that does not parse as a decimal fraction. This would be the right behavior when "Decimal degrees" is selected in the "Unit" pulldown. But you can also select "Degrees-minutes-seconds" in the same pulldown - and it is then impossible to enter such a value:

[Screenshot: Screen Shot 2023-12-14 at 12 06 43 PM]

(There are other notations for formatting "degrees-minutes-seconds" values, of course, but none of them will parse as a valid decimal fraction.)

I feel like if we want this field to support all the notations listed, the only way to achieve that is to switch it back to text and add custom validation methods, like we did with the Geospatial block fields.
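To illustrate, here is a minimal sketch (not the Dataverse implementation) of text-field validation that would accept both notations. The unit labels follow the pulldown described above; the regex covers only one common DMS notation:

```python
import re

# One common DMS notation, e.g. 42°21'37"N; separators may also be spaces.
DMS_RE = re.compile(
    r"""^\s*(?P<deg>\d{1,3})[°\s]\s*          # degrees
         (?P<min>\d{1,2})['\s]\s*             # minutes
         (?P<sec>\d{1,2}(?:\.\d+)?)["\s]?\s*  # seconds
         (?P<hemi>[NSEW])?\s*$""",
    re.VERBOSE,
)

def parse_coordinate(value, unit):
    """Return the coordinate in decimal degrees, or raise ValueError."""
    if unit == "Decimal degrees":
        # float() mirrors the validation the TSV's float type applies today.
        return float(value)
    if unit == "Degrees-minutes-seconds":
        m = DMS_RE.match(value)
        if not m:
            raise ValueError(f"not a recognized DMS value: {value!r}")
        dd = int(m["deg"]) + int(m["min"]) / 60 + float(m["sec"]) / 3600
        return -dd if m["hemi"] in ("S", "W") else dd
    raise ValueError(f"unknown unit: {unit!r}")
```

The point is that the accepted syntax has to depend on the selected unit, which a single float-typed field cannot express.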

landreev commented:

I was also told that there were some technical issues with bringing up a test instance for the researchers involved to experiment with. But I feel like that part must be something we can figure out.

jggautier commented:

@sbarbosadataverse, it was agreed to continue testing this and the other metadata block after they were added to Harvard Dataverse.

But can we bring up this concern, about validation, with the collection's manager Keith, and ask if user testing can be done before these metadata blocks are added?

landreev commented:

We had a quick chat about this on slack. It sounded like I should clarify what I said above:

like we do with everything else, before deploying them in prod.

By "like we do with everything else" I didn't mean literally the same process we use to QA dev issues - deploying on dataverse-internal, having the same QA person test it, and so on. I meant the same idea of testing and confirming that everything works properly before trying it in prod. It sounds like this should be a somewhat different process for custom blocks - focused more on letting the researchers who requested the block do the testing and confirm that everything works the way they like.

landreev commented:

@sbarbosadataverse, it was agreed to continue testing this and the other metadata block after they were added to Harvard Dataverse.

But can we bring up this concern, about validation, with the collection's manager Keith, and ask if user testing can be done before these metadata blocks are added?

Basically, rather than using the production for testing these blocks, let's let the collection admin(s) experiment with them on a test instance.

@cmbz cmbz moved this from Clear of the Backlog to This Sprint 🏃‍♀️ 🏃 in IQSS Dataverse Project Dec 14, 2023
jggautier commented:

Thanks @landreev for applying the changes to the EC2 test instance! I see the changes, and everything's working as expected.

I'll let the collection admin know that:

  • We'll remove and re-add the metadata blocks on their collection in Harvard Dataverse
  • For those two datasets that had already used the new metadata fields, we'll have to remove the fields; the depositors will then have to re-enter the metadata once the metadata blocks are re-added, save their metadata changes, and let us know so that one of us can use a superuser account to publish the changes without creating new versions.

Does that all sound good? I'll wait to hear back from you before I email the collection admin.
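For the last step in the list above, the Dataverse native API lets a superuser publish with type=updatecurrent, which updates the current published version without creating a new one. A sketch of building that request (the server, token, and DOI below are placeholders, and this only constructs the request rather than sending it):

```python
from urllib.parse import urlencode
from urllib.request import Request

SERVER = "https://dataverse.harvard.edu"  # placeholder installation URL

def build_publish_request(persistent_id, api_token):
    # type=updatecurrent: superuser-only publish that updates the
    # current published version in place, without a version bump.
    query = urlencode({"persistentId": persistent_id, "type": "updatecurrent"})
    url = f"{SERVER}/api/datasets/:persistentId/actions/:publish?{query}"
    # urllib.request.urlopen(request) would actually send it.
    return Request(url, method="POST", headers={"X-Dataverse-key": api_token})
```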

landreev commented:

That sounds perfect.
I'll wait for you to confirm that they are ok w/ all of this, before proceeding to delete any fields.

jggautier commented:

Okay, they wrote that they're okay with it.

landreev commented:

OK, I'll look into the 2 datasets with the populated fields next. Will report.

landreev commented Jan 19, 2024

For the record - nope, you cannot remove these CAFE*-blocks fields in the UI. ☹️ On account of some of them being required.
We will need to resort to some hackery to resolve this. I need to be super careful about it, so it may take some appreciable time.

landreev commented:

Looking into it now.

landreev commented Jan 24, 2024

[edit: n/m!]

@jggautier Could you please confirm that these are the versions of the blocks that are currently installed in production? (Jan. 3 commits) -

https://github.com/IQSS/dataverse.harvard.edu/blob/f4a79a733a9d5b816080cf463914e8743a761fff/metadatablocks/customCAFEDataLocation.tsv
and
https://github.com/IQSS/dataverse.harvard.edu/blob/f4a79a733a9d5b816080cf463914e8743a761fff/metadatablocks/customCAFEDataSources.tsv

Not super crucial; I just want to replicate the prod. setup on my own dev. box 1:1 to test the delete queries.

landreev commented:

successfully removed "Location"...
"Sources" next...

landreev commented:

OK, all the existing field values and fields have been erased, and both custom blocks uninstalled, like they were never there.
I will install the new versions later tonight (so that I don't have to restart Solr during the day).
If you have a sec, please take a look at the 2 published datasets in question, just to confirm that there's nothing visibly wrong with them after I had to mess with the metadata in the database. I'm fairly positive they should be ok. 🤞
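One way to double-check programmatically is to fetch each dataset's JSON from the native API and confirm that no custom-block entries remain. A sketch of that check against the parsed response; the "customCAFE" prefix is an assumption based on the TSV filenames in this issue:

```python
def cafe_blocks_present(dataset_json):
    """List any CAFE custom blocks still attached to a dataset, given
    the parsed JSON of a native-API dataset response."""
    blocks = (dataset_json.get("data", {})
                          .get("latestVersion", {})
                          .get("metadataBlocks", {}))
    return [name for name in blocks if name.lower().startswith("customcafe")]
```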

landreev commented:

(no, I didn't get to installing the blocks last night, but will do shortly)

jggautier commented:

Thanks. Sorry I didn't get to help look at those Jan. 3 commits yesterday. Got caught up in other stuff.

I checked those two published datasets. They're editable and I don't see any traces of the new blocks' fields in the forms.

I do see the metadata in the JSON exports, like https://dataverse.harvard.edu/api/datasets/export?exporter=dataverse_json&persistentId=doi:10.7910/DVN/Y1WNU7. Just pointing that out in case it matters.

landreev commented:

Correct, I didn't bother re-exporting the 2 datasets.
They will be automatically re-exported when they are re-published.

landreev commented:

The 2 blocks have been installed (again). Please review/double-check that these are the correct versions.

jggautier commented:

I just reviewed them and they're the correct versions. Thanks!

landreev commented:

@jggautier Can we close it, or do you want to keep it open until they enter all the metadata they need?

jggautier commented:

Ah, yes we can close it. I'll do that now. Wasn't sure if there was anything else we needed to do for re-adding the metadata blocks.

We can track the remaining tasks, mostly about those two published datasets you edited, in our email thread with the collection admin.

4 participants · no branches or pull requests