
Create metadata blocks for CAFE's collection of climate and geospatial data #232

Closed
jggautier opened this issue Nov 1, 2023 · 62 comments
Labels: Metadata Block, NIH CAFE (issues associated with the NIH CAFE project), Size: 10 (a percentage of a sprint)
jggautier commented Nov 1, 2023

This GitHub issue tracks the creation of the metadata blocks I'm helping design for a Dataverse collection that the BUSPH-HSPH Climate Change and Health Research Coordinating Center (CAFE) will be managing on Harvard Dataverse. Their unpublished collection is at https://dataverse.harvard.edu/dataverse/cafe.

In this repo at https://github.com/IQSS/dataverse.harvard.edu/tree/master/metadatablocks, I've added the .tsv and .properties files that define the metadata fields, and I'll continue updating those files as the CAFE folks review and improve the metadata fields.

This screenshot shows the metadata block we're planning to add, as of 2023-11-07, so that depositors can describe the geospatial data:
[Screenshot: cafedatalocation]

This screenshot shows the metadata block we're planning to add, as of 2023-11-07, so that depositors can describe the source datasets of the dataset being deposited:
[Screenshot: cafedatasources]

@jggautier jggautier self-assigned this Nov 1, 2023
jggautier added a commit that referenced this issue Nov 1, 2023
@jggautier jggautier changed the title Create metadatablock for CAFE's collection of climate and geospatial data Create metadata block for CAFE's collection of climate and geospatial data Nov 7, 2023
jggautier commented Nov 7, 2023

I'd also like to use this GitHub issue to record the concerns and risks of this effort. This is similar to how, in other GitHub issues, we noted that metadata fields in metadata blocks created for other collections in HDV serve purposes that overlap with fields already available, such as fields in the Citation metadata block, and that they facilitate describing data that others in the community have expressed interest in, like the metadata block for 3D Data discussed in #144.

Metadata added in a custom metadata block won't be in most metadata exports
I spoke with the CAFE collection administrators about how the metadata added in these new metadata blocks won't be included in most metadata exports and won't be used to make the datasets more discoverable in other systems, such as search engines. This is the case with all "custom" metadata blocks we've added for collections in Harvard Dataverse.

Showing or hiding fields based on what's entered in other fields so that depositors see only relevant fields
We talked about how Dataverse has no way to show or hide fields based on what's entered in other fields, which is what they wanted to do for the first field in both metadata blocks so that depositors see only relevant fields.

Those first two fields are dropdown menus where the options are "Yes" and "No". So if a depositor chooses "No" for the "Geospatial File Type" field, depositors shouldn't enter metadata in the other fields that describe a geospatial file, since there isn't a geospatial file. Since Dataverse will always show all of the fields, the CAFE folks plan to address this with instructions in a dataset template and/or training.
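For reference, here is a sketch of how such a Yes/No gate field can be defined with the metadata block TSV format's #controlledVocabulary section; the field name geospatialFileIncluded is hypothetical, and the #datasetField row that declares the field is omitted:

```tsv
#controlledVocabulary	DatasetField	Value	identifier	displayOrder
	geospatialFileIncluded	Yes		0
	geospatialFileIncluded	No		1
```

Even with the vocabulary defined this way, Dataverse still shows the rest of the block's fields unconditionally, which is why the template/training workaround is needed.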

Letting depositors type in and enter a term in a field that uses a vocabulary
We talked about how if depositors want to enter their own term for fields that include a vocabulary, such as the "Spatial File Type" field, they'll need to choose the dropdown menu's "Other" option, and type their term in the "Other Spatial File Type" field, which is always shown whether or not the depositor chooses "Other" in the first field. We've used this pattern for a field in the Life Sciences metadata block and in other custom metadata blocks in HDV.

The external controlled vocabulary mechanism handles this in a more common and arguably better way, using a UI component that lets depositors choose a term from a vocabulary or enter their own term in the same field. But this mechanism works only for vocabularies hosted externally, not for vocabularies defined in metadata block TSV files.
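For anyone post-processing exported metadata, the two-field "Other" pattern can be collapsed into a single effective term. A minimal sketch, with hypothetical field values (the function name and rules are illustrative, not part of Dataverse):

```python
def resolve_vocab_term(selected_term, other_term):
    # Collapse the "Other" + free-text pattern into one effective value.
    # If the depositor picked "Other", the free-text field wins;
    # otherwise the controlled term is used and the free text is ignored.
    if selected_term == "Other":
        return other_term.strip() or None
    return selected_term
```

A curator script could apply this to, say, "Spatial File Type" / "Other Spatial File Type" pairs when harvesting the JSON exports.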

Custom metadata block about data location versus geospatial metadata block that ships with Dataverse
The collection's administrators wanted to add fields to the geospatial metadata block that ships with Dataverse. Because doing that would take more time than they have, we agreed to create this new metadata block for the CAFE collection instead. They're interested in joining the Dataverse community's discussions about improving how depositors describe geospatial data, and I'll need to connect them with @pdurbin and others who've worked on this.

Describing geospatial files in the dataset-level metadata
Collection administrators expect that each deposit will include either no geospatial file or only one geospatial file, which these metadata fields will describe. @cmbz has included this use case with others being collected to support the need for improving Dataverse's ability to record file-level metadata.

Overlap among fields in the "Metadata Block About Data Sources" and fields in Citation metadata block
We talked about how the fields in the "Metadata Block About Data Sources" overlap with the "Related Dataset" and "Data Source" fields. They planned to hide those "Related Dataset" and "Data Source" fields so that depositors aren't confused, and because they expect depositors to need to use only the fields in the custom metadata block to describe a source dataset that they used when producing their deposit.

I also mentioned that once Dataverse can send metadata about related resources to DataCite (IQSS/dataverse#5277), we'll need to think about whether and how to include the related datasets described in their custom metadata block.
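If that happens, the mapping could look something like this sketch, which builds DataCite relatedIdentifier entries from source-dataset DOIs taken from the custom block. The function and the choice of "IsDerivedFrom" are illustrative assumptions, not an agreed design:

```python
def to_related_identifiers(source_dataset_dois):
    # Map source-dataset DOIs from the custom block to DataCite
    # relatedIdentifier entries; "IsDerivedFrom" expresses that the
    # deposit was produced from these source datasets.
    return [
        {
            "relatedIdentifier": doi,
            "relatedIdentifierType": "DOI",
            "relationType": "IsDerivedFrom",
        }
        for doi in source_dataset_dois
    ]
```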

Automatic layout of child fields might make it hard for depositors to fill fields the way we expect
We talked about how the automatic layout of the child fields might confuse depositors. For example, depositors need to understand the relationship between the "Type" and "Other Type" fields in the "Metadata Block About Data Sources", since depositors are asked to use the "Other Type" field to add a term that isn't in the list of terms in the dropdown of the "Type" field. But in the UI, there's no visual indication that these fields rely on each other, other than the names of the fields.

[Screenshot of the compound field layout]

We've seen and talked about how this design also confuses depositors who use other compound fields like the Related Publication fields in the Citation metadata block. There's related discussion in IQSS/dataverse#5277.

Metadata in "Metadata Block About Data Sources" is hard to read when viewing metadata on dataset page
We talked about how when the metadata is displayed on the dataset page, it's hard to read. This is discussed more in IQSS/dataverse#6589.

@cmbz cmbz moved this to NIH CAFE Project in IQSS Dataverse Project Nov 13, 2023
@cmbz cmbz added this to the 6.1 milestone Nov 13, 2023
cmbz commented Nov 13, 2023

2023/11/13

  • After discussion during the prioritization meeting, I added this issue to the Global Backlog, in the new NIH CAFE Project column.
  • Will be completed during the 6.1 timeframe (note: this is a Harvard Dataverse installation improvement)
  • Possibly assign to @stevenwinship once the issue has been sized
  • Moved into Needs Sizing
  • Next step: Sprint Ready for final 6.1 sprint.

@cmbz cmbz added NIH CAFE Issues associated with the NIH CAFE project Metadata Block labels Nov 13, 2023
@cmbz cmbz moved this from NIH CAFE Project to SPRINT- NEEDS SIZING in IQSS Dataverse Project Nov 14, 2023
@cmbz cmbz added the Size: 3 A percentage of a sprint. label Nov 20, 2023
@cmbz cmbz removed this from the 6.1 milestone Nov 20, 2023
@cmbz cmbz moved this from SPRINT- NEEDS SIZING to SPRINT READY in IQSS Dataverse Project Nov 27, 2023
@scolapasta scolapasta moved this from SPRINT READY to Clear of the Backlog in IQSS Dataverse Project Dec 6, 2023
@landreev landreev self-assigned this Dec 13, 2023
landreev commented:

@jggautier Just to confirm - am I installing both customCAFEDataLocation.tsv and customCAFEDataSources.tsv in prod.?

jggautier commented:

Ah yes. The CAFE collection's managers would like both of those metadata blocks available for the collection. I'll update this issue's title.

@jggautier jggautier changed the title Create metadata block for CAFE's collection of climate and geospatial data Create metadata blocks for CAFE's collection of climate and geospatial data Dec 13, 2023
landreev commented Dec 14, 2023

I looked into this briefly, and I'm wondering if it would be better for new blocks to go through more of a QA process, like we do with everything else, before deploying them in prod.

My biggest concern was with the GeoSpatialResolution fields in these blocks, since we just had to spend so much effort addressing issues with similar fields in the Geospatial block.

There may be some similar issues with validation of the values in this block. Namely, the values are defined as floats, so it is impossible to enter anything that does not parse as a decimal fraction. This would be the right behavior when "Decimal degrees" is selected in the "Unit" pulldown. But you can also select "Degrees-minutes-seconds" in the same pulldown - and it is then impossible to enter such a value:

[Screenshot: Screen Shot 2023-12-14 at 12 06 43 PM]

(There are other notations for formatting "degrees-minutes-seconds" values, of course, but none of them will parse as a valid decimal fraction.)

I feel like if we want this field to support all the notations listed, the only way to achieve that is to switch it back to text and add custom validation methods, like we did with the Geospatial block fields.
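To illustrate, here is a minimal sketch (not the Dataverse implementation) of text-field validation that would accept both notations. The unit labels follow the pulldown described above; the regex covers only one common DMS notation:

```python
import re

# One common DMS notation, e.g. 42°21'37"N; separators may also be spaces.
DMS_RE = re.compile(
    r"""^\s*(?P<deg>\d{1,3})[°\s]\s*          # degrees
         (?P<min>\d{1,2})['\s]\s*             # minutes
         (?P<sec>\d{1,2}(?:\.\d+)?)["\s]?\s*  # seconds
         (?P<hemi>[NSEW])?\s*$""",
    re.VERBOSE,
)

def parse_coordinate(value, unit):
    """Return the coordinate in decimal degrees, or raise ValueError."""
    if unit == "Decimal degrees":
        # float() mirrors the validation the TSV's float type applies today.
        return float(value)
    if unit == "Degrees-minutes-seconds":
        m = DMS_RE.match(value)
        if not m:
            raise ValueError(f"not a recognized DMS value: {value!r}")
        dd = int(m["deg"]) + int(m["min"]) / 60 + float(m["sec"]) / 3600
        return -dd if m["hemi"] in ("S", "W") else dd
    raise ValueError(f"unknown unit: {unit!r}")
```

The point is that the accepted syntax has to depend on the selected unit, which a single float-typed field cannot express.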

landreev commented:

I was also told that there were some technical issues with bringing up a test instance for the researchers involved to experiment with. But I feel like that part must be something we can figure out.

jggautier commented:

@sbarbosadataverse, it was agreed to continue testing this and the other metadata block after they were added to Harvard Dataverse.

But can we bring up this concern, about validation, with the collection's manager Keith, and ask if user testing can be done before these metadata blocks are added?

landreev commented:

We had a quick chat about this on slack. It sounded like I should clarify what I said above:

like we do with everything else, before deploying them in prod.

By "like we do with everything else" I didn't mean literally the same process we use to QA dev issues - deploying on dataverse-internal, having the same QA person test it, and so on. I meant the same idea of testing and confirming that everything works properly before trying it in prod. It sounds like this should be a somewhat different process for custom blocks - focused more on letting the researchers who requested the block do the testing and confirm that everything works the way they like.

landreev commented:

@sbarbosadataverse, it was agreed to continue testing this and the other metadata block after they were added to Harvard Dataverse.

But can we bring up this concern, about validation, with the collection's manager Keith, and ask if user testing can be done before these metadata blocks are added?

Basically, rather than using the production for testing these blocks, let's let the collection admin(s) experiment with them on a test instance.

@cmbz cmbz moved this from Clear of the Backlog to This Sprint 🏃‍♀️ 🏃 in IQSS Dataverse Project Dec 14, 2023
jggautier commented:

Thanks @landreev for applying the changes to the EC2 test instance! I see the changes, and everything's working as expected.

I'll let the collection admin know that:

  • We'll remove and re-add the metadata blocks on their collection in Harvard Dataverse
  • For those two datasets that had already used the new metadata fields, we'll have to remove the fields; the depositors will then have to re-enter the metadata once the metadata blocks are re-added, save their metadata changes, and let us know so that one of us can use a superuser account to publish the changes without creating new versions.

Does that all sound good? I'll wait to hear back from you before I email the collection admin.
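For the last step in the list above, the Dataverse native API lets a superuser publish with type=updatecurrent, which updates the current published version without creating a new one. A sketch of building that request (the server, token, and DOI below are placeholders, and this only constructs the request rather than sending it):

```python
from urllib.parse import urlencode
from urllib.request import Request

SERVER = "https://dataverse.harvard.edu"  # placeholder installation URL

def build_publish_request(persistent_id, api_token):
    # type=updatecurrent: superuser-only publish that updates the
    # current published version in place, without a version bump.
    query = urlencode({"persistentId": persistent_id, "type": "updatecurrent"})
    url = f"{SERVER}/api/datasets/:persistentId/actions/:publish?{query}"
    # urllib.request.urlopen(request) would actually send it.
    return Request(url, method="POST", headers={"X-Dataverse-key": api_token})
```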

landreev commented:

That sounds perfect.
I'll wait for you to confirm that they are ok w/ all of this, before proceeding to delete any fields.

jggautier commented:

Okay, they wrote that they're okay with it.

landreev commented:

OK, I'll look into the 2 datasets with the populated fields next. Will report.

landreev commented Jan 19, 2024

For the record - nope, you cannot remove these CAFE*-blocks fields in the UI. ☹️ On account of some of them being required.
We will need to resort to some hackery to resolve this. I need to be super careful about it, so it may take some appreciable time.

landreev commented:

Looking into it now.

landreev commented Jan 24, 2024

[edit: n/m!]

@jggautier Could you please confirm that these are the versions of the blocks that are currently installed in production? (Jan. 3 commits) -

https://github.com/IQSS/dataverse.harvard.edu/blob/f4a79a733a9d5b816080cf463914e8743a761fff/metadatablocks/customCAFEDataLocation.tsv
and
https://github.com/IQSS/dataverse.harvard.edu/blob/f4a79a733a9d5b816080cf463914e8743a761fff/metadatablocks/customCAFEDataSources.tsv

Not super crucial; I just want to replicate the prod. setup on my own dev. box 1:1 to test the delete queries.

landreev commented:

successfully removed "Location"...
"Sources" next...

landreev commented:

OK, all the existing field values and fields have been erased, and both custom blocks uninstalled, like they were never there.
I will install the new versions later tonight (so that I don't have to restart Solr during the day).
If you have a sec, please take a look at the 2 published datasets in question, just to confirm that there's nothing visibly wrong with them after I had to mess with the metadata in the database. I'm fairly positive they should be ok. 🤞
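One way to double-check programmatically is to fetch each dataset's JSON from the native API and confirm that no custom-block entries remain. A sketch of that check against the parsed response; the "customCAFE" prefix is an assumption based on the TSV filenames in this issue:

```python
def cafe_blocks_present(dataset_json):
    """List any CAFE custom blocks still attached to a dataset, given
    the parsed JSON of a native-API dataset response."""
    blocks = (dataset_json.get("data", {})
                          .get("latestVersion", {})
                          .get("metadataBlocks", {}))
    return [name for name in blocks if name.lower().startswith("customcafe")]
```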

landreev commented:

(no, I didn't get to installing the blocks last night, but will do shortly)

jggautier commented:

Thanks. Sorry I didn't get to help look at those Jan. 3 commits yesterday. Got caught up in other stuff.

I checked those two published datasets. They're editable and I don't see any traces of the new blocks' fields in the forms.

I do see the metadata in the JSON exports, like https://dataverse.harvard.edu/api/datasets/export?exporter=dataverse_json&persistentId=doi:10.7910/DVN/Y1WNU7. Just pointing that out in case it matters.

landreev commented:

Correct, I didn't bother re-exporting the 2 datasets.
They will be automatically re-exported when they are re-published.

landreev commented:

The 2 blocks have been installed (again). Please review/double-check that these are the correct versions.

jggautier commented:

I just reviewed them and they're the correct versions. Thanks!

landreev commented:

@jggautier Can we close it, or do you want to keep it open until they enter all the metadata they need?

jggautier commented:

Ah, yes we can close it. I'll do that now. Wasn't sure if there was anything else we needed to do for re-adding the metadata blocks.

We can track the remaining tasks, mostly about those two published datasets you edited, in our email thread with the collection admin.

4 participants · no branches or pull requests