Ab/request stac metadata #712

Open

abarciauskas-bgse wants to merge 2 commits into master from ab/request-stac-metadata

Collaborator

abarciauskas-bgse commented Apr 28, 2023

I think we should request a bit more information to help us complete the STAC metadata. At a minimum, I think we should request spatial and temporal extents, since it's not always trivial to determine what this is from the files themselves.

abarciauskas-bgse added 2 commits

April 28, 2023 04:09


          Update data_request.md

9540fa9


          Update data_request.md

f3f826a

abarciauskas-bgse requested a review from wildintellect

April 28, 2023 11:17

Collaborator

wildintellect commented Apr 28, 2023

I'm ok asking users to help with collecting the STAC metadata, but I think we need to make it more "optional", or at least indicate it can be filled in later in consultation with the Data Team. I'm concerned too many questions in the request will be a barrier to submission. Also, do we trust users to come up with a good collection ID, or is that something the Data Team should control?

wildintellect reviewed

View reviewed changes

.github/ISSUE_TEMPLATE/data_request.md


		This collection will be published as a Spatio-Temporal Asset Catalog (STAC) Collection. You can read the complete STAC collection spec here: (https://github.com/radiantearth/stac-spec/blob/master/collection-spec/collection-spec.md).

		If you need help with any of the fields below, please let us know.

Collaborator

wildintellect Apr 28, 2023

Please fill in any fields you know. If you don't know the answer it can be updated later in consultation with the Data Team.

wildintellect reviewed

View reviewed changes

.github/ISSUE_TEMPLATE/data_request.md


		If you need help with any of the fields below, please let us know.

		id: Identifier for the Collection that is unique across the provider. This is typically an abbreviated and hyphenated or camel-cased version of the dataset name. For example, `gedi_l2a_v002` for "GEDI L2A Elevation and Height Metrics Data Global Footprint Level V002".

Collaborator

wildintellect Apr 28, 2023

Not sure we should leave this up to users. So maybe it should be clear this is a suggestion.

Collaborator

rtapella May 30, 2023

Agree w/ Alex here. As the ID needs to be unique, we have to manage it within the catalog and unless we have a feedback loop of which IDs are valid, it will be difficult to have a scientist set the ID.
Is it unique for a "provider" ? Most recent provider (as Provider is a list)? Or is it unique within the catalog?
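
To illustrate the kind of feedback loop described above, here is a minimal sketch of how a suggested ID could be derived from a short dataset name and checked against IDs already in use. The function name `suggest_collection_id` and the `existing_ids` set are hypothetical; how the catalog actually exposes its list of IDs, and the scope in which IDs must be unique, are assumptions not settled in this thread.

```python
import re


def suggest_collection_id(short_name: str, existing_ids: set[str]) -> str:
    """Suggest a lowercase, underscore-separated ID that is not already taken.

    `existing_ids` is a hypothetical stand-in for the IDs already present
    in the catalog; this sketch does not query any real service.
    """
    # Collapse every run of non-alphanumeric characters into one underscore.
    base = re.sub(r"[^a-z0-9]+", "_", short_name.lower()).strip("_")
    if not base:
        raise ValueError("short_name contains no usable characters")

    # Append a numeric suffix until the candidate no longer collides.
    candidate = base
    suffix = 2
    while candidate in existing_ids:
        candidate = f"{base}_{suffix}"
        suffix += 1
    return candidate
```

A suggestion produced this way would still be reviewed by the Data Team before it is used.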

wildintellect reviewed

View reviewed changes

.github/ISSUE_TEMPLATE/data_request.md


		_We will reuse the Dataset Description from the first section if not otherwise indicated._

		spatial extent: A bounding box for the potential spatial extents covered by the collection. Read more in the [spatial-extent-object section of the spec](https://github.com/radiantearth/stac-spec/blob/master/collection-spec/collection-spec.md#spatial-extent-object).

Collaborator

wildintellect Apr 28, 2023

Since Spatial Extent and Temporal Extent are the most important, can we move these to the top of the list?

Collaborator

rtapella May 30, 2023

Also the "additional information" section at the bottom mentions spatial and temporal info again
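
For reference, a minimal sketch of what the requested spatial and temporal information becomes in a STAC Collection `extent` object. The helper name `build_extent` and the example values are illustrative assumptions, not part of the template. In the spec, the bounding box is ordered west, south, east, north, and an ongoing dataset is represented by a null end in the interval.

```python
from datetime import datetime, timezone
from typing import Optional


def _to_utc_string(value: Optional[datetime]) -> Optional[str]:
    """Format a timezone-aware datetime as an RFC 3339 UTC string."""
    if value is None:
        return None
    if value.tzinfo is None:
        raise ValueError("datetimes must be timezone-aware")
    return value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_extent(
    west: float,
    south: float,
    east: float,
    north: float,
    start: datetime,
    end: Optional[datetime] = None,
) -> dict:
    """Build a STAC Collection extent dict from one bbox and one interval."""
    if not (-90 <= south <= north <= 90):
        raise ValueError("latitudes must satisfy -90 <= south <= north <= 90")
    # west may exceed east when the box crosses the antimeridian,
    # so only the range of each longitude is checked.
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitudes must be within [-180, 180]")
    if end is not None and end < start:
        raise ValueError("end must not be earlier than start")
    return {
        "spatial": {"bbox": [[west, south, east, north]]},
        # A None end serializes to JSON null, meaning an open-ended range.
        "temporal": {"interval": [[_to_utc_string(start), _to_utc_string(end)]]},
    }


# Illustrative values only: a global dataset that is still being extended.
extent = build_extent(
    -180.0, -90.0, 180.0, 90.0, datetime(2020, 1, 1, tzinfo=timezone.utc)
)
```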

wildintellect reviewed

View reviewed changes

.github/ISSUE_TEMPLATE/data_request.md


		_Please provide the temporal extent as an interval. If the dataset has a single date time or is currently ongoing, a single date is appropriate._

		links: A list of references to other documents. There must be at least one link, and we highly recommend (and may in the future require) a link to documentation that includes details about how to access and open the data.

Collaborator

wildintellect Apr 28, 2023

How is this different from "URL or DOI to dataset description" in the top section?

wildintellect reviewed

View reviewed changes

.github/ISSUE_TEMPLATE/data_request.md


		_We can create an id from the Dataset Name from the first section if not otherwise indicated._

		title: A short descriptive one-line title for the collection. Technically, this is not required by the STAC Spec but it is used by the STAC browser.

Collaborator

wildintellect Apr 28, 2023

Title/Description - why don't we just use what's above, as the statement says, and skip asking again?

wildintellect reviewed

View reviewed changes

.github/ISSUE_TEMPLATE/data_request.md

@@ -29,6 +29,36 @@ assignees: freitagb, wildintellect
               **Intended science use case**
               *Please describe how you intend to use the data, or the expected relevance to MAAP users.*
+              ## Spatio-Temporal Asset Catalog (STAC)

Collaborator

wildintellect Apr 28, 2023

Does this need to be a separate section? Why does the user need to know that it's specifically for STAC?

Collaborator

rtapella commented Apr 28, 2023

There are a lot of questions here that could be helped by some usability testing and some observation. Maybe we should plan some soon. At least from two angles off the top of my head:
1- adding your data to STAC (what are the barriers to submission; where do they need guidance or UI feedback (e.g. picking a unique ID), is it clear what the input fields are, do ppl need help with the API etc. and potentially some pointers at how to lower the barriers to submission)
2- finding your data in STAC (i.e., which key ways do ppl organize things in their heads -> which fields are necessary)

If it's easy to update/edit metadata maybe there's a quick workflow to just get data in, and then ppl can go back later to add in more extensive metadata.

Collaborator

wildintellect commented May 26, 2023

The barrier right now is that the Data Team has to add the Collection/Items; we don't have a user-facing method yet.
The Collection Shortname, aka ID, is probably the most common way people will find the data, followed by the Title, which is the human-readable version in STAC Browser.

It's not super easy to get external data in at this time without Data Team help, and metadata is a bit of a hassle to update. But I think this is premature. The likely scenarios:

1. The request is rejected because the data is accessible without importing to STAC; instructions can be provided on how to use the data with MAAP.
2. The request is accepted, and the user will be responsible for getting the files into the MAAP workspace bucket and notifying the Data Team to ingest them into STAC.
3. The Data Team handles the whole thing.

In 2 and 3, the Data Team will then likely request additional information to ensure good STAC records.

@abarciauskas-bgse are we going ahead with these changes? Should someone else pick up this task? I'd like to get this PR resolved so we can convert to the new GitHub Forms before the June 12 UWG meeting.

Collaborator

rtapella commented May 30, 2023

[I guess this IS the form. I need to go read it again. :p ] This form lets us pilot UI with a manual back-end. Manual responses can be tested with the form, so we can release quickly and iterate (e.g. confirmation of receipt email, instructions on staging the data so it can be ingested, notification that your data set is ingested, pointers to go find your dataset once it's in STAC, etc.).
I am a bit skeptical (and this is my assumption!) that, at least in a "long term" scenario, a collection ID would be the most common way to find a data set. In the case of a collaboration, or when someone shares code or tells you the ID, that makes sense. I know we have some metrics on searches that can help, but I don't know whether they cover the various ways to find data. I also don't know if we are able to pull apart searches done by people using the Tutorials or copied code vs. self-directed exploratory research. If you are searching for data sets that fit your particular science objectives, it seems more likely that you would search by spatial and temporal extents, plus the measurement types. There are also other very common steps in choosing a data set (or a data subset), like previewing it and then filtering it up/down. This is the type of thing I'm referring to. We could proxy a usability test with Worldview or some other site that has multiple geospatial data sets in it.

Collaborator

rtapella commented May 30, 2023

okay I need to go re-review the form. ignore my #1 above :D

Collaborator

rtapella commented May 30, 2023

In terms of my #2... a thumbnail of some sort might also be helpful, but certainly not required. It looks like this would be in the Collection assets
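
As a rough sketch of what that could look like, the snippet below builds a Collection-level `assets` entry with the `thumbnail` role. The helper name `thumbnail_asset` and the example URL are placeholders for illustration; where thumbnails would actually be hosted is not decided here.

```python
def thumbnail_asset(href: str, media_type: str = "image/png") -> dict:
    """Return a Collection `assets` mapping holding a single thumbnail."""
    if not href:
        raise ValueError("href must be a non-empty URL or path")
    return {
        "thumbnail": {
            "href": href,
            "type": media_type,
            # "thumbnail" is one of the commonly used asset roles in STAC.
            "roles": ["thumbnail"],
            "title": "Thumbnail",
        }
    }


# Placeholder URL, for illustration only.
assets = thumbnail_asset("https://example.com/thumbnails/my_collection.png")
```

Because the field is optional, it could be added after the collection has already been published.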

wildintellect mentioned this pull request

improve doc on data request form MAAP-Project/maap-documentation#339

Merged
