Ab/request stac metadata #712

Open
wants to merge 2 commits into
base: master
from

Conversation

abarciauskas-bgse
Collaborator

I think we should request a bit more information to help us complete the STAC metadata. At a minimum, I think we should request spatial and temporal extents, since it's not always trivial to determine what this is from the files themselves.

@wildintellect
Collaborator

I'm ok asking users to help with collecting the STAC metadata, but I think we need to make it more "optional", or at least indicate that it can be filled in later in consultation with the Data Team. I'm concerned that too many questions in the request will be a barrier to submission. Also, do we trust users to come up with a good collection ID, or is that something the Data Team should control?


This collection will be published as a Spatio-Temporal Asset Catalog (STAC) Collection. You can read the complete STAC collection spec here: (https://github.com/radiantearth/stac-spec/blob/master/collection-spec/collection-spec.md).

If you need help with any of the fields below, please let us know.
Collaborator

Please fill in any fields you know. If you don't know the answer it can be updated later in consultation with the Data Team.


If you need help with any of the fields below, please let us know.

**id:** Identifier for the Collection that is unique across the provider. This is typically an abbreviated and hyphenated or camel-cased version of the dataset name. For example, `gedi_l2a_v002` for "GEDI L2A Elevation and Height Metrics Data Global Footprint Level V002".
Collaborator

Not sure we should leave this up to users. So maybe it should be clear this is a suggestion.

Collaborator

Agree w/ Alex here. As the ID needs to be unique, we have to manage it within the catalog and unless we have a feedback loop of which IDs are valid, it will be difficult to have a scientist set the ID.
Is it unique for a "provider" ? Most recent provider (as Provider is a list)? Or is it unique within the catalog?


_We will reuse the Dataset Description from the first section if not otherwise indicated._

**spatial extent:** A bounding box for the potential spatial extents covered by the collection. Read more in the [spatial-extent-object section of the spec](https://github.com/radiantearth/stac-spec/blob/master/collection-spec/collection-spec.md#spatial-extent-object).
Collaborator

Since Spatial Extent and Temporal Extent are the most important, can we move these to the top of the list?

Collaborator

Also the "additional information" section at the bottom mentions spatial and temporal info again


_Please provide the temporal extent as an interval. If the dataset has a single date time or is currently ongoing, a single date is appropriate._
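For reference, the spatial and temporal answers requested here end up in the Collection's `extent` object. Below is a minimal sketch of that mapping, assuming the submitter gives a `[west, south, east, north]` bounding box in degrees and timezone-aware datetimes. The function name `build_extent` is hypothetical and the values in the usage line are purely illustrative. An ongoing dataset is represented with a `None` (JSON `null`) end, which is how the STAC spec expresses an open-ended interval.

```python
from datetime import datetime, timezone


def build_extent(bbox, start, end=None):
    """Return a STAC-style extent dict (hypothetical helper, not part of any library).

    bbox  -- [west, south, east, north] in degrees
    start -- timezone-aware datetime for the first observation
    end   -- timezone-aware datetime, or None if the collection is ongoing
    """
    west, south, east, north = bbox
    # west > east is allowed so that boxes crossing the antimeridian still pass
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitudes must be within [-180, 180]")
    if not (-90 <= south <= north <= 90):
        raise ValueError("latitudes must satisfy -90 <= south <= north <= 90")
    if end is not None and end < start:
        raise ValueError("end must not precede start")

    def fmt(dt):
        # RFC 3339 timestamp in UTC, or None for an open end
        if dt is None:
            return None
        return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    return {
        "spatial": {"bbox": [[west, south, east, north]]},
        "temporal": {"interval": [[fmt(start), fmt(end)]]},
    }


# Illustrative values only: a global, still-ongoing collection
extent = build_extent([-180.0, -90.0, 180.0, 90.0],
                      datetime(2020, 1, 1, tzinfo=timezone.utc))
```

A dataset with a single date time would pass the same value for `start` and `end`.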

**links:** A list of references to other documents. There must be at least one link, and we highly recommend (and may in the future require) a link to documentation that includes details about how to access and open the data.
Collaborator

How is this different from "URL or DOI to dataset description" in the top section?


_We can create an id from the Dataset Name from the first section if not otherwise indicated._
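One way the id could be derived from the Dataset Name is sketched below. This is only an assumption about how the Data Team might do it: the function name `suggest_collection_id` and the idea of appending a numeric suffix on collision are hypothetical, and `existing_ids` stands in for whatever list of ids the catalog already holds. The result is a suggestion to be reviewed, not a final id.

```python
import re


def suggest_collection_id(dataset_name, existing_ids=()):
    """Suggest a lowercase, underscore-separated id from a dataset name.

    Hypothetical helper: the naming convention and the collision suffix
    are assumptions, not a rule taken from the STAC spec.
    """
    # Collapse every run of non-alphanumeric characters into one underscore
    slug = re.sub(r"[^a-z0-9]+", "_", dataset_name.lower()).strip("_")
    if not slug:
        raise ValueError("dataset name has no usable characters")
    taken = set(existing_ids)
    candidate, n = slug, 2
    # Append _2, _3, ... until the candidate is not already in the catalog
    while candidate in taken:
        candidate = f"{slug}_{n}"
        n += 1
    return candidate


suggested = suggest_collection_id("GEDI L2A V002")
```

Passing the catalog's current ids as `existing_ids` gives the submitter the feedback loop on uniqueness that a free-text field lacks.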

**title:** A short descriptive one-line title for the collection. Technically, this is not required by the STAC Spec but it is used by the STAC browser.
Collaborator

Title/Description - why don't we just use what's above like the statement and skip asking again?

@@ -29,6 +29,36 @@ assignees: freitagb, wildintellect
**Intended science use case**
*Please describe how you intend to use the data, or the expected relevance to MAAP users.*

## Spatio-Temporal Asset Catalog (STAC)
Collaborator

Does this need to be a separate section? Why does the user need to know that it's specifically for STAC?

@rtapella
Collaborator

There are a lot of questions here that could be helped by some usability testing and some observation. Maybe we should plan some soon. At least from two angles off the top of my head:
1- adding your data to STAC (what are the barriers to submission; where do they need guidance or UI feedback (e.g. picking a unique ID), is it clear what the input fields are, do ppl need help with the API etc. and potentially some pointers at how to lower the barriers to submission)
2- finding your data in STAC (i.e., which key ways do ppl organize things in their heads -> which fields are necessary)

If it's easy to update/edit metadata maybe there's a quick workflow to just get data in, and then ppl can go back later to add in more extensive metadata.
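To make the "get data in first, fill in metadata later" idea concrete, here is a minimal sketch of a check that reports which STAC Collection fields a submission still lacks. It assumes the submission arrives as a plain dict; the function name `missing_fields` is hypothetical. The field list is the set of required Collection fields in the STAC spec, some of which (`type`, `stac_version`) the Data Team would presumably fill in rather than the requester.

```python
# Required fields of a STAC Collection according to the collection spec
REQUIRED_FIELDS = (
    "type",
    "stac_version",
    "id",
    "description",
    "license",
    "extent",
    "links",
)


def missing_fields(submission):
    """Return the required fields that are absent or empty in a submission dict."""
    empty_values = (None, "", [], {})
    return [name for name in REQUIRED_FIELDS
            if submission.get(name) in empty_values]


# A partial request: only the id and description are known so far
todo = missing_fields({"id": "example_collection", "description": "Example"})
```

The returned list could drive a follow-up message from the Data Team instead of blocking the initial request.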

@wildintellect
Collaborator

@rtapella

  1. The barrier right now is that the Data Team has to add the Collection/Items; we don't have a user-facing method yet.
  2. The Collection Shortname, aka ID, is probably the most common way people will find the data, followed by the Title, which is the human-readable version in STAC Browser.

It's not super easy to get external data in at this time without Data Team help, and metadata is a bit of a hassle to update. But I think this is premature. The likely scenarios:

  1. The request is rejected because the data is accessible without importing to STAC; instructions can be provided on how to use the data with MAAP
  2. Request accepted; the user will be responsible for getting the files into the MAAP workspace bucket and notifying the Data Team to ingest them into STAC
  3. Data Team handles the whole thing

In scenarios 2 and 3, the Data Team will then likely request additional information to ensure good STAC records.

@abarciauskas-bgse are we going ahead with these changes? Should someone else pick up this task? I'd like to get this PR resolved so we can convert to the new GitHub Forms before the June 12 UWG meeting.

@rtapella
Collaborator

rtapella commented May 30, 2023

@wildintellect

  1. [I guess this IS the form. I need to go read it again. :p ] This form lets us pilot UI with a manual back-end. Manual responses can be tested with the form, so we can release quickly and iterate (e.g. confirmation of receipt email, instructions on staging the data so it can be ingested, notification that your data set is ingested, pointers to go find your dataset once it's in STAC, etc.).
  2. I am a bit skeptical (this is my assumption!) that a collection ID would be the most common way to find a data set, at least in a "long term" scenario. In the case of a collaboration, or when someone shares code or tells you the ID, that makes sense. I know we have some metrics on searches that can help, but I don't know whether they cover the various ways to find data. I also don't know if we are able to pull apart searches done by people using the Tutorials or copied code vs. self-directed exploratory research. If you are searching for data sets that fit your particular science objectives, it seems more likely that you would search by spatial and temporal extents, plus the measurement types. There are also other very common steps in choosing a data set (or a data subset), like previewing it and then filtering it up/down. This is the type of thing I'm referring to. We could proxy a usability test with Worldview or some other site that has multiple geospatial data sets in it.

@rtapella
Collaborator

okay I need to go re-review the form. ignore my #1 above :D

@rtapella
Collaborator

In terms of my #2... a thumbnail of some sort might also be helpful, but certainly not required. It looks like this would go in the Collection assets.


3 participants