Description
Per the documentation in the README:
The collections dictionary provides a collection ID and JSONPath pattern for matching against STAC Items. At the end of processing, before the final STAC Items are returned, the Task class can be used to assign all of the Items to specific collection IDs. For each Item the JSONPath pattern for all collections will be compared. The first match will cause the Item's Collection ID to be set to the provided value.
This sounds fine, except that dictionaries are objects (maps) in JSON and carry no guarantee of order preservation, i.e., objects in the JSON spec are considered unordered.
Best practices typically suggest using arrays where ordering is meaningful and maps where key uniqueness is required. In this case ordering is meaningful, and mandating collection name uniqueness could be problematic for some use cases (think of cases where multiple patterns are used to check for and assign membership in a single collection). So it seems like `collections` should be an array of collection-matching objects (`CollectionsMatcher`s?).
I'd propose that this `CollectionMatcher` object contain, at minimum, a `type` and a `collection_name` property. The `type` would be used to resolve a matcher from a discriminated union of supported matchers. To start, we'd support only one type, `jsonpath`, which also requires a `pattern` property. With this idea, the example from the README becomes:
"collections": [
  {
    "type": "jsonpath",
    "pattern": "$[?(@.id =~ 'LC08.*')]",
    "collection_name": "landsat-c2l2"
  }
]
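
To make the ordering and type-resolution ideas concrete, here is a rough Python sketch of first-match assignment over an ordered list of matcher objects. It is not the existing Task implementation. The names `assign_collection` and `MATCHER_TYPES` are made up for illustration, and so are the `id_regex` matcher type and the `fallback` collection: `id_regex` stands in for `jsonpath` only so the sketch runs on the standard library alone. A real `jsonpath` matcher would presumably be registered the same way, backed by a JSONPath library.

```python
import re
from typing import Any, Callable, Dict, List, Optional

Item = Dict[str, Any]
Predicate = Callable[[Item], bool]
# A factory turns one raw matcher object from the payload into a predicate.
MatcherFactory = Callable[[Dict[str, Any]], Predicate]


def id_regex_factory(config: Dict[str, Any]) -> Predicate:
    """Hypothetical stand-in matcher: regex match against the Item id."""
    regex = re.compile(config["pattern"])
    return lambda item: regex.match(str(item.get("id", ""))) is not None


# The "type" property selects the factory (the discriminated union).
# A "jsonpath" entry would be added here in the same way.
MATCHER_TYPES: Dict[str, MatcherFactory] = {"id_regex": id_regex_factory}


def assign_collection(
    item: Item,
    collections: List[Dict[str, Any]],
    registry: Dict[str, MatcherFactory] = MATCHER_TYPES,
) -> Optional[str]:
    """Set item["collection"] from the first matcher that matches, in list order."""
    for config in collections:
        factory = registry.get(config["type"])
        if factory is None:
            raise ValueError(f"unsupported matcher type: {config['type']!r}")
        if factory(config)(item):
            item["collection"] = config["collection_name"]
            return config["collection_name"]
    return None  # no matcher matched; the Item is left unchanged


# Two patterns feed the same collection, followed by a catch-all.
# With an array, the catch-all reliably runs last.
collections = [
    {"type": "id_regex", "pattern": "LC08.*", "collection_name": "landsat-c2l2"},
    {"type": "id_regex", "pattern": "LC09.*", "collection_name": "landsat-c2l2"},
    {"type": "id_regex", "pattern": ".*", "collection_name": "fallback"},
]
```

Because the list is ordered, the catch-all entry can only win when nothing before it matched, and `landsat-c2l2` can appear more than once without any key collision.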