Collections should be an array? #124

Open
@jkeifer

Per the documentation in the README:

The collections dictionary provides a collection ID and JSONPath pattern for matching against STAC Items. At the end of processing, before the final STAC Items are returned, the Task class can be used to assign all of the Items to specific collection IDs. For each Item the JSONPath pattern for all collections will be compared. The first match will cause the Item's Collection ID to be set to the provided value.

This sounds fine, except that the dictionary is a JSON object (a map), and JSON objects carry no guarantee of order preservation: the JSON spec defines an object as an unordered collection of name/value pairs.

Best practice is generally to use arrays where ordering is meaningful and maps where key uniqueness is required. Here ordering is meaningful, and mandating unique collection names could be problematic for some use cases (for example, where multiple patterns are used to check for and assign membership in a single collection). So it seems like collections should be an array of collection-matching objects (CollectionMatchers?).
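To illustrate the uniqueness problem, here is a minimal Python sketch. The second pattern (LC09) is made up for illustration and is not from the README. With a map keyed by collection ID, two patterns for the same collection cannot coexist; Python's json module, for one, silently keeps only the last value for a repeated key:

```python
import json

# Hypothetical config: two patterns intended for the same collection ID.
# The LC09 pattern is an assumption added for illustration only.
raw = """
{
    "landsat-c2l2": "$[?(@.id =~ 'LC08.*')]",
    "landsat-c2l2": "$[?(@.id =~ 'LC09.*')]"
}
"""

collections = json.loads(raw)

# json.loads keeps only the last value for a repeated key,
# so the LC08 pattern is dropped without any error or warning.
print(len(collections))
print(collections["landsat-c2l2"])
```

Other JSON parsers may handle duplicate keys differently, which is part of the problem.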

I'd propose that this "CollectionMatcher" object contain, at minimum, a type and a collection_name property. The type would be used to resolve a matcher from a discriminated union of supported matchers. To start we'd support only one type, jsonpath, which additionally requires a pattern property. With this idea the example from the README becomes:

"collections": [
    {
        "type": "jsonpath",
        "pattern": "$[?(@.id =~ 'LC08.*')]",
        "collection_name": "landsat-c2l2"
    }
]
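A rough sketch of how first-match-wins resolution over such an array might look. Everything here is an assumption for discussion, not existing Task code: assign_collection, the evaluators registry, and the "id-regex" type are hypothetical names. To keep the sketch dependency-free, "id-regex" is a stand-in that matches a regular expression against the Item ID; a real jsonpath evaluator would be registered the same way, backed by a JSONPath library.

```python
import re
from typing import Any, Callable, Optional

# Hypothetical: maps a matcher "type" to a function (matcher, item) -> bool.
Evaluator = Callable[[dict[str, Any], dict[str, Any]], bool]


def assign_collection(
    item: dict[str, Any],
    matchers: list[dict[str, Any]],
    evaluators: dict[str, Evaluator],
) -> Optional[str]:
    """Return the collection_name of the first matcher that matches item."""
    for matcher in matchers:  # array order is the match priority
        matcher_type = matcher["type"]
        evaluate = evaluators.get(matcher_type)
        if evaluate is None:
            raise ValueError(f"unsupported matcher type: {matcher_type!r}")
        if evaluate(matcher, item):
            return matcher["collection_name"]
    return None  # no matcher applied; caller decides what to do


def id_regex(matcher: dict[str, Any], item: dict[str, Any]) -> bool:
    # Stand-in evaluator (not part of the proposal): regex on the Item ID.
    return re.match(matcher["pattern"], item["id"]) is not None


evaluators = {"id-regex": id_regex}

# Two patterns map to the same collection; a catch-all comes last.
matchers = [
    {"type": "id-regex", "pattern": "LC08.*", "collection_name": "landsat-c2l2"},
    {"type": "id-regex", "pattern": "LC09.*", "collection_name": "landsat-c2l2"},
    {"type": "id-regex", "pattern": ".*", "collection_name": "fallback"},
]

landsat = assign_collection({"id": "LC09_example"}, matchers, evaluators)
other = assign_collection({"id": "S2A_example"}, matchers, evaluators)
print(landsat, other)
```

Because the matchers live in an array, the catch-all entry is only reached when nothing earlier matches, and repeating a collection_name is not a problem.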
