Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add geoDataClass parameter to input description. #474

Merged 5 commits on Jan 5, 2025
Changes from 3 commits
21 changes: 21 additions & 0 deletions core/sections/clause_8_ogc-process-description.adoc
@@ -157,6 +157,26 @@ The following JSON Schema fragment illustrates the use of the `format` key to in
}
----

==== Data classes

One common input type that a process might accept is a https://docs.ogc.org/is/17-069r4/17-069r4.html#_collection_[feature collection], indicating that the process will operate on the items of the collection. This implies that the process has certain expectations about the structure of the collection: which properties the collection contains, their types, etc. To properly handle an arbitrary input collection, a process would need to inspect the structure of the collection to ensure that all the expected properties are present with the expected types. To relieve the server of this tedious, and perhaps computationally expensive, schema validation step, this Standard introduces the concept of the _Data Class_ via the `dataClasses` parameter.
Contributor:
I'm guessing the "type" mentioned here refers to the geometry type (point, polygon, etc.)? I think that might lead to confusion, considering "type" could be interpreted as the media type or the data type (uint16, float32, etc.). Maybe consider using "geometries" or another word that doesn't have as many overloaded meanings as "type" does.

Member:

The type is not limited to the type of the geometry, but covers the type of all the properties (integer, real, text...) and any other JSON Schema validation. The geometry type is also part of that, as a restriction on the "geometry" property.

Contributor:

Yes. That is my understanding of dataClass. It is a conceptual requirement that goes beyond the way the data is encoded, and allows a complicated combination of property prerequisites. This is why I want to be careful about using terms like "type" that misguide readers about the intent.

Contributor Author:

@fmigneault type here refers to the type as defined in the schema of the data resource. That could be the data type (i.e. integer, string, object, etc.) but it could also mean the spatial type (polygon, linestring, etc.). Not sure how to say that succinctly. Perhaps I can add a note right after the information indicating that in this context "type" means type as defined by the schema but also spatial type? Would that help?

Contributor:

But it also means more, no? It is more a "concept" than a "type".
I agree it is hard to explain succinctly when the description remains abstract.
I think the extended example (where it talks about GeoJSON/Shapefile) helps the most.


The value of the `dataClasses` parameter is an array of URIs. Each URI identifies a predefined set of properties or a sub-schema. Two data resources tagged with the same data class URI can be assumed to each contain all the properties defined by that data class. This equivalence allows a server to quickly validate that an input data resource meets the server's expectations (the properties available for processing, their types, etc.) simply by comparing data class URIs. If the data class URI of the input data resource matches one of the data class URIs specified in the description of the process input (via the `dataClasses` parameter), then the server can be assured that the process can operate on that data resource.
pvretano marked this conversation as resolved.

The `dataClasses` parameter is an array so that a single process input can be described as accepting a variety of data classes. As long as the data class URI associated with an input data resource matches at least one of the data class URIs listed in the `dataClasses` array, the server can assume that the process can operate on that input data resource.
pvretano marked this conversation as resolved.
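The matching rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not part of the Standard: how a data resource advertises its data class URIs is not specified here, so the sketch simply receives them as a list; URIs are compared as exact strings (no normalization); and the `https://example.org/` URIs and the `input_accepts` helper are hypothetical.

```python
from collections.abc import Iterable, Mapping
from typing import Any


def input_accepts(input_description: Mapping[str, Any],
                  resource_data_classes: Iterable[str]) -> bool:
    """Return True if the resource carries at least one of the data class
    URIs listed in the input description's `dataClasses` array.

    Only URI strings are compared; the resource content is never inspected.
    """
    # In this sketch, an input that declares no `dataClasses` matches nothing.
    accepted = set(input_description.get("dataClasses", []))
    # A resource may be tagged with several URIs; one match is enough.
    return any(uri in accepted for uri in resource_data_classes)


# Hypothetical data class URIs, for illustration only.
FENCE_LINE = "https://example.org/dataclasses/fence-line"
RGB_NIR = "https://example.org/dataclasses/rgb-nir"

# A hypothetical process input description using the `dataClasses` parameter.
fence_input = {
    "schema": {"type": "object"},
    "dataClasses": [FENCE_LINE],
}

print(input_accepts(fence_input, [FENCE_LINE]))           # True
print(input_accepts(fence_input, [RGB_NIR]))              # False
print(input_accepts(fence_input, [RGB_NIR, FENCE_LINE]))  # True: multi-tagged
```

The check is a set-membership test over short strings, which is the point of the _Data Class_ concept: its cost does not depend on the size or structure of the data resource.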

This clause introduced the concept of the _Data Class_ in relation to a https://docs.ogc.org/is/17-069r4/17-069r4.html#_collection_[feature collection], but the concept is a general one that applies to feature collections, coverages, styles, etc. For example, a specific data class might be defined to include a geometry property _fenceLine_ of type _polygon_. Any input feature collection tagged with this data class's URI can then be expected to include a _fenceLine_ property whose type is _polygon_. Similarly, a data class could be defined that identifies a set of bands in a coverage, say R, G, B and NIR. Any coverage tagged with this data class's URI can thus be assumed to contain these bands.
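To make the two examples above concrete, the following Python sketch shows one hypothetical way a data class definition could be represented, and how a data provider might check a resource once before tagging it with the data class URI, so that processes afterwards only need to compare URIs. The dictionary layout, the `example.org` URIs, the GeoJSON-style encoding of _fenceLine_ and the helper names are all assumptions for illustration; the Standard does not define a representation for data class definitions.

```python
from collections.abc import Iterable, Mapping
from typing import Any

# Hypothetical data class definitions keyed by made-up URIs; the Standard does
# not prescribe this (or any) representation.
DATA_CLASSES: dict = {
    "https://example.org/dataclasses/fence-line": {
        # Required geometry property name -> required GeoJSON geometry type.
        "properties": {"fenceLine": "Polygon"},
    },
    "https://example.org/dataclasses/rgb-nir": {
        # Required coverage band names.
        "bands": ["R", "G", "B", "NIR"],
    },
}


def feature_conforms(feature: Mapping[str, Any], data_class: Mapping[str, Any]) -> bool:
    """True if the GeoJSON-like feature has every required geometry property
    with the required geometry type. Extra properties are ignored."""
    props = feature.get("properties") or {}
    for name, geometry_type in data_class.get("properties", {}).items():
        value = props.get(name)
        if not isinstance(value, Mapping) or value.get("type") != geometry_type:
            return False
    return True


def coverage_conforms(band_names: Iterable[str], data_class: Mapping[str, Any]) -> bool:
    """True if the coverage contains every band the data class requires."""
    return set(data_class.get("bands", [])) <= set(band_names)


fence_line = DATA_CLASSES["https://example.org/dataclasses/fence-line"]
rgb_nir = DATA_CLASSES["https://example.org/dataclasses/rgb-nir"]

feature = {
    "type": "Feature",
    "geometry": None,
    "properties": {
        "fenceLine": {"type": "Polygon",
                      "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]},
        "owner": "parcel-17",  # hypothetical extra property; ignored
    },
}

print(feature_conforms(feature, fence_line))                      # True
print(coverage_conforms(["R", "G", "B", "NIR", "SWIR"], rgb_nir))  # True
print(coverage_conforms(["R", "G", "B"], rgb_nir))                 # False
```

Both checks only require the listed members and tolerate additional ones (the extra `owner` property, the extra `SWIR` band), matching the rule that a tagged resource may contain properties beyond those of its data class.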

[NOTE]
====
. As defined in <<sc-value-passing>>, an input value can be passed to a process by value or by reference. Whether an input value is passed by value or by reference is orthogonal to the concept of the _Data Class_. In either case, the server follows the same procedure (i.e. comparing data class URIs) to determine whether a specific input value is suitable for processing according to the process description.
pvretano marked this conversation as resolved.

. A data resource tagged with a specific data class contains all the properties defined for that class but may also contain additional properties that are not members of the class. A process expecting an input value of a particular data class simply ignores these extraneous properties.

. A data resource can be tagged with more than one data class URI.
pvretano marked this conversation as resolved.

. In order for the _Data Class_ concept to be most effective, a registry akin to that found at https://schema.org[Schema.org] would need to be created and maintained. The OGC definition server is likely the best place to define and manage _Data Class_ URIs.
pvretano marked this conversation as resolved.
====

==== Cardinality

@@ -289,6 +309,7 @@ In this case we have an input with cardinality greater than 1 but that has value
|===


[[sc-value-passing]]
==== Value passing

include::../requirements/ogc-process-description/REQ_value-passing.adoc[]
5 changes: 5 additions & 0 deletions openapi/schemas/processes-core/inputDescription.yaml
@@ -6,6 +6,11 @@ allOf:
properties:
  schema:
    $ref: "schema.yaml"
  dataClasses:
    type: array
    items:
      type: string
      format: uri
  minOccurs:
    type: integer
    default: 1
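A server or client might want to sanity-check a `dataClasses` value against the schema fragment above before comparing URIs. The following Python sketch is a hand-written approximation that uses only the standard library. It is not generated from the OpenAPI document, the `valid_data_classes` name is hypothetical, and its URI test (a non-empty scheme) is deliberately looser than the full RFC 3986 syntax that `format: uri` denotes.

```python
from typing import Any
from urllib.parse import urlparse


def valid_data_classes(value: Any) -> bool:
    """Loosely mirror the `dataClasses` schema fragment: an array whose
    items are strings with `format: uri`.

    A URI is approximated here as a string with a non-empty scheme
    (e.g. `https:` or `urn:`); this is not a full RFC 3986 check.
    """
    if not isinstance(value, list):
        return False
    return all(isinstance(item, str) and bool(urlparse(item).scheme) for item in value)


print(valid_data_classes(["https://example.org/dataclasses/fence-line"]))  # True
print(valid_data_classes("https://example.org/dataclasses/fence-line"))    # False: not an array
print(valid_data_classes(["fence-line"]))                                   # False: no scheme
```

Because the fragment sets no `minItems`, an empty array passes this check as well.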