
The Processes vs. openEO elephant in the spec #8

Open
jerstlouis opened this issue Apr 27, 2024 · 2 comments

Comments

@jerstlouis (Member) commented Apr 27, 2024

From Testbed 18 GDC ER Critical Feedback:

  1. Currently, the user needs to select which processing method to follow: the OGC API - Processes or the openEO approach

Ideally the "user" should not have to deal with that -- the client / tools should use whatever they and the server support automatically, without the user being aware of the internal details.

The GDC API as it stands being basically two completely different APIs is the elephant in the room ;)

In my view, the combination of OGC API - Coverages Parts 1 and 2 and OGC API - Processes Parts 1, 2 and 3 can do everything openEO does and more, but since it has so much flexibility, a particular profile would be needed to allow a client to expect a consistent set of capabilities to be implemented (e.g., scaling, subsetting, mathematical functions and operations either as processes and/or as CQL2 expressions).
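To make the data-access side of such a profile concrete, here is a minimal sketch of a Coverages request combining scaling, subsetting and field selection. The server URL and collection are hypothetical, and the parameter names (subset, scale-factor, properties) follow the draft OGC API - Coverages conformance classes, so they should be verified against a server's conformance declaration:

```python
from urllib.parse import urlencode

# Hypothetical server and collection; parameter names follow the draft
# OGC API - Coverages conformance classes (Subsetting, Scaling,
# Field Selection) and could change in a final standard.
base = "https://example.com/ogcapi/collections/sentinel2-l2a/coverage"
params = {
    "subset": 'Lat(45:46),Lon(-74:-73),time("2024-06-01":"2024-06-30")',
    "scale-factor": "4",       # downsample by 4 on each axis (Scaling)
    "properties": "B04,B08",   # band/field selection (Field Selection)
}
url = f"{base}?{urlencode(params)}"
print(url)
```

A client supporting a profile that mandates these three conformance classes could issue this one GET against any conforming GDC, pre-processed or virtual.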

It should also be possible to write a façade on top of openEO-platform to implement such a profile of those APIs, and implementing openEO as a façade on top of OGC APIs is likely also possible.

It may be possible for some implementations to support both openEO and the OGC API approach at the same end-point, assuming any remaining end-point conflicts are resolved through content negotiation by defining new media types, with consistency in how OGC API - Common "Collections" is supported.

However, the only way to truly end up with a "single API" is if the GeoDataCube API picks one approach, at least for the "Core Data Access" and "Core Processing" functionality.

In reality, except for the "User-defined functions" (which would require OGC API - Processes Parts 1, 2 and 3), I believe most of what can be done with openEO could be done strictly with OGC API - Coverages Parts 1 and 2 (with joinCollections=, filter=, properties=, CQL2 and Well-Known functions).
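As an illustrative sketch only, a Coverages request along these lines could express a typical openEO-style band computation: a CQL2 "filter" predicate plus a derived output field in "properties". The derived-field syntax and the band names (B04, B08, scl) are assumptions for illustration, not text from a published standard:

```python
from urllib.parse import urlencode

# Illustrative only: derived-field syntax in "properties" and the band
# names are assumptions; "filter" uses a plain CQL2 comparison.
base = "https://example.com/ogcapi/collections/sentinel2-l2a/coverage"
query = urlencode({
    "properties": "ndvi:(B08-B04)/(B08+B04)",  # hypothetical derived-field syntax
    "filter": "scl <> 9",                      # CQL2: exclude a classification value
})
print(f"{base}?{query}")
```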

Given that openEO by itself is likely to move ahead as a Community Standard, it seems to me that the Core GDC API should really focus on profiling OGC API - Coverages and OGC API - Processes to provide equivalent (and additional) capabilities.

The ability to use openEO Process Graphs as a potential representation of a Workflow, as drafted in Section 13 - openEO Process Graph Workflow of Processes - Part 3, still feels like the key integration point for including openEO components within a larger workflow.

See also #1 (comment) and Testbed 19 GDC ER Section 4.1 Profiles proposal by Ecere for a proposal of how a GeoDataCube can be accessed regardless of how the workflow generating it is defined (e.g., openEO process graph, OGC API - Processes execution request, WCPS...).

@m-mohr commented Jun 24, 2024

OGC API - Processes has nothing in the specification that makes it specific to data cubes. openEO does. If anything is dropped, OGC API - Processes should be dropped from the GDC API.

@jerstlouis (Member, Author) commented Jun 24, 2024

@m-mohr

OGC API - Processes - Part 3: Workflows "Collection Input" / "Collection Output", combined with data access mechanisms such as OGC API - Coverages, makes it possible to take data cubes as inputs, do some processing on these data cubes, and return a data cube as output.

There is quite a lot of tension between different approaches starting from multiple specifications and trying to come up with a single one, and it will not be easy to come to an agreement.

As per #10, what I hope we can agree on to move forward (with both Testbed 20 and the GDC API standard) is the following:

  • A simple data access mechanism to describe a GeoDataCube and retrieve the relevant portion of interest from a data cube (Area/Time/Resolution/Fields of interest) -- In Testbed 17 and 19 GDC activities, OGC API - Coverages with Core + Scaling, Subsetting and Field Selection has been proven in multiple TIEs to achieve this quite well. This does not exclude the possibility of using other data access mechanisms (DGGS, EDR, Tiles...), but provides a simple baseline that all clients could rely upon.
  • Accept that different communities prefer to define processing workflows in a particular way (openEO process graphs, CWL, OGC API - Processes execution request, WCPS...)
  • Agree on how any of the workflow definition languages supported by a particular implementation can be submitted in a consistent way to an end-point for ad-hoc registration of a workflow, resulting in a virtual GeoDataCube (or multiple GeoDataCubes) -- presented as an OGC API collection and/or an OGC API landing page, where processing is done on demand using the aforementioned data access mechanism. This ad-hoc submission of a workflow would ideally use the QUERY HTTP method, as it is not expected to create a resource discoverable by others, but since QUERY is not yet defined we're using POST with a 303 redirect for this purpose.
  • Currently with Processes, this is defined to be /processes/{processId}/execution?response=collection, but this seems problematic with CWL and openEO. So we could use another end-point like /jobs?response=collection.
  • If we want outputs where a collection / landing page do not make much sense (e.g., 42 as the output), then perhaps it should also be possible to submit an ad-hoc workflow at /jobs that directly returns processing outputs and/or create async jobs, just like POSTing to /processes/{processId}/execution does with Processes - Part 1.
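The ad-hoc registration step above can be sketched as follows. The /jobs?response=collection end-point and the media-type values are assumptions drawn from this discussion (the media types are illustrative placeholders, not registered types), and the request is assembled but not sent:

```python
import json

# Hypothetical media types identifying the workflow language of the body;
# these values are illustrative placeholders, not registered media types.
WORKFLOW_MEDIA_TYPES = {
    "openeo": "application/openeo+json",
    "ogcapi-processes": "application/ogcapi-processes+json",
    "cwl": "application/cwl+yaml",
}

def build_adhoc_registration(endpoint: str, workflow: dict, language: str) -> dict:
    """Assemble (without sending) the POST that registers a workflow as a
    virtual collection; the server is expected to answer 303 See Other,
    redirecting to the /collections/{collectionId} of the virtual GDC."""
    return {
        "method": "POST",
        "url": f"{endpoint}/jobs?response=collection",
        "headers": {"Content-Type": WORKFLOW_MEDIA_TYPES[language]},
        "body": json.dumps(workflow),
    }

req = build_adhoc_registration(
    "https://example.com/ogcapi",
    {"process_graph": {"load": {"process_id": "load_collection",
                                "arguments": {"id": "sentinel2-l2a"}}}},
    "openeo",
)
print(req["method"], req["url"])
```

Content negotiation on the request body is what lets one end-point accept openEO process graphs, CWL, or Processes execution requests without the server having to guess the workflow language.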

If we can agree to the above:

  • It should be fairly easy for everyone to implement this on the server side (whether internally using openEO, CWL, OGC API - Processes, or WCPS),
  • It should be fairly easy to allow, in each of these workflow definitions, the possibility to integrate workflows using any of the other approaches as a component of a larger workflow -- the workflow definitions just need a way of supporting a "GDC input" which retrieves the portions of interest using OGC API - Coverages requests, and the ability to POST a nested workflow to another server, which will create the virtual GDC,
  • It should be easy for clients to access relevant portions of any GeoDataCube implementing this approach, whether it is a pre-processed GDC or a virtual one resulting from such a workflow (the only difference from accessing an OGC API - Coverages implementation is a POST of the workflow definition instead of GET /collections/{collectionId} -- GDAL/QGIS already support this in the OGC API driver),
  • Only clients building the workflow need to understand the workflow definition, and workflows produced by these clients can then easily be shared with any visualization clients which can simply submit the workflow and access the resulting GDC.
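The client-side view of the points above can be sketched as a request sequence: a visualization client never interprets the workflow definition, it submits it and then issues ordinary Coverages requests against the resulting virtual collection. Paths and the "response=collection" parameter are assumptions from this discussion, not a published standard:

```python
def virtual_gdc_request_sequence(endpoint: str, workflow_json: str):
    """Return the (method, path, body) requests a client would issue;
    nothing is executed here -- this only illustrates the pattern."""
    return [
        # 1. Register the shared workflow; the server answers with a
        #    303 redirect to the virtual collection's description.
        ("POST", f"{endpoint}/jobs?response=collection", workflow_json),
        # 2. From here on, identical to accessing a pre-processed GDC:
        ("GET", f"{endpoint}/collections/{{collectionId}}", None),
        ("GET", f"{endpoint}/collections/{{collectionId}}/coverage"
                "?subset=Lat(45:46),Lon(-74:-73)", None),
    ]

for method, path, _ in virtual_gdc_request_sequence("https://example.com/ogcapi", "{}"):
    print(method, path)
```

Only step 1 differs from plain Coverages access, which is why a workflow produced by one client can be handed to any visualization client that knows nothing about the workflow language.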
