
Large scale processing (Synchronous, Asynchronous, On-demand) #9

Open
jerstlouis opened this issue Apr 27, 2024 · 0 comments

jerstlouis commented Apr 27, 2024

From Testbed 19 GDC ER Critical Feedback:

  1. Supporting both synchronous and asynchronous processing is good, serving both prototype development and large-scale processing.

In OGC API - Processes - Part 3: Workflows, we define the "Collection Input" and "Collection Output" requirement classes, which I strongly recommend be part of the "Core Processing" profile (see ER Section 4.1 Profiles proposal by Ecere).
These requirement classes allow clients to efficiently access data resulting from processing in exactly the same way as a regular static preprocessed GeoDataCube, and they are fully suitable for large-scale processing. They offer an alternative to asynchronous "batch" processing that is much easier to manage: there is no need for job management, estimation, etc., since clients request small pieces of the output as needed, which also makes it easier to prioritize clients. All of this is explained in detail in Section 6.2 (Design Goals) of Processes - Part 3.
At least three participants in Testbed 19 (Ecere, Compusult, Wuhan University) experimented with and successfully implemented Collection Input, Collection Output, or both.
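
For illustration, here is a minimal client-side sketch of such an execution request, combining Collection Input with Collection Output. The server URL, the `RenderMap` process, and the input collection are hypothetical; the shape of the request (a `"collection"` input and the `response=collection` query parameter) follows the Part 3: Workflows draft.

```python
# Sketch only: server, process id and collection URL are hypothetical.
import requests

SERVER = "https://example.com/ogcapi"  # hypothetical deployment

execution_request = {
    "process": f"{SERVER}/processes/RenderMap",
    "inputs": {
        # Collection Input: the input is referenced as an OGC API collection
        # URI; nothing needs to be precomputed, and it could itself be the
        # virtual output of another process in a workflow.
        "data": {"collection": f"{SERVER}/collections/sentinel2-l2a"}
    },
}

# Collection Output: with response=collection, the server does not run a
# batch job; it validates the workflow and returns the description of a
# virtual collection whose data will be produced on demand.
r = requests.post(
    f"{SERVER}/processes/RenderMap/execution",
    params={"response": "collection"},
    json=execution_request,
)
r.raise_for_status()
virtual_collection = r.json()
print(virtual_collection["links"])  # links to access it via OGC APIs (e.g., Coverages)
```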

As a concept, "Collection Input" is similar to the openEO load_collection process, but the collection does not need to have been produced before you can reference and use it:
it is all done on the fly. Similarly, "Collection Output" is somewhat similar to the STAC output, but rather than relying on STAC items to access output assets (which means accessing the whole asset unless it uses a cloud-optimized format like COG or Zarr), it relies on OGC APIs such as OGC API - Coverages.
A request to the API (which can be a subset) generates only what is required, pulling whatever it needs from the workflow and triggering processing as needed.
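
Continuing the sketch above, a client could then pull just a small subset of the virtual collection through OGC API - Coverages, so that only that slice is ever computed. The collection id and the exact subset values shown are illustrative:

```python
# Sketch only: the virtual collection id is hypothetical.
import requests

SERVER = "https://example.com/ogcapi"  # hypothetical deployment

r = requests.get(
    f"{SERVER}/collections/temp-virtual-abc123/coverage",
    # Requesting a spatial subset triggers on-demand processing of
    # only the data needed to answer this request.
    params={"subset": "Lat(45:46),Lon(-76:-75)"},
    headers={"Accept": "image/tiff; application=geotiff"},
)
r.raise_for_status()
with open("subset.tif", "wb") as f:
    f.write(r.content)  # only this slice of the output was generated
```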
