Commit

Merge pull request #1 from eoap/develop
Application Package - Prepare a module covering the stage-in and stage-out concepts including hands-on exercises
fabricebrito authored Oct 8, 2024
2 parents c4cfd1e + 3f64ac8 commit 5e75547
Showing 21 changed files with 2,888 additions and 31 deletions.
2 changes: 1 addition & 1 deletion .devcontainer/Dockerfile
@@ -10,7 +10,7 @@ RUN apt-get update -y \
&& apt-get upgrade -y \
&& apt-get install -y --no-install-recommends \
ca-certificates \
curl sudo git nodejs wget curl git-flow vim gpg
curl sudo git nodejs wget curl git-flow vim gpg graphviz

RUN echo $USERNAME ALL=\(root\) NOPASSWD:ALL > /etc/sudoers.d/$USERNAME \
&& chmod 0440 /etc/sudoers.d/$USERNAME
6 changes: 5 additions & 1 deletion .devcontainer/environment.yml
@@ -18,4 +18,8 @@ dependencies:
- scikit-image
- loguru
- mkdocs-material
- mkdocs-mermaid2-plugin
- mkdocs-mermaid2-plugin
- boto3
- rio-stac
- cwltool
- graphviz
3 changes: 2 additions & 1 deletion .gitignore
@@ -2,4 +2,5 @@
LC09*
catalog.json
S2A*
S2B*
S2B*
*.yaml
1 change: 1 addition & 0 deletions __about__.py
@@ -0,0 +1 @@
version="0.0.1"
25 changes: 25 additions & 0 deletions docs/application-data-flow.md
@@ -0,0 +1,25 @@
# Application data flow management

## Stage-in

From the OGC Best Practice for Earth Observation Application Package:

**An Application input argument that requires staged EO product files SHALL be defined as an argument that points to a folder where a STAC Catalog, named catalog.json, contains a list of one or more STAC Items and associated STAC Assets referencing the files.**

This translates to:

* A platform running this application will plug a **stage-in step** for all workflow steps having inputs of type `Directory`
* Workflow steps having inputs of type `Directory` will find a STAC `catalog.json` file in each of those directories
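
As an illustration of what a workflow step sees, the sketch below reads the `catalog.json` of a staged-in `Directory` and lists the asset hrefs of each STAC Item. It is a minimal, standard-library-only sketch, assuming the catalog references its Items through `rel: item` links with local relative hrefs, and that relative hrefs resolve against the file containing them; the function name `list_staged_assets` is hypothetical. A STAC library such as pystac could handle link resolution instead.

```python
import json
from pathlib import Path


def list_staged_assets(input_dir: str) -> dict[str, dict[str, str]]:
    """Map each STAC Item id to {asset key: href} for a staged-in Directory."""
    catalog_path = Path(input_dir) / "catalog.json"
    catalog = json.loads(catalog_path.read_text())
    staged: dict[str, dict[str, str]] = {}
    for link in catalog.get("links", []):
        if link.get("rel") != "item":
            continue
        # Item hrefs are resolved relative to the catalog file
        item_path = (catalog_path.parent / link["href"]).resolve()
        item = json.loads(item_path.read_text())
        assets = {}
        for key, asset in item.get("assets", {}).items():
            href = asset["href"]
            # Relative asset hrefs point to local files next to the Item;
            # absolute paths and URLs (e.g. remote COGs) are kept as they are
            if "://" not in href and not Path(href).is_absolute():
                href = str((item_path.parent / href).resolve())
            assets[key] = href
        staged[item["id"]] = assets
    return staged
```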

## Stage-out

From the OGC Best Practice for Earth Observation Application Package:

**An Application that creates EO product files to be stage-out SHALL generate a valid STAC Catalog, named catalog.json, and include the STAC Item(s) and corresponding STAC Assets pointing to the results of the processing.**

**The STAC Catalog created by the Application SHALL include metadata elements for each STAC Item with at least their spatial (geometry, bbox) and temporal (datetime) properties.**

This translates to:

* Workflow steps that have an output of type `Directory` produce a STAC catalog
* A platform running this application will plug a **stage-out step** for all workflow outputs of type `Directory`
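
A minimal sketch of the application side of stage-out, using only the Python standard library: it writes a `catalog.json` that links one STAC Item carrying the `geometry`, `bbox` and `datetime` metadata required above. The function name `write_output_catalog`, the one-Item layout, the Polygon-only `bbox` computation and the assumption that the result file sits next to the Item JSON are all illustrative assumptions.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def write_output_catalog(out_dir: str, item_id: str, asset_file: str,
                         geometry: dict, dt: datetime) -> Path:
    """Write catalog.json plus one STAC Item into out_dir and return the catalog path."""
    out = Path(out_dir)
    item_dir = out / item_id
    item_dir.mkdir(parents=True, exist_ok=True)
    # bbox from the outer ring of a Polygon geometry (other types not handled here)
    ring = geometry["coordinates"][0]
    xs = [point[0] for point in ring]
    ys = [point[1] for point in ring]
    item = {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "geometry": geometry,
        "bbox": [min(xs), min(ys), max(xs), max(ys)],
        # STAC datetimes are RFC 3339 strings; normalised to UTC here
        "properties": {"datetime": dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")},
        "links": [
            {"rel": "root", "href": "../catalog.json", "type": "application/json"},
            {"rel": "parent", "href": "../catalog.json", "type": "application/json"},
        ],
        # The result file is assumed to be written next to the Item JSON
        "assets": {"result": {"href": f"./{asset_file}", "roles": ["data"]}},
    }
    (item_dir / f"{item_id}.json").write_text(json.dumps(item, indent=2))
    catalog = {
        "type": "Catalog",
        "stac_version": "1.0.0",
        "id": "catalog",
        "description": "Results produced by the application",
        "links": [
            {"rel": "root", "href": "./catalog.json", "type": "application/json"},
            {"rel": "item", "href": f"./{item_id}/{item_id}.json", "type": "application/json"},
        ],
    }
    catalog_path = out / "catalog.json"
    catalog_path.write_text(json.dumps(catalog, indent=2))
    return catalog_path
```

Passing a timezone-aware `datetime` matters here: `astimezone` interprets a naive value as local time.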
43 changes: 43 additions & 0 deletions docs/definitions.md
@@ -0,0 +1,43 @@
# Definitions

The _Best Practice for Earth Observation Application Package_ addresses data flow management of the input and output EO Products files by defining rules for the data stage-in and data stage-out for Applications that require staged files and/or generate files that need to be staged-out.

## Data stage-in definition

Data stage-in is the process of retrieving the inputs and making them available for processing. Processing inputs are provided as catalogue references, and the Platform is responsible for translating those references into inputs available as files for local processing.

## Data stage-out definition

Data stage-out is the process of uploading the output files generated by the processing to external system(s) and making them available for later use. The Platform retrieves the processing outputs and automatically stores them on external persistent storage. Additionally, the Platform should publish the metadata of the outputs to a Catalogue and provide their references as an output.
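
The Platform side of stage-out can be sketched as follows: walk the output catalog, upload every local asset and rewrite its href to the remote location, so that the Items can then be published to a Catalogue. This is a hedged, standard-library-only sketch; the function name `stage_out`, the `upload` callable and the `<prefix>/<item id>/<file name>` key layout are hypothetical. In practice `upload` might wrap an object-storage client, for example boto3's `S3.Client.upload_file`.

```python
import json
from pathlib import Path
from typing import Callable


def stage_out(catalog_dir: str, remote_prefix: str,
              upload: Callable[[Path, str], None]) -> list[str]:
    """Upload each local asset, rewrite its href to the remote URL, return the new hrefs."""
    catalog_path = Path(catalog_dir) / "catalog.json"
    catalog = json.loads(catalog_path.read_text())
    remote_hrefs: list[str] = []
    for link in catalog.get("links", []):
        if link.get("rel") != "item":
            continue
        item_path = (catalog_path.parent / link["href"]).resolve()
        item = json.loads(item_path.read_text())
        for asset in item.get("assets", {}).values():
            if "://" in asset["href"]:
                continue  # already remote, nothing to upload
            local = (item_path.parent / asset["href"]).resolve()
            # Hypothetical key layout: <remote_prefix>/<item id>/<file name>
            remote = f"{remote_prefix.rstrip('/')}/{item['id']}/{local.name}"
            upload(local, remote)
            asset["href"] = remote
            remote_hrefs.append(remote)
        # Save the Item with remote hrefs so its metadata can be published to a Catalogue
        item_path.write_text(json.dumps(item, indent=2))
    return remote_hrefs
```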

## Application Data Flow Management

The Application data flow management relies on the following rules:

* The computational workflow data interfaces use the SpatioTemporal Asset Catalog (STAC) to describe the **EO data inputs** and **generated results**

* Stage-in
    * All input parameters of the CWL `CommandLineTool` that require the staging of EO products shall be of type `Directory`.
    * All input parameters of the CWL `Workflow` that require the staging of EO products shall be of type `Directory`.
    * Applications find a STAC `catalog.json` file in each staged-in `Directory` input.

* Stage-out
    * Applications produce a STAC `catalog.json` in all outputs of type `Directory`.
    * The `Workflow` outputs that require the stage-out of the generated products shall be of type `Directory`.
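
To make the stage-out rules concrete, the following sketch checks an output `Directory`: it expects a `catalog.json`, follows its `item` links and reports Items lacking the spatial (`geometry`, `bbox`) and temporal (`datetime`) metadata that the OGC Best Practice asks for. It is a minimal illustration assuming relative item hrefs, not a full STAC validator; the function name `check_stage_out_catalog` and the message format are hypothetical.

```python
import json
from pathlib import Path


def check_stage_out_catalog(out_dir: str) -> list[str]:
    """Return the problems found in an output Directory; an empty list means none."""
    catalog_path = Path(out_dir) / "catalog.json"
    if not catalog_path.is_file():
        return [f"missing {catalog_path}"]
    catalog = json.loads(catalog_path.read_text())
    item_links = [link for link in catalog.get("links", []) if link.get("rel") == "item"]
    if not item_links:
        return ["catalog.json has no item links"]
    problems: list[str] = []
    for link in item_links:
        item_path = catalog_path.parent / link["href"]
        if not item_path.is_file():
            problems.append(f"missing item {link['href']}")
            continue
        item = json.loads(item_path.read_text())
        item_id = item.get("id", link["href"])
        # Spatial metadata: geometry and bbox
        for key in ("geometry", "bbox"):
            if item.get(key) is None:
                problems.append(f"{item_id}: missing {key}")
        # Temporal metadata: datetime in the Item properties
        if item.get("properties", {}).get("datetime") is None:
            problems.append(f"{item_id}: missing properties.datetime")
    return problems
```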

## Platform Data Flow Management

A Platform is responsible for the data flow management. It uses a local catalogue, encoded with the SpatioTemporal Asset Catalog (STAC) specification, as a data manifest for the application inputs and outputs.

The local catalogue describes the data contents of the input and output files, such as their spatial footprint, sub-items (e.g. masks, bands) and additional metadata.

### Wrapping the Application Package

To wrap an Application Package, the Platform:

* plugs a stage-in step for all workflow inputs of type `Directory`
* plugs a stage-out step for all workflow outputs of type `Directory`

The outcome is a wrapped CWL workflow that takes:

* the application package parameters
* any stage-in/stage-out parameters the platform may need to perform these operations

20 changes: 16 additions & 4 deletions docs/hands-on.md
@@ -1,9 +1,21 @@
# Hands-on

## Data stage-in hands-on
## Reading data staged-in

Open notebook _01 EO Products as Input Data_ to run a hands-on exercise on data stage-in
Open notebook _01 EO Products as Input Data_ to run a hands-on exercise on reading staged data

## Data stage-out hands-on
## Inspecting results to be staged-out

Open notebook _02 EO Products as Output Data_ to run a hands-on exercise on data stage-out
Open notebook _02 EO Products as Output Data_ to run a hands-on exercise on inspecting results to be staged-out

## Running individual command-line tools for data stage-in, application and results stage-out

Open notebook _03 Platform Data Management CLI_ to run the individual command-line tools for data stage-in, the application and results stage-out

## Running individual command-line tools for data stage-in, application and results stage-out with CWL

Open notebook _04 Platform Data Management CWL_ to run the individual command-line tools for data stage-in, the application and results stage-out with CWL

## Wrapping an Application Package using EOEPCA's cwl-wrapper

Open notebook _05 Platform Data Management Application Wrapping_ to wrap an Application Package using EOEPCA's cwl-wrapper
14 changes: 1 addition & 13 deletions docs/data-flow-management.md → docs/platform-data-flow.md
@@ -1,16 +1,4 @@
# Data Flow Management

The _Best Practice for Earth Observation Application Package_ addresses data flow management of the input and output EO Products files by defining rules for the data stage-in and data stage-out for Applications that require staged files and/or generate files that need to be staged-out.

## Data stage-in definition

Data stage-in is the process to retrieve the inputs and make these available for the processing. Processing inputs are provided as catalogue references and the Platform is responsible for translating those references into inputs available as files for the local processing.

## Data stage-out definition

Data stage-out is the process to upload the output files generated by the processing onto external system(s), and make them available for later usage. The Platform retrieves the processing outputs and automatically stores them onto an external persistent storage. Additionally, the Platform should publish the metadata of the outputs onto a Catalogue and provide their references as an output.

## Platform data flow management
# Platform data flow management

For the **data stage-in**, the Platform creates a local STAC Catalog with a STAC Item whose Assets have an accessible href (either local or remote, e.g. a Cloud Optimized GeoTIFF, COG) as the input files manifest for the application.
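
A minimal sketch of this stage-in manifest, assuming the Platform has already fetched the STAC Item as a Python dictionary and that its asset hrefs are absolute: it writes a local `catalog.json` and a copy of the Item whose links point into the local catalog, while the asset hrefs are left unchanged so that remote assets such as COGs can be read in place. The function name `stage_in_manifest` and the folder layout are hypothetical; fetching the Item and downloading assets, if a platform needs that, are out of scope.

```python
import json
from pathlib import Path


def stage_in_manifest(item: dict, target_dir: str) -> Path:
    """Write a local catalog.json and one Item for a Directory input; return the catalog path."""
    target = Path(target_dir)
    item_dir = target / item["id"]
    item_dir.mkdir(parents=True, exist_ok=True)
    # Shallow copy: only the links are replaced, so the caller's dict is not modified
    local_item = dict(item)
    local_item["links"] = [
        {"rel": "root", "href": "../catalog.json", "type": "application/json"},
        {"rel": "parent", "href": "../catalog.json", "type": "application/json"},
    ]
    # Asset hrefs are kept as-is: remote hrefs (e.g. COGs) are read in place
    (item_dir / f"{item['id']}.json").write_text(json.dumps(local_item, indent=2))
    catalog = {
        "type": "Catalog",
        "stac_version": "1.0.0",
        "id": "stage-in",
        "description": "Staged-in inputs",
        "links": [
            {"rel": "root", "href": "./catalog.json", "type": "application/json"},
            {"rel": "item", "href": f"./{item['id']}/{item['id']}.json", "type": "application/json"},
        ],
    }
    catalog_path = target / "catalog.json"
    catalog_path.write_text(json.dumps(catalog, indent=2))
    return catalog_path
```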

5 changes: 4 additions & 1 deletion mkdocs.yml
@@ -53,7 +53,10 @@ extra_javascript:

nav:
- Introduction: 'index.md'
- Data flow management: 'data-flow-management.md'
- Data flow management:
- Definitions: 'definitions.md'
- Application data flow: 'application-data-flow.md'
- Platform data flow: 'platform-data-flow.md'
- Hands-on: 'hands-on.md'

copyright: <a href="https://img.shields.io/badge/License-CC_BY--SA_4.0-lightgrey.svg">License CC BY-SA 4.0</a>, by <a href="https://creativecommons.org/licenses/by-sa/4.0/">Creative Commons</a>
20 changes: 10 additions & 10 deletions notebooks/01 EO Products as Input Data.ipynb
@@ -22,7 +22,7 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
@@ -34,7 +34,7 @@
},
{
"cell_type": "code",
"execution_count": 28,
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
@@ -69,7 +69,7 @@
},
{
"cell_type": "code",
"execution_count": 29,
"execution_count": 3,
"metadata": {},
"outputs": [
{
@@ -104,7 +104,7 @@
},
{
"cell_type": "code",
"execution_count": 18,
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
@@ -118,7 +118,7 @@
},
{
"cell_type": "code",
"execution_count": 19,
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
@@ -157,7 +157,7 @@
},
{
"cell_type": "code",
"execution_count": 20,
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
@@ -175,7 +175,7 @@
},
{
"cell_type": "code",
"execution_count": 21,
"execution_count": 7,
"metadata": {},
"outputs": [
{
@@ -200,7 +200,7 @@
},
{
"cell_type": "code",
"execution_count": 30,
"execution_count": 8,
"metadata": {},
"outputs": [
{
@@ -233,7 +233,7 @@
},
{
"cell_type": "code",
"execution_count": 38,
"execution_count": 9,
"metadata": {},
"outputs": [
{
@@ -299,7 +299,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.8"
"version": "3.12.5"
}
},
"nbformat": 4,