From 6ffe40cadcaaa78b4e28caa43eafd57142ac9db1 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mirko=20M=C3=A4licke?= <mirko@maelicke-online.de>
Date: Mon, 11 Dec 2023 15:44:59 +0100
Subject: [PATCH 1/4] remove unnecessary sections

---
 docs/input.md | 72 +++++++++++++++++----------------------------------
 1 file changed, 24 insertions(+), 48 deletions(-)
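
Reviewer note: with this change the `data` block can be written either as a
plain list of names or as a mapping of names to field objects. A client that
reads `tool.yml` may want to normalize both forms before using them. Below is
a minimal sketch of that idea. `normalize_data` is a hypothetical helper and
not part of any existing library; it assumes the YAML has already been parsed
into Python lists and dicts.

```python
from typing import Any, Dict, List, Optional, Union

# Hypothetical type for the parsed `tools.<tool_name>.data` value.
DataBlock = Union[List[str], Dict[str, Optional[Dict[str, Any]]], None]


def normalize_data(data: DataBlock) -> Dict[str, Dict[str, Any]]:
    """Return a mapping of data name -> field dict for either declaration form."""
    if data is None:
        # The data block is optional, so a missing block means no inputs.
        return {}
    if isinstance(data, list):
        # Simple form: a top-level list of names without further fields.
        return {name: {} for name in data}
    # Detailed form: mapping of name -> fields (e.g. `description`).
    return {name: dict(fields or {}) for name, fields in data.items()}
```

Normalizing to a single mapping keeps downstream code free of branching on
the declaration style.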

diff --git a/docs/input.md b/docs/input.md
index cca9e97..d02937b 100644
--- a/docs/input.md
+++ b/docs/input.md
@@ -46,7 +46,7 @@ if the parameterization of a tool can be applied to other data. That means, the
 
 From a practical perspective, if you build a tool around these tool specifications,
 the tool name and content of the sections `parameters` and `data` of `/in/input.json` 
-can be used to create checksums and therefor help to establish reproducible workflows.
+can be used to create checksums and therefore help to establish reproducible workflows.
 
 
 ## Parameters: File specification
@@ -147,7 +147,9 @@ Note, that default parameters are only parsed if they are not set as `optional=t
 ## Data: File specification
 
 All input `Data` is described in a data block in the `/src/tool.yml` file.
-All sets of input data are collected as the **optional** `tools.<tool_name>.data` block:
+All sets of input data are collected as the **optional** `tools.<tool_name>.data` block.
+The simples declaration of input data is to list all available data files in a
+single, top-level list:
 
 ```yaml
 tools:
@@ -155,59 +157,36 @@ tools:
     parameters:
       [...]
     data:
-      foo_data:
-        [...]
+      - foo_data
+      - foo_data2
 ```
 
-Refer to the section below to learn about mandatory and optional fields for `Data`.
-
-
-### Fields
-
-The following section defines all mandatory and optional fields of a `Data` entity.
-
-#### `load`
-
-This is the only **mandatory** field for an entity of `Data`.  
-Boolean field which defaults to `true`. If set to `load=false`, the file is not parsed by the 
-library used for parsing input. In this case, file paths are passed as ordinary strings and 
-the parsing library will not attempt to load the file.
-
-There are a number of file formats, which are loaded by default:
-
-
-| file extension | Python |  R  |  Matlab |  NodeJS  |
-| ---------------|--------|-----|---------|----------| 
-| .dat  |  `numpy.array` | `vector` | `matrix`  | `number[][]` | 
-| .csv  |  `pandas.DataFrame` | `data.frame` |  `matrix` |  `number[][]` |
-
-
-Note that setting `load=false` can be helpful when developing tools that require to load the
-data in a different way than it is provided by the parsing libraries.
-
-#### `extension`
-
-By default, the file format is derived from the file extension given in the path to the data
-in `input.json`. Via the `extension` field, it is possible to override the file format of input 
-data. This way, it can be ensured that the library used for parsing the input always loads the
-file in the respective datastructure to the tool.  If the file format / extension is not 
-supported by the parsing library, file paths are passed just as strings, the parsing library 
-will not attempt to load the file (see above for supported formats).
+If any of the dataset sources requires a more detailed configuration, objects 
+can be specified as well:
 
 ```yaml
 tools:
   foobar:
     parameters:
-      ...
+      [...]
     data:
       foo_data:
-        load: true
-        extension: .csv
+        description: Our first dataset with foo properties
+      foo_data2:
+        description: Our second dataset with foo2 properties
 ```
 
+Refer to the section below to learn about the fields for `Data`.
+
+
+### Fields
+
+The following section defines all fields of a `Data` entity.
+
+
 #### `description`
 
-The `description` is a multiline comment to describe the input data.
+The `description` is a single- or multiline comment to describe the input data.
 For the `description` Markdown is allowed, although tool-frameworks are not required to parse it.
 Descriptions are optional and can be omitted.
 
@@ -248,12 +227,9 @@ tools:
         description: An optional array of floats
     data:
       foo_csv_data:
-        load: true
-        extension: .csv
         description: |
-          The parsing library will try to load the data like .csv files,
-          regardless of the file extension.
+          This is a CSV file that should contain valid input. We currently do
+          not specify what exactly that means.
       foo_nc_data:
-        load: false
-        description: netCDF data that is not loaded by the parsing library.    
+        description: CF-netCDF 1.8 conformant climate model output.
 ```

From 01deee899d6aec58fac6ee040e27e1eca8273e58 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mirko=20M=C3=A4licke?= <mirko@maelicke-online.de>
Date: Tue, 12 Dec 2023 10:13:12 +0100
Subject: [PATCH 2/4] Add optional data fields

---
 docs/input.md | 34 +++++++++++++++++++++++++++++++++-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/docs/input.md b/docs/input.md
index d02937b..aa5d30a 100644
--- a/docs/input.md
+++ b/docs/input.md
@@ -188,7 +188,8 @@ The following section defines all fields of a `Data` entity.
 
 The `description` is a single- or multiline comment to describe the input data.
 For the `description` Markdown is allowed, although tool-frameworks are not required to parse it.
-Descriptions are optional and can be omitted.
+Descriptions are optional and can be omitted, but it is highly recommended to 
+add descriptions to all required data inputs.
 
 A multiline comment in YAML can be specified like:
 
@@ -198,6 +199,37 @@ description: |
     This is the second line
 ```
 
+#### `example`
+
+The `example` field is optional and can be used to reference a sample dataset
+for the given input, **within** the container. Data examples are a prime source 
+for your users to understand how inputs should look like and be formatted.
+
+```yaml
+example: /samples/input_name.csv
+```
+
+#### `quality`
+
+The `quality` field is an optional field, that contains various sub-fields.
+These text-based fields can be used to specify data quality requirements for the
+input data.
+The quality field can contain one or more of the child fields, but cannot be empty.
+
+```yaml
+quality:
+  completeness: Describes the expectations of present variables and measurements.
+  accuracy: | 
+    Describes if the tool has expectations of minimum required accurary.
+    This can involve measurement accuracy, but also expected scaling.
+  validity: | 
+    Describes which format requirements the tool has, to recognize the passed in
+    files as valid data inputs. 
+```
+
+There are additional dimensions to data quality, consistency, timeliness and 
+uniqueness. These dimensions do not apply here as general catgories.
+
 
 ## Example
 

From 36a9f399e06c11710db39ee1b34da20cc300cbf0 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mirko=20M=C3=A4licke?= <mirko@maelicke-online.de>
Date: Tue, 12 Dec 2023 13:38:35 +0100
Subject: [PATCH 3/4] some fixes

---
 docs/input.md | 31 ++++++++-----------------------
 1 file changed, 8 insertions(+), 23 deletions(-)

diff --git a/docs/input.md b/docs/input.md
index aa5d30a..58623d4 100644
--- a/docs/input.md
+++ b/docs/input.md
@@ -148,7 +148,7 @@ Note, that default parameters are only parsed if they are not set as `optional=t
 
 All input `Data` is described in a data block in the `/src/tool.yml` file.
 All sets of input data are collected as the **optional** `tools.<tool_name>.data` block.
-The simples declaration of input data is to list all available data files in a
+The simplest declaration of input data is to list all available data files in a
 single, top-level list:
 
 ```yaml
@@ -206,30 +206,15 @@ for the given input, **within** the container. Data examples are a prime source
 for your users to understand how inputs should look like and be formatted.
 
 ```yaml
-example: /samples/input_name.csv
+example: /in/input_name.csv
 ```
 
-#### `quality`
-
-The `quality` field is an optional field, that contains various sub-fields.
-These text-based fields can be used to specify data quality requirements for the
-input data.
-The quality field can contain one or more of the child fields, but cannot be empty.
-
-```yaml
-quality:
-  completeness: Describes the expectations of present variables and measurements.
-  accuracy: | 
-    Describes if the tool has expectations of minimum required accurary.
-    This can involve measurement accuracy, but also expected scaling.
-  validity: | 
-    Describes which format requirements the tool has, to recognize the passed in
-    files as valid data inputs. 
-```
-
-There are additional dimensions to data quality, consistency, timeliness and 
-uniqueness. These dimensions do not apply here as general catgories.
-
+It is considered good practice to add example data and example parameterizations
+to the `/in/` folder. At inspection time, when a client application reads the 
+`tool.yml`, this client can also access the examples in the `/in/` folder.
+At runtime, as the client application mounts data and parameterizations into the
+container at `/in/`, the examples are non-existent in the container and cannot 
+accidentally pollute the runtime container.
 
 ## Example
 

From 062070861b54ba3813de00eada3fa1717dbe6000 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mirko=20M=C3=A4licke?= <mirko@maelicke-online.de>
Date: Tue, 12 Dec 2023 13:55:57 +0100
Subject: [PATCH 4/4] add the extension field

---
 docs/input.md | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)
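
Reviewer note: the new `extension` field accepts either a single string or a
list of strings. One way a client could apply it is sketched below.
`extension_allowed` is a hypothetical helper, not an existing API. The sketch
assumes a case-sensitive comparison (the docs example lists both `.dat` and
`.DAT`) and looks only at the last suffix of the file name.

```python
from pathlib import Path
from typing import List, Optional, Union


def extension_allowed(path: str, extension: Optional[Union[str, List[str]]]) -> bool:
    """Check a file path against the optional `extension` field of a data input."""
    if extension is None:
        # The field is optional; without it, no restriction applies.
        return True
    # Accept a single string or a list of strings.
    allowed = [extension] if isinstance(extension, str) else list(extension)
    # Path.suffix includes the leading dot, matching the documented convention.
    return Path(path).suffix in allowed
```

For example, `extension_allowed("/in/foo.csv", ".csv")` accepts the file,
while the same path checked against `[".dat", ".txt"]` is rejected.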

diff --git a/docs/input.md b/docs/input.md
index 58623d4..df416c9 100644
--- a/docs/input.md
+++ b/docs/input.md
@@ -216,6 +216,26 @@ At runtime, as the client application mounts data and parameterizations into the
 container at `/in/`, the examples are non-existent in the container and cannot 
 accidentally pollute the runtime container.
 
+
+#### `extension`
+
+The `extension` field is optional and can be used to limit the permitted file
+extensions for a data input. A single string or a list of strings is allowed.
+By convention, the leading dot `.` should be included in the `extension` as well.
+
+```yaml
+extension: .csv
+```
+
+```yaml
+extension:
+  - .dat
+  - .txt
+  - .DAT
+  - .TXT
+```
+
+
 ## Example
 
 ```yaml