`hxl-yml-spec-to-hxl-json-spec`: HXL Data processing specs exporter #14

fititnt · 2021-03-12T23:42:18Z

Quick links:

"JSON processing specs for HXL data, David Megginson, 2021-03-11"
- https://docs.google.com/presentation/d/17vXOnq2atIDnrODGLs36P1EaUvT-vXPjsc2I1q1Qc50/edit#slide=id.p
Test online
- https://proxy.hxlstandard.org/api/from-spec.html

Let's do a proof of concept of the thing!

fititnt · 2021-03-13T00:28:09Z

Maybe this

hdpcli --export-to-hxl-json-processing-specs tests/hxl-processing-specs/hxl-processing-specs-test-01.hdp.yml

and a file like this

- hsilo: "test1"
  hrecipe:
    - id: recipe1
      source:
        - iri: https://docs.google.com/spreadsheets/d/12k4BWqq5c3mV9ihQscPIwtuDa_QRB-iFohO7dXSSptI/edit#gid=0
          filters:
            - filter: with_columns
              with_columns: "#vocab+id+v_iso6393_3letter,#vocab+code+v_6391,#vocab+name"
            - filter: without_rows
              without_rows: "#vocab+code+v_6391="

could be a good starting point. But form my experience with Ansible (and very, very large Ansible playbooks) we could do from start allow parsing several YAML files at once and just output all the json specs line by line.

But the come to this point, the hdpcli needs to implement some way to at least concatenate more than one YAML file. (the part about include_file options may be something for later).
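A minimal sketch of the "several YAML files at once, one JSON spec per line" idea. It assumes each YAML file has already been parsed (e.g. with PyYAML's `safe_load`) into a Python list of hsilo entries; `concat_hdp_documents` and `to_json_lines` are hypothetical helper names, not part of hdpcli.

```python
import json
from typing import Any, Dict, Iterable, List, Optional


def concat_hdp_documents(documents: Iterable[Optional[List[Any]]]) -> List[Any]:
    """Concatenate several already-parsed HDP documents (each a list of
    hsilo entries) into one list, keeping the order of the files."""
    merged: List[Any] = []
    for doc in documents:
        if doc is None:  # an empty YAML file parses to None with safe_load
            continue
        if not isinstance(doc, list):
            raise TypeError(f"expected a list of hsilo entries, got {type(doc).__name__}")
        merged.extend(doc)
    return merged


def to_json_lines(specs: Iterable[Dict[str, Any]]) -> str:
    """Serialize each spec as one compact JSON object per line (JSON Lines)."""
    return "\n".join(json.dumps(spec, sort_keys=True) for spec in specs)


# Two hypothetical files, already parsed from YAML (plus one empty file)
file_a = [{"hsilo": "test1", "hrecipe": []}]
file_b = [{"hsilo": "test2", "hrecipe": []}]
merged = concat_hdp_documents([file_a, None, file_b])
print(len(merged))  # 2
print(to_json_lines([{"recipe": [], "input": "a.csv"}, {"recipe": []}]))
```

One object per line keeps the output streamable, so each line could be handed to a separate tool invocation without parsing the whole result first.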

fititnt · 2021-03-13T14:36:39Z

This is the YAML file (there is some extra markup, but ignore it for now).

fititnt@bravo:/workspace/git/EticaAI/HXL-Data-Science-file-formats$ cat tests/hxl-processing-specs/hxl-processing-specs-test-01.hdp.yml | grep "^[^#;]"

---
- hsilo:
    name: "test1"
    desc: from https://docs.google.com/presentation/d/17vXOnq2atIDnrODGLs36P1EaUvT-vXPjsc2I1q1Qc50/
  hrecipe:
    - id: example-processing-with-a-JSON-spec
      iri_example:
        - iri: https://data.humdata.org/dataset/yemen-humanitarian-needs-overview
          sheet_index: 1
      recipe:
        - filter: count
          patterns: "adm1+name,adm1+code"
          aggregators:
            - "sum(population) as Population#population"
        - filter: clean_data
          number: "population"
          number_format: .0f

This is the JSON spec result:

fititnt@bravo:/workspace/git/EticaAI/HXL-Data-Science-file-formats$ hdpcli --export-to-hxl-json-processing-specs tests/hxl-processing-specs/hxl-processing-specs-test-01.hdp.yml

{
    "input": "https://data.humdata.org/dataset/yemen-humanitarian-needs-overview",
    "recipe": [
        {
            "aggregators": [
                "sum(population) as Population#population"
            ],
            "filter": "count",
            "patterns": "adm1+name,adm1+code"
        },
        {
            "filter": "clean_data",
            "number": "population",
            "number_format": ".0f"
        }
    ],
    "sheet_index": 1
}
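The mapping from the YAML above to this JSON can be sketched as follows, assuming the early draft layout (`recipe` plus `iri_example`) where `iri` becomes `input` and any sibling key such as `sheet_index` is copied to the top level. `hrecipe_to_specs` is a hypothetical name, not the actual `HDP.export_json_processing_specs()` code. Serializing with `json.dumps(..., indent=4, sort_keys=True)` gives the same key order as the output above (`aggregators`, `filter`, `patterns`).

```python
import json
from typing import Any, Dict, List


def hrecipe_to_specs(hrecipe: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Map one hrecipe item (early draft layout: 'recipe' + 'iri_example')
    to HXL JSON processing specs, one spec per example source."""
    specs = []
    for example in hrecipe.get("iri_example", []):
        spec: Dict[str, Any] = {"input": example["iri"], "recipe": hrecipe["recipe"]}
        # Any other key next to 'iri' (e.g. sheet_index) goes to the top level
        for key, value in example.items():
            if key != "iri":
                spec[key] = value
        specs.append(spec)
    return specs


# The hrecipe item from the YAML file above, as parsed into Python
hrecipe = {
    "id": "example-processing-with-a-JSON-spec",
    "iri_example": [{
        "iri": "https://data.humdata.org/dataset/yemen-humanitarian-needs-overview",
        "sheet_index": 1,
    }],
    "recipe": [
        {"filter": "count", "patterns": "adm1+name,adm1+code",
         "aggregators": ["sum(population) as Population#population"]},
        {"filter": "clean_data", "number": "population", "number_format": ".0f"},
    ],
}
print(json.dumps(hrecipe_to_specs(hrecipe)[0], indent=4, sort_keys=True))
```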

fititnt@bravo:/workspace/git/EticaAI/HXL-Data-Science-file-formats$ hdpcli --export-to-hxl-json-processing-specs tests/hxl-processing-specs/hxl-processing-specs-test-01.hdp.yml | hxlspec

ERROR (hxl.io): Skipping column(s) with malformed hashtag specs: #
Gov,Gov Pcode,Population
#adm1+name,#adm1+code,#population
Abyan,YE12,618892
Ad Dali',YE30,818507
Aden,YE24,1053455
Al Bayda,YE14,795107
Al Hodeidah,YE18,2996334
Al Jawf,YE16,633596
Al Maharah,YE28,175606
Al Mahwit,YE27,770920
Amran,YE29,1221908
Dhamar,YE20,2194159
Hadramawt,YE19,1551347
Hajjah,YE17,2630678
Ibb,YE11,3143818
Lahj,YE25,1076296
Ma'rib,YE26,1086663
Raymah,YE31,562930
Sa'dah,YE22,934201
Sana'a,YE23,1370798
Sana'a City,YE13,3296342
Shabwah,YE21,676408
Socotra,YE32,69004
Ta'iz,YE15,3104579

And piping the output to the hxlspec command line actually worked. Just a quick warning, but it worked!

fititnt@bravo:/workspace/git/EticaAI/HXL-Data-Science-file-formats$ pip3 show libhxl | grep Version

Version: 4.22

fititnt · 2021-03-14T16:45:13Z

We need some way to make 'inline' data tables that could serve as a way to test whether an HXL data processing spec is working (and this needs to work offline).

This implies adding some new attributes, especially the concepts of inline data and expected result data. Or maybe the concept of 'example'.
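A minimal sketch of how inline input data plus expected result data could act as an offline test. It assumes tables are lists of rows (text headers, then HXL hashtags, then data) and that some callable applies the recipe; `check_exemplum` is hypothetical, and `drop_rows_with_empty_value` is a toy stand-in for a real HXL filter, not a libhxl function. The table rows are illustrative only.

```python
from typing import Callable, List

# Row 0: text headers, row 1: HXL hashtags, remaining rows: data
Table = List[List[str]]


def check_exemplum(run_recipe: Callable[[Table], Table],
                   input_data: Table, expected: Table) -> List[str]:
    """Apply a recipe to inline input data and list every mismatch with the
    expected output. An empty list means the example passed."""
    actual = run_recipe(input_data)
    problems: List[str] = []
    if len(actual) != len(expected):
        problems.append(f"row count: expected {len(expected)}, got {len(actual)}")
    for i, (got, want) in enumerate(zip(actual, expected)):
        if got != want:
            problems.append(f"row {i}: expected {want!r}, got {got!r}")
    return problems


def drop_rows_with_empty_value(table: Table) -> Table:
    """Toy stand-in for a real HXL filter: keep both header rows and drop
    data rows whose third column is empty."""
    return table[:2] + [row for row in table[2:] if row[2] != ""]


inline_input: Table = [
    ["header 1", "header 2", "header 3"],
    ["#item +id", "#item +name", "#item +value"],
    ["ACME1", "ACME Inc.", "123"],
    ["XPTO1", "XPTO org", ""],
]
expected_output: Table = inline_input[:3]
print(check_exemplum(drop_rows_with_empty_value, inline_input, expected_output))  # []
```

Because nothing here touches the network, the same check could run in CI against any proxy or library that accepts the recipe.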

fititnt · 2021-03-14T21:38:45Z

Current example

fititnt@bravo:/workspace/git/EticaAI/HXL-Data-Science-file-formats$ cat tests/hxl-processing-specs/hxl-processing-specs-test-01.hdp.yml

# yaml-language-server: $schema=https://raw.githubusercontent.com/EticaAI/HXL-Data-Science-file-formats/main/hxlm/core/schema/hdp.json-schema.json

# How to run this file? Version tested: v0.7.4
# @see https://github.com/EticaAI/HXL-Data-Science-file-formats/issues/14#issuecomment-798454298

# To inspect the result (pretty print)
#     hdpcli --export-to-hxl-json-processing-specs tests/hxl-processing-specs/hxl-processing-specs-test-01.hdp.yml
# To pipe the result directly to hxlspec (first item of array, use jq '.[0]')
#     hdpcli --export-to-hxl-json-processing-specs tests/hxl-processing-specs/hxl-processing-specs-test-01.hdp.yml | jq '.[0]' | hxlspec
# To pipe the result directly to hxlspec (second item of array, use jq '.[1]')
#     hdpcli --export-to-hxl-json-processing-specs tests/hxl-processing-specs/hxl-processing-specs-test-01.hdp.yml | jq '.[1]' | hxlspec

---

# See also https://proxy.hxlstandard.org/api/from-spec.html
# http://json-schema.org/understanding-json-schema/
# Test schema online https://www.jsonschemavalidator.net/
# Validate schema here: https://www.json-schema-linter.com/
# TODO: better validate HERE https://jsonschemalint.com/#!/version/draft-07/markup/json

- hsilo: "test1"
  hrecipe:
    - id: recipe1
      _recipe:
        - filter: with_columns
          includes: "#vocab+id+v_iso6393_3letter,#vocab+code+v_6391,#vocab+name"
        - filter: without_rows
          queries: "#vocab+code+v_6391="
      exemplum:
        - fontem:
            iri: https://docs.google.com/spreadsheets/d/12k4BWqq5c3mV9ihQscPIwtuDa_QRB-iFohO7dXSSptI/edit#gid=0

- hsilo: 
    nomen: "test1"
    descriptionem: from https://docs.google.com/presentation/d/17vXOnq2atIDnrODGLs36P1EaUvT-vXPjsc2I1q1Qc50/
  hrecipe:
    - id: example-processing-with-a-JSON-spec
      _recipe:
        - filter: count
          patterns: "adm1+name,adm1+code"
          aggregators:
            - "sum(population) as Population#population"
        - filter: clean_data
          number: "population"
          number_format: .0f
      exemplum:
        - fontem:
            iri: https://data.humdata.org/dataset/yemen-humanitarian-needs-overview
            _sheet_index: 1

fititnt@bravo:/workspace/git/EticaAI/HXL-Data-Science-file-formats$ hdpcli --export-to-hxl-json-processing-specs tests/hxl-processing-specs/hxl-processing-specs-test-01.hdp.yml

[
    {
        "input": "https://docs.google.com/spreadsheets/d/12k4BWqq5c3mV9ihQscPIwtuDa_QRB-iFohO7dXSSptI/edit#gid=0",
        "recipe": [
            {
                "filter": "with_columns",
                "includes": "#vocab+id+v_iso6393_3letter,#vocab+code+v_6391,#vocab+name"
            },
            {
                "filter": "without_rows",
                "queries": "#vocab+code+v_6391="
            }
        ]
    },
    {
        "input": "https://data.humdata.org/dataset/yemen-humanitarian-needs-overview",
        "recipe": [
            {
                "aggregators": [
                    "sum(population) as Population#population"
                ],
                "filter": "count",
                "patterns": "adm1+name,adm1+code"
            },
            {
                "filter": "clean_data",
                "number": "population",
                "number_format": ".0f"
            }
        ]
    }
]
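The YAML above uses `_recipe` and `_sheet_index`, while the JSON uses `recipe` and `sheet_index`; per the related commit, content without translation is prefixed with a single `_`. A minimal sketch of that key normalization, assuming only dict keys (never values) are renamed and that keys starting with `__` are left alone. `strip_untranslated_prefix` is a hypothetical name, and the recipe is shortened.

```python
from typing import Any


def strip_untranslated_prefix(node: Any) -> Any:
    """Recursively rename dict keys that start with a single '_'
    (e.g. '_recipe' -> 'recipe'); values are left untouched."""
    if isinstance(node, dict):
        renamed = {}
        for key, value in node.items():
            if isinstance(key, str) and key.startswith("_") and not key.startswith("__"):
                key = key[1:]
            renamed[key] = strip_untranslated_prefix(value)
        return renamed
    if isinstance(node, list):
        return [strip_untranslated_prefix(item) for item in node]
    return node


hrecipe = {
    "id": "example-processing-with-a-JSON-spec",
    "_recipe": [{"filter": "clean_data", "number": "population", "number_format": ".0f"}],
    "exemplum": [{"fontem": {
        "iri": "https://data.humdata.org/dataset/yemen-humanitarian-needs-overview",
        "_sheet_index": 1,
    }}],
}
normalized = strip_untranslated_prefix(hrecipe)
print(normalized["exemplum"][0]["fontem"]["sheet_index"])  # 1
```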

…input_data', 'output_data', as part of hrecipe.exemplum; the underlying implementation is still not ready, but the idea is to be able to specify self-contained examples when creating recipes with YAML; hrecipe.exemplum[N].objectivum.datum can be used for self-contained testing!

fititnt · 2021-03-14T23:22:14Z

Now, for hdpcli --export-to-hxl-json-processing-specs to generate the input parameter specified by the HXL data processing specs, the source should be given as the first item of an array. Using the internal language, this means writing something like hrecipe.[0].exemplum.[0].fontem.iri instead of hrecipe.[0].iri_example.[0].iri.

The reason for using 'exemplum' is that if one goal of recipes is reusability, then any input data there would be... just an example/reference.

The impact of this is that now the first item when exporting will always be without example inputs, while the second one will be like before.

'input_data' is a way to express input data inline (so no external link is needed), and 'output_data' is for if we eventually implement some way to test a recipe on different proxies.

Also, even if ignored by the HXL data processing specs, 'input_data' / 'output_data' make it possible to get an idea of what a recipe does just by looking at one YAML file. (OK, the real idea is to test whether it actually works, but at least for documentation it already serves!)
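The export behavior described above (a recipe-only item first, then one item per exemplum, with `fontem.datum` as `input_data` and `objectivum.datum` as `output_data`) can be sketched like this. It is a minimal sketch under those assumptions, inferred from the exported JSON in this thread rather than from the hdpcli source; `export_specs` is a hypothetical name and the recipe and table are shortened.

```python
from typing import Any, Dict, List


def export_specs(hrecipe: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Sketch: one recipe-only spec first, then one spec per exemplum."""
    recipe = hrecipe["_recipe"]
    specs: List[Dict[str, Any]] = [{"recipe": recipe}]  # never has example inputs
    for exemplum in hrecipe.get("exemplum", []):
        fontem = exemplum.get("fontem", {})
        objectivum = exemplum.get("objectivum", {})
        spec: Dict[str, Any] = {"recipe": recipe}
        if "iri" in fontem:
            spec["input"] = fontem["iri"]
        if "datum" in fontem:
            spec["input_data"] = fontem["datum"]
        if "datum" in objectivum:
            spec["output_data"] = objectivum["datum"]
        for key, value in fontem.items():
            if key.startswith("_"):  # untranslated keys, e.g. _sheet_index
                spec[key[1:]] = value
        specs.append(spec)
    return specs


table = [
    ["header 1", "header 2", "header 3"],
    ["#item +id", "#item +name", "#item +value"],
    ["ACME1", "ACME Inc.", "123"],
]
hrecipe = {
    "id": "example-processing-with-a-JSON-spec",
    "_recipe": [{"filter": "clean_data", "number": "population", "number_format": ".0f"}],
    "exemplum": [
        {"fontem": {"iri": "https://data.humdata.org/dataset/yemen-humanitarian-needs-overview",
                    "_sheet_index": 1}},
        {"fontem": {"datum": table}, "objectivum": {"datum": table}},
    ],
}
specs = export_specs(hrecipe)
print([sorted(spec) for spec in specs])
```

With two exemplum entries this yields three specs, which is why the recipe with a real source ends up at index 1 rather than 0 when selecting with jq.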

fititnt@bravo:/workspace/git/EticaAI/HXL-Data-Science-file-formats/tests/hrecipe$ cat hello-world.hrecipe.hdp.yml

# cd tests/hrecipe
# hdpcli --export-to-hxl-json-processing-specs hello-world.hrecipe.hdp.yml
# hdpcli --export-to-hxl-json-processing-specs hello-world.hrecipe.hdp.yml | jq '.[1]' | hxlspec
---
- hsilo:
    nomen: hello-world.hrecipe.hdp.yml
    linguam: mul # https://iso639-3.sil.org/code/mul
  hrecipe:
    - id: example-processing-with-a-JSON-spec
      _recipe:
        - filter: count
          patterns: "adm1+name,adm1+code"
          aggregators:
            - "sum(population) as Population#population"
        - filter: clean_data
          number: "population"
          number_format: .0f
      # iri_example:
      #   - iri: https://data.humdata.org/dataset/yemen-humanitarian-needs-overview
      #     sheet_index: 1
      exemplum:
        # Example one
        - fontem:
            iri: https://data.humdata.org/dataset/yemen-humanitarian-needs-overview
            _sheet_index: 1

        # Example two includes inline data for both the input and the expected output
        - fontem:
            # Note: fontem.datum is not fully implemented. But the idea here
            #       is to be able to create an ad-hoc table instead of using
            #       external input. So it helps show a quick example or...
            #       works as some sort of unit test for an HXL data processing
            #       spec!
            datum:
              - ["header 1", "header 2", "header 3"]
              - ["#item +id", "#item +name", "#item +value"]
              - ["ACME1", "ACME Inc.", "123"]
              - ["XPTO1", "XPTO org", "456"]
          objectivum:
            # Note: objectivum is not fully implemented. But the idea here is
            #       (like fontem.datum) to work as an ad-hoc table, but one that
            #       really allows creating some sort of unit test for an HXL
            #       data processing spec!
            datum:
              - ["header 1", "header 2", "header 3"]
              - ["#item +id", "#item +name", "#item +value"]
              - ["ACME1", "ACME Inc.", "123"]
              - ["XPTO1", "XPTO org", "456"]

fititnt@bravo:/workspace/git/EticaAI/HXL-Data-Science-file-formats/tests/hrecipe$ hdpcli --export-to-hxl-json-processing-specs hello-world.hrecipe.hdp.yml

[
    {
        "recipe": [
            {
                "aggregators": [
                    "sum(population) as Population#population"
                ],
                "filter": "count",
                "patterns": "adm1+name,adm1+code"
            },
            {
                "filter": "clean_data",
                "number": "population",
                "number_format": ".0f"
            }
        ]
    },
    {
        "input": "https://data.humdata.org/dataset/yemen-humanitarian-needs-overview",
        "recipe": [
            {
                "aggregators": [
                    "sum(population) as Population#population"
                ],
                "filter": "count",
                "patterns": "adm1+name,adm1+code"
            },
            {
                "filter": "clean_data",
                "number": "population",
                "number_format": ".0f"
            }
        ],
        "sheet_index": 1
    },
    {
        "input_data": [
            [
                "header 1",
                "header 2",
                "header 3"
            ],
            [
                "#item +id",
                "#item +name",
                "#item +value"
            ],
            [
                "ACME1",
                "ACME Inc.",
                "123"
            ],
            [
                "XPTO1",
                "XPTO org",
                "456"
            ]
        ],
        "output_data": [
            [
                "header 1",
                "header 2",
                "header 3"
            ],
            [
                "#item +id",
                "#item +name",
                "#item +value"
            ],
            [
                "ACME1",
                "ACME Inc.",
                "123"
            ],
            [
                "XPTO1",
                "XPTO org",
                "456"
            ]
        ],
        "recipe": [
            {
                "aggregators": [
                    "sum(population) as Population#population"
                ],
                "filter": "count",
                "patterns": "adm1+name,adm1+code"
            },
            {
                "filter": "clean_data",
                "number": "population",
                "number_format": ".0f"
            }
        ]
    }
]

fititnt@bravo:/workspace/git/EticaAI/HXL-Data-Science-file-formats$ cat tests/hxl-processing-specs/hxl-processing-specs-test-01.hdp.yml

# yaml-language-server: $schema=https://raw.githubusercontent.com/EticaAI/HXL-Data-Science-file-formats/main/hxlm/core/schema/hdp.json-schema.json

# How to run this file? Version tested: v0.7.4
# @see https://github.com/EticaAI/HXL-Data-Science-file-formats/issues/14#issuecomment-798454298

# To inspect the result (pretty print)
#     hdpcli --export-to-hxl-json-processing-specs tests/hxl-processing-specs/hxl-processing-specs-test-01.hdp.yml
# To pipe the result directly to hxlspec (second item of array, use jq '.[1]')
#     hdpcli --export-to-hxl-json-processing-specs tests/hxl-processing-specs/hxl-processing-specs-test-01.hdp.yml | jq '.[1]' | hxlspec
# To pipe the result directly to hxlspec (fourth item of array, use jq '.[3]')
#     hdpcli --export-to-hxl-json-processing-specs tests/hxl-processing-specs/hxl-processing-specs-test-01.hdp.yml | jq '.[3]' | hxlspec

---

# See also https://proxy.hxlstandard.org/api/from-spec.html
# http://json-schema.org/understanding-json-schema/
# Test schema online https://www.jsonschemavalidator.net/
# Validate schema here: https://www.json-schema-linter.com/
# TODO: better validate HERE https://jsonschemalint.com/#!/version/draft-07/markup/json

- hsilo: "test1"
  hrecipe:
    - id: recipe1
      _recipe:
        - filter: with_columns
          includes: "#vocab+id+v_iso6393_3letter,#vocab+code+v_6391,#vocab+name"
        - filter: without_rows
          queries: "#vocab+code+v_6391="
      exemplum:
        - fontem:
            iri: https://docs.google.com/spreadsheets/d/12k4BWqq5c3mV9ihQscPIwtuDa_QRB-iFohO7dXSSptI/edit#gid=0

- hsilo: 
    nomen: "test1"
    descriptionem: from https://docs.google.com/presentation/d/17vXOnq2atIDnrODGLs36P1EaUvT-vXPjsc2I1q1Qc50/
  hrecipe:
    - id: example-processing-with-a-JSON-spec
      _recipe:
        - filter: count
          patterns: "adm1+name,adm1+code"
          aggregators:
            - "sum(population) as Population#population"
        - filter: clean_data
          number: "population"
          number_format: .0f
      exemplum:
        - fontem:
            iri: https://data.humdata.org/dataset/yemen-humanitarian-needs-overview
            _sheet_index: 1

fititnt@bravo:/workspace/git/EticaAI/HXL-Data-Science-file-formats$ hdpcli --export-to-hxl-json-processing-specs tests/hxl-processing-specs/hxl-processing-specs-test-01.hdp.yml

[
    {
        "recipe": [
            {
                "filter": "with_columns",
                "includes": "#vocab+id+v_iso6393_3letter,#vocab+code+v_6391,#vocab+name"
            },
            {
                "filter": "without_rows",
                "queries": "#vocab+code+v_6391="
            }
        ]
    },
    {
        "input": "https://docs.google.com/spreadsheets/d/12k4BWqq5c3mV9ihQscPIwtuDa_QRB-iFohO7dXSSptI/edit#gid=0",
        "recipe": [
            {
                "filter": "with_columns",
                "includes": "#vocab+id+v_iso6393_3letter,#vocab+code+v_6391,#vocab+name"
            },
            {
                "filter": "without_rows",
                "queries": "#vocab+code+v_6391="
            }
        ]
    },
    {
        "recipe": [
            {
                "aggregators": [
                    "sum(population) as Population#population"
                ],
                "filter": "count",
                "patterns": "adm1+name,adm1+code"
            },
            {
                "filter": "clean_data",
                "number": "population",
                "number_format": ".0f"
            }
        ]
    },
    {
        "input": "https://data.humdata.org/dataset/yemen-humanitarian-needs-overview",
        "recipe": [
            {
                "aggregators": [
                    "sum(population) as Population#population"
                ],
                "filter": "count",
                "patterns": "adm1+name,adm1+code"
            },
            {
                "filter": "clean_data",
                "number": "population",
                "number_format": ".0f"
            }
        ],
        "sheet_index": 1
    }
]
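In this output, items 0 and 2 are recipe-only, while items 1 and 3 carry an `input`, which is why the file comments select `.[1]` and `.[3]`. Where jq is not available, that selection step can be mimicked in Python. A minimal sketch: `select_spec` is a hypothetical helper, not part of hdpcli, and the stderr warning for recipe-only items is an assumption, not hxlspec behavior. The recipe is shortened.

```python
import json
import sys


def select_spec(exported_json: str, index: int) -> str:
    """Pick one item from the exported array, like jq '.[N]', and return it
    as a standalone pretty-printed JSON object."""
    specs = json.loads(exported_json)
    spec = specs[index]  # negative indexes work too, as in jq
    if "input" not in spec and "input_data" not in spec:
        print(f"warning: item {index} has no 'input' (recipe-only entry)", file=sys.stderr)
    return json.dumps(spec, indent=4, sort_keys=True)


exported = json.dumps([
    {"recipe": [{"filter": "count", "patterns": "adm1+name,adm1+code"}]},
    {"input": "https://data.humdata.org/dataset/yemen-humanitarian-needs-overview",
     "recipe": [{"filter": "count", "patterns": "adm1+name,adm1+code"}],
     "sheet_index": 1},
])
print(select_spec(exported, 1))
```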

fititnt added a commit that referenced this issue Mar 13, 2021

hxl-processing-specs (#14): hdpcli --export-to-hxl-json-processing-sp…

819196d

…ec added (draft)

fititnt added a commit that referenced this issue Mar 13, 2021

hxlm(#11), hxl-processing-specs (#14): created hxlm.core.model.hdp HD…

e516c56

…P class

fititnt added a commit that referenced this issue Mar 13, 2021

hxlm(#11), hxl-processing-specs (#14): Added HDP._online_unrestricted…

10c7df8

…_init, HDP._safer_zone_hosts, HDP._safer_zone_list, HDP.export_schema_json(), HDP.export_yml()

fititnt added a commit that referenced this issue Mar 13, 2021

hxlm(#11), hxl-processing-specs (#14): Added HDP.HDP_JSON_EXTENSIONS …

cc5462f

…& HDP.HDP_YML_EXTENSIONS

fititnt added a commit that referenced this issue Mar 13, 2021

hxlm(#11), hxl-processing-specs (#14): HDP._prepare_from_local_file()…

0dc88ee

… now works

fititnt mentioned this issue Mar 13, 2021

[meta issue] hxlm #11

Closed

fititnt added a commit that referenced this issue Mar 13, 2021

hxlm(#11), hxl-processing-specs (#14): HDP._prepare_from_remote_iri()…

57de6da

… works

fititnt added a commit that referenced this issue Mar 13, 2021

hxl-processing-specs (#14): HDP.export_json_processing_specs() works

de9467a

fititnt added the proof-of-concept-already-exist Do exist proof of concept (or better) for this issue label Mar 13, 2021

fititnt added a commit that referenced this issue Mar 14, 2021

core_vocab attr.exemplum added (see #14 (comment))

7c367b3

fititnt added a commit that referenced this issue Mar 14, 2021

i18n+l10n (#15), hxl-processing-specs (#14): now is possible to speci…

57d2cca

…fy more than one source; also content without translation will be prefixed with a single _

fititnt added a commit that referenced this issue Mar 15, 2021

hxl-processing-specs (#14): added tests

ed6dcc8

fititnt added the data-transformation https://en.wikipedia.org/wiki/Data_transformation label Mar 28, 2021

fititnt changed the title ~~hxl-yml-spec-to-hxl-json-spec~~ hxl-yml-spec-to-hxl-json-spec: HXL Data processing specs exporter Apr 2, 2021

fititnt mentioned this issue Apr 2, 2021

[meta] HXLm.lisp and/or related strategies for portable 'Turing complete' HDP custom functions #18

Open
