hxl-yml-spec-to-hxl-json-spec: HXL Data processing specs exporter #14
Maybe this, and a file like this, could be a good starting point:

- hsilo: "test1"
  hrecipe:
    - id: recipe1
      source:
        - iri: https://docs.google.com/spreadsheets/d/12k4BWqq5c3mV9ihQscPIwtuDa_QRB-iFohO7dXSSptI/edit#gid=0
      filters:
        - filter: with_columns
          with_columns: "#vocab+id+v_iso6393_3letter,#vocab+code+v_6391,#vocab+name"
        - filter: without_rows
          without_rows: "#vocab+code+v_6391="

But from my experience with Ansible (and very, very large Ansible playbooks), we could allow, from the start, parsing several YAML files at once and just output all the JSON specs line by line. To get to that point, though, hdpcli needs to implement some way to at least concatenate more than one YAML file (the part about include_file options may be something for later).
…_init, HDP._safer_zone_hosts, HDP._safer_zone_list, HDP.export_schema_json(), HDP.export_yml()
This is the YAML file (there is some extra markup, but ignore it for now):

fititnt@bravo:/workspace/git/EticaAI/HXL-Data-Science-file-formats$ cat tests/hxl-processing-specs/hxl-processing-specs-test-01.hdp.yml | grep "^[^#;]"
---
- hsilo:
name: "test1"
desc: from https://docs.google.com/presentation/d/17vXOnq2atIDnrODGLs36P1EaUvT-vXPjsc2I1q1Qc50/
hrecipe:
- id: example-processing-with-a-JSON-spec
iri_example:
- iri: https://data.humdata.org/dataset/yemen-humanitarian-needs-overview
sheet_index: 1
recipe:
- filter: count
patterns: "adm1+name,adm1+code"
aggregators:
- "sum(population) as Population#population"
- filter: clean_data
number: "population"
number_format: .0f

This is the JSON spec result:

fititnt@bravo:/workspace/git/EticaAI/HXL-Data-Science-file-formats$ hdpcli --export-to-hxl-json-processing-specs tests/hxl-processing-specs/hxl-processing-specs-test-01.hdp.yml
{
"input": "https://data.humdata.org/dataset/yemen-humanitarian-needs-overview",
"recipe": [
{
"aggregators": [
"sum(population) as Population#population"
],
"filter": "count",
"patterns": "adm1+name,adm1+code"
},
{
"filter": "clean_data",
"number": "population",
"number_format": ".0f"
}
],
"sheet_index": 1
}

fititnt@bravo:/workspace/git/EticaAI/HXL-Data-Science-file-formats$ hdpcli --export-to-hxl-json-processing-specs tests/hxl-processing-specs/hxl-processing-specs-test-01.hdp.yml | hxlspec
And redirecting to the command-line hxlspec actually worked. Just a quick warning, but it worked!

fititnt@bravo:/workspace/git/EticaAI/HXL-Data-Science-file-formats$ pip3 show libhxl | grep Version
Version: 4.22
We need some way to make 'inline' data tables that could work as a way to test whether an HXL data processing spec is working (and it needs to work offline). This implies adding some new attributes, in particular the concepts of inline data and expected result data. Or maybe the concept of an 'example'.
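As a rough illustration of the idea (not an implementation): a spec could carry an inline input table plus an inline expected result table, and an offline check would just compare them. In the sketch below, apply_recipe and the attribute names inline_input / expected_output are hypothetical placeholders:

# Sketch only: "apply_recipe" is a hypothetical stand-in for the engine
# (libhxl, HXL Proxy, ...) that would actually run the processing spec;
# 'inline_input' and 'expected_output' are placeholder attribute names.
from typing import Callable, List

Table = List[List[str]]


def check_spec_offline(spec: dict,
                       apply_recipe: Callable[[list, Table], Table]) -> bool:
    """True when the recipe applied to the inline input table reproduces
    the inline expected table; no network access is involved."""
    actual = apply_recipe(spec['recipe'], spec['inline_input'])
    return actual == spec['expected_output']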
…fy more than one source; also content without translation will be prefixed with a single _
Current example:

fititnt@bravo:/workspace/git/EticaAI/HXL-Data-Science-file-formats$ cat tests/hxl-processing-specs/hxl-processing-specs-test-01.hdp.yml
# yaml-language-server: $schema=https://raw.githubusercontent.com/EticaAI/HXL-Data-Science-file-formats/main/hxlm/core/schema/hdp.json-schema.json
# How to run this file? Version tested: v0.7.4
# @see https://github.com/EticaAI/HXL-Data-Science-file-formats/issues/14#issuecomment-798454298
# To inspect the result (pretty print)
# hdpcli --export-to-hxl-json-processing-specs tests/hxl-processing-specs/hxl-processing-specs-test-01.hdp.yml
# To pipe the result direct to hxlspec (first item of array, use jq '.[0]')
# hdpcli --export-to-hxl-json-processing-specs tests/hxl-processing-specs/hxl-processing-specs-test-01.hdp.yml | jq '.[0]' | hxlspec
# To pipe the result direct to hxlspec (second item of array, use jq '.[1]')
# hdpcli --export-to-hxl-json-processing-specs tests/hxl-processing-specs/hxl-processing-specs-test-01.hdp.yml | jq '.[1]' | hxlspec
---
# See also https://proxy.hxlstandard.org/api/from-spec.html
# http://json-schema.org/understanding-json-schema/
# Test schema online https://www.jsonschemavalidator.net/
# Validate schema here: https://www.json-schema-linter.com/
# TODO: better validate HERE https://jsonschemalint.com/#!/version/draft-07/markup/json
- hsilo: "test1"
hrecipe:
- id: recipe1
_recipe:
- filter: with_columns
includes: "#vocab+id+v_iso6393_3letter,#vocab+code+v_6391,#vocab+name"
- filter: without_rows
queries: "#vocab+code+v_6391="
exemplum:
- fontem:
iri: https://docs.google.com/spreadsheets/d/12k4BWqq5c3mV9ihQscPIwtuDa_QRB-iFohO7dXSSptI/edit#gid=0
- hsilo:
nomen: "test1"
descriptionem: from https://docs.google.com/presentation/d/17vXOnq2atIDnrODGLs36P1EaUvT-vXPjsc2I1q1Qc50/
hrecipe:
- id: example-processing-with-a-JSON-spec
_recipe:
- filter: count
patterns: "adm1+name,adm1+code"
aggregators:
- "sum(population) as Population#population"
- filter: clean_data
number: "population"
number_format: .0f
exemplum:
- fontem:
iri: https://data.humdata.org/dataset/yemen-humanitarian-needs-overview
_sheet_index: 1

fititnt@bravo:/workspace/git/EticaAI/HXL-Data-Science-file-formats$ hdpcli --export-to-hxl-json-processing-specs tests/hxl-processing-specs/hxl-processing-specs-test-01.hdp.yml
[
{
"input": "https://docs.google.com/spreadsheets/d/12k4BWqq5c3mV9ihQscPIwtuDa_QRB-iFohO7dXSSptI/edit#gid=0",
"recipe": [
{
"filter": "with_columns",
"includes": "#vocab+id+v_iso6393_3letter,#vocab+code+v_6391,#vocab+name"
},
{
"filter": "without_rows",
"queries": "#vocab+code+v_6391="
}
]
},
{
"input": "https://data.humdata.org/dataset/yemen-humanitarian-needs-overview",
"recipe": [
{
"aggregators": [
"sum(population) as Population#population"
],
"filter": "count",
"patterns": "adm1+name,adm1+code"
},
{
"filter": "clean_data",
"number": "population",
"number_format": ".0f"
}
]
}
]
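For reference, the mapping this export illustrates can be sketched in a few lines of Python. The key names (hsilo, hrecipe, _recipe, exemplum, fontem) come from the YAML above; the function itself is only an illustration, not the actual hdpcli internals:

# Illustrative sketch of the YAML -> HXL JSON spec mapping shown above
# (not the real hdpcli implementation): each exemplum of each hrecipe
# item becomes one spec, carrying the recipe plus the example input.
# (_sheet_index handling is omitted in this sketch.)
import json

import yaml  # pip3 install pyyaml


def hdp_to_specs(hdp_yaml: str) -> list:
    specs = []
    for silo_item in yaml.safe_load(hdp_yaml) or []:
        for recipe in silo_item.get('hrecipe', []):
            for exemplum in recipe.get('exemplum', []):
                spec = {'recipe': recipe.get('_recipe', [])}
                fontem = exemplum.get('fontem') or {}
                if 'iri' in fontem:
                    spec['input'] = fontem['iri']
                specs.append(spec)
    return specs


if __name__ == '__main__':
    with open('tests/hxl-processing-specs/hxl-processing-specs-test-01.hdp.yml') as handle:
        print(json.dumps(hdp_to_specs(handle.read()), indent=2))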
…input_data', 'output_data', as part of hrecipe.exemplum; the underlying implementation is still not ready, but the idea is to be able to specify self-contained examples when creating recipes with YAML; the hrecipe.exemplum[N].objectivum.datum can be used for self-contained testing!
The idea of using 'exemplum' is that, if one goal of recipes is reusability, any input data there is... just an example/reference. The impact of this is that now the first item when exporting will always be without example inputs, while the second one will be like before. 'input_data' is a way to express the input data inline (so an external link is not needed), and 'output_data' is for the case where we eventually implement some way to test a recipe on different proxies. Also, the idea of 'input_data' / 'output_data', even if ignored by the HXL data processing specs, can be used to get an idea of what a recipe would do just by looking at one YAML file. (OK, the real idea is to test whether it actually works, but at least for documentation it already serves!)

fititnt@bravo:/workspace/git/EticaAI/HXL-Data-Science-file-formats/tests/hrecipe$ cat hello-world.hrecipe.hdp.yml
# cd tests/hrecipe
# hdpcli --export-to-hxl-json-processing-specs hello-world.hrecipe.hdp.yml
# hdpcli --export-to-hxl-json-processing-specs hello-world.hrecipe.hdp.yml | jq '.[1]' | hxlspec
---
- hsilo:
nomen: hello-world.hrecipe.hdp.yml
linguam: mul # https://iso639-3.sil.org/code/mul
hrecipe:
- id: example-processing-with-a-JSON-spec
_recipe:
- filter: count
patterns: "adm1+name,adm1+code"
aggregators:
- "sum(population) as Population#population"
- filter: clean_data
number: "population"
number_format: .0f
# iri_example:
# - iri: https://data.humdata.org/dataset/yemen-humanitarian-needs-overview
# sheet_index: 1
exemplum:
# Example one
- fontem:
iri: https://data.humdata.org/dataset/yemen-humanitarian-needs-overview
_sheet_index: 1
# Example two includes both inline input data and inline expected output
- fontem:
# Note: fontem.datum is not fully implemented. But the idea here is
# to be able to create an ad-hoc table instead of using an
# external input, so it helps to show a quick example or...
# act as some sort of unit test for an HXL data processing
# spec!
datum:
- ["header 1", "header 2", "header 3"]
- ["#item +id", "#item +name", "#item +value"]
- ["ACME1", "ACME Inc.", "123"]
- ["XPTO1", "XPTO org", "456"]
objectivum:
# Note: fontem.objectivum is not fully implemented. But the idea here
# is (like fontem.datum) to work as an ad-hoc table, and
# really to allow creating some sort of unit test for an HXL
# data processing spec!
datum:
- ["header 1", "header 2", "header 3"]
- ["#item +id", "#item +name", "#item +value"]
- ["ACME1", "ACME Inc.", "123"]
- ["XPTO1", "XPTO org", "456"] fititnt@bravo:/workspace/git/EticaAI/HXL-Data-Science-file-formats/tests/hrecipe$ hdpcli --export-to-hxl-json-processing-specs hello-world.hrecipe.hdp.yml[
{
"recipe": [
{
"aggregators": [
"sum(population) as Population#population"
],
"filter": "count",
"patterns": "adm1+name,adm1+code"
},
{
"filter": "clean_data",
"number": "population",
"number_format": ".0f"
}
]
},
{
"input": "https://data.humdata.org/dataset/yemen-humanitarian-needs-overview",
"recipe": [
{
"aggregators": [
"sum(population) as Population#population"
],
"filter": "count",
"patterns": "adm1+name,adm1+code"
},
{
"filter": "clean_data",
"number": "population",
"number_format": ".0f"
}
],
"sheet_index": 1
},
{
"input_data": [
[
"header 1",
"header 2",
"header 3"
],
[
"#item +id",
"#item +name",
"#item +value"
],
[
"ACME1",
"ACME Inc.",
"123"
],
[
"XPTO1",
"XPTO org",
"456"
]
],
"output_data": [
[
"header 1",
"header 2",
"header 3"
],
[
"#item +id",
"#item +name",
"#item +value"
],
[
"ACME1",
"ACME Inc.",
"123"
],
[
"XPTO1",
"XPTO org",
"456"
]
],
"recipe": [
{
"aggregators": [
"sum(population) as Population#population"
],
"filter": "count",
"patterns": "adm1+name,adm1+code"
},
{
"filter": "clean_data",
"number": "population",
"number_format": ".0f"
}
]
}
]

fititnt@bravo:/workspace/git/EticaAI/HXL-Data-Science-file-formats$ cat tests/hxl-processing-specs/hxl-processing-specs-test-01.hdp.yml
# yaml-language-server: $schema=https://raw.githubusercontent.com/EticaAI/HXL-Data-Science-file-formats/main/hxlm/core/schema/hdp.json-schema.json
# How to run this file? Version tested: v0.7.4
# @see https://github.com/EticaAI/HXL-Data-Science-file-formats/issues/14#issuecomment-798454298
# To inspect the result (pretty print)
# hdpcli --export-to-hxl-json-processing-specs tests/hxl-processing-specs/hxl-processing-specs-test-01.hdp.yml
# To pipe the result direct to hxlspec (second item of array, use jq '.[1]')
# hdpcli --export-to-hxl-json-processing-specs tests/hxl-processing-specs/hxl-processing-specs-test-01.hdp.yml | jq '.[1]' | hxlspec
# To pipe the result direct to hxlspec (4th item of array, use jq '.[3]')
# hdpcli --export-to-hxl-json-processing-specs tests/hxl-processing-specs/hxl-processing-specs-test-01.hdp.yml | jq '.[3]' | hxlspec
---
# See also https://proxy.hxlstandard.org/api/from-spec.html
# http://json-schema.org/understanding-json-schema/
# Test schema online https://www.jsonschemavalidator.net/
# Validate schema here: https://www.json-schema-linter.com/
# TODO: better validate HERE https://jsonschemalint.com/#!/version/draft-07/markup/json
- hsilo: "test1"
hrecipe:
- id: recipe1
_recipe:
- filter: with_columns
includes: "#vocab+id+v_iso6393_3letter,#vocab+code+v_6391,#vocab+name"
- filter: without_rows
queries: "#vocab+code+v_6391="
exemplum:
- fontem:
iri: https://docs.google.com/spreadsheets/d/12k4BWqq5c3mV9ihQscPIwtuDa_QRB-iFohO7dXSSptI/edit#gid=0
- hsilo:
nomen: "test1"
descriptionem: from https://docs.google.com/presentation/d/17vXOnq2atIDnrODGLs36P1EaUvT-vXPjsc2I1q1Qc50/
hrecipe:
- id: example-processing-with-a-JSON-spec
_recipe:
- filter: count
patterns: "adm1+name,adm1+code"
aggregators:
- "sum(population) as Population#population"
- filter: clean_data
number: "population"
number_format: .0f
exemplum:
- fontem:
iri: https://data.humdata.org/dataset/yemen-humanitarian-needs-overview
_sheet_index: 1

fititnt@bravo:/workspace/git/EticaAI/HXL-Data-Science-file-formats$ hdpcli --export-to-hxl-json-processing-specs tests/hxl-processing-specs/hxl-processing-specs-test-01.hdp.yml
[
{
"recipe": [
{
"filter": "with_columns",
"includes": "#vocab+id+v_iso6393_3letter,#vocab+code+v_6391,#vocab+name"
},
{
"filter": "without_rows",
"queries": "#vocab+code+v_6391="
}
]
},
{
"input": "https://docs.google.com/spreadsheets/d/12k4BWqq5c3mV9ihQscPIwtuDa_QRB-iFohO7dXSSptI/edit#gid=0",
"recipe": [
{
"filter": "with_columns",
"includes": "#vocab+id+v_iso6393_3letter,#vocab+code+v_6391,#vocab+name"
},
{
"filter": "without_rows",
"queries": "#vocab+code+v_6391="
}
]
},
{
"recipe": [
{
"aggregators": [
"sum(population) as Population#population"
],
"filter": "count",
"patterns": "adm1+name,adm1+code"
},
{
"filter": "clean_data",
"number": "population",
"number_format": ".0f"
}
]
},
{
"input": "https://data.humdata.org/dataset/yemen-humanitarian-needs-overview",
"recipe": [
{
"aggregators": [
"sum(population) as Population#population"
],
"filter": "count",
"patterns": "adm1+name,adm1+code"
},
{
"filter": "clean_data",
"number": "population",
"number_format": ".0f"
}
],
"sheet_index": 1
}
]
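Building on the exported array above, here is a minimal sketch of the self-contained check that input_data / output_data make possible; run_spec is a hypothetical stand-in for whatever engine (libhxl, the HXL Proxy, ...) ends up executing the spec:

# Sketch only: walk an exported spec array and check every entry that
# carries both inline tables. "run_spec" is hypothetical; it stands in
# for whatever engine eventually executes an HXL processing spec.
import json
import sys
from typing import Callable, List

Table = List[List[str]]


def check_exported_specs(specs: list,
                         run_spec: Callable[[dict, Table], Table]) -> list:
    """Return (index, passed) pairs for every spec that carries both
    an inline input table and an inline expected output table."""
    results = []
    for index, spec in enumerate(specs):
        if 'input_data' in spec and 'output_data' in spec:
            actual = run_spec(spec, spec['input_data'])
            results.append((index, actual == spec['output_data']))
    return results


if __name__ == '__main__':
    # e.g.: hdpcli --export-to-hxl-json-processing-specs file.hdp.yml | python3 check_specs.py
    exported = json.load(sys.stdin)
    # Placeholder engine that just echoes the input table unchanged:
    print(check_exported_specs(exported, lambda spec, table: table))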
hxl-yml-spec-to-hxl-json-spec: HXL Data processing specs exporter
Quick links:
Let's do a proof of concept of the thing!