rdf+bcp47+hxl (#41), admin-l (#39), pcodes (#2): convention of s500-s506 to use for adm0~adm6
fititnt committed Jun 10, 2022
1 parent 7a5ea50 commit ea8226a
Showing 4 changed files with 107 additions and 8 deletions.
29 changes: 28 additions & 1 deletion officina/999999999/0/L999999999_0.py
@@ -198,9 +198,25 @@
# P1448: short name of a place, organisation, person, journal,
# Wikidata property, etc.
'wdata': 'P1813' # short name
}
},
# [+code / +v_pcode] varies by context
# '+code': {}
'+v_m49': {
'wdata': 'P2082' # United Nations M.49 code for the subject item
},
'+v_iso3166p1a2': {
'wdata': 'P297' # ISO 3166-1 alpha-2 code
},
'+v_iso3166p1a3': {
'wdata': 'P298' # ISO 3166-1 alpha-3 code
},
# Consider using UN m49 code instead of the ISO one
'+v_iso3166p1n': {
'wdata': 'P299' # ISO 3166-1 numeric code
},
'+v_iso3166p2': {
'wdata': 'P300' # subdivision code ISO 3166-2
},
},
# Note: avoid using generic
'zzzgeneric': {
@@ -218,22 +234,33 @@
'#country': {
'wdata': 'Q6256' # country
},
# Not a valid HXL hashtag, but using anyway as alias to country
'#adm0': {
'hxlattrs': HXL_ATTRIBUTES_AD_WIKIDATA['geo'],
'wdata': 'Q6256' # country
},
'#adm1': {
'hxlattrs': HXL_ATTRIBUTES_AD_WIKIDATA['geo'],
'wdata': 'Q10864048' # first-level administrative country subdivision
},
'#adm2': {
'hxlattrs': HXL_ATTRIBUTES_AD_WIKIDATA['geo'],
'wdata': 'Q13220204' # second-level administrative country subdivision
},
'#adm3': {
'hxlattrs': HXL_ATTRIBUTES_AD_WIKIDATA['geo'],
'wdata': 'Q13221722' # third-level administrative country subdivision
},
'#adm4': {
'hxlattrs': HXL_ATTRIBUTES_AD_WIKIDATA['geo'],
'wdata': 'Q14757767' # fourth-level administrative country subdivision
},
'#adm5': {
'hxlattrs': HXL_ATTRIBUTES_AD_WIKIDATA['geo'],
'wdata': 'Q15640612' # fifth-level administrative country subdivision
},
'#adm6': {
'hxlattrs': HXL_ATTRIBUTES_AD_WIKIDATA['geo'],
'wdata': 'Q22927291' # sixth-level administrative country subdivision
},
}
67 changes: 67 additions & 0 deletions officina/999999999/1568346/README.md
@@ -9,6 +9,12 @@
- https://en.wikipedia.org/wiki/Resource_Description_Framework#Vocabulary
- https://github.com/hxl-team/HXL-Vocab/blob/master/Tools/hxl.ttl

- _Final_, well-formatted versions of the HXL hashtags and of the BCP47 version are equivalent.
- It is possible to convert between both versions column by column,
**without needing to recalculate context**
- `BCP47_EX_HXL` in L999999999_0.py is just syntactic sugar for
`#item+conceptum+codicem` and `#item+conceptum+numerordinatio`

## Note on hardcoded special cases for cell value expansion: HXL `+rdf_y_*` and BCP47 `-r-y*`

### Explode list of items
@@ -36,6 +42,67 @@ Full Example:
- 999999999/1568346/data/unesco-thesaurus.tm.hxl.tsv
- https://vocabularies.unesco.org/exports/thesaurus/latest/unesco-thesaurus.ttl

## Note on numbering of abstract groups (used to link columns as concept bags)
Numerordinatio on HXL and on BCP47 are equivalent, so different datasets can be
merged without recalculating the context of relationships. For datasets that
are already highly reusable, the following is a mere suggestion, made for
convenience and not enforced by the tools:

- Keep in mind that a column can be marked as part of more than
one "subject group".
- **The choice of these numbers does not affect the exported output**
- This is very useful for merging RDF triples, even though the tabular format
using HXL+RDF / HXL+BCP47 can carry very complex content
(id est, content that you can "explode" into several graph groups)
- To keep things simple for the end user, assume that "subject group"
number "1" is the focused content. A user merging
logical groups would then use "2", then "3", then "4", ...
- In practice, the only thing the tooling does is assume "1" when you do not
provide a concept group. The tooling works with
any number greater than 0
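
As a minimal sketch of the defaulting rule described above (not the actual tooling code): the helper name `resolve_subject_groups` and the constant `DEFAULT_SUBJECT_GROUP` are hypothetical, introduced only for illustration. It assumes that a missing or empty selection means group `1` and that any integer greater than 0 is accepted.

```python
from typing import Iterable, List, Optional

# Assumption: "1" is the focused subject group used when none is provided.
DEFAULT_SUBJECT_GROUP = 1


def resolve_subject_groups(groups: Optional[Iterable[int]] = None) -> List[int]:
    """Return the subject groups to use, falling back to [1] when none are given."""
    resolved = list(groups) if groups is not None else []
    if not resolved:
        return [DEFAULT_SUBJECT_GROUP]
    for group in resolved:
        # Any number greater than 0 is allowed; the value itself carries no meaning.
        if not isinstance(group, int) or group <= 0:
            raise ValueError(f"subject group must be an integer > 0, got {group!r}")
    return resolved
```

With this sketch, `resolve_subject_groups()` yields `[1]`, while `resolve_subject_groups([2, 503])` returns the list unchanged.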

### Suggested "subject group" for country, and administrative boundaries 1 to 6+

- Country:
- Number: `500`
- BCP47 RDF extension part (self): `r-sU2200-s500-snop`
- HXL RDF attribute (self): `+rdf_s_u2200_s500`
- #adm1:
- Number: `501`
- BCP47 RDF extension part (self): `r-sU2200-s501-snop`
- HXL RDF attribute (self): `+rdf_s_u2200_s501`
- #adm2:
- Number: `502`
- BCP47 RDF extension part (self): `r-sU2200-s502-snop`
- HXL RDF attribute (self): `+rdf_s_u2200_s502`
- #adm3:
- Number: `503`
- BCP47 RDF extension part (self): `r-sU2200-s503-snop`
- HXL RDF attribute (self): `+rdf_s_u2200_s503`
- #adm4:
- Number: `504`
- BCP47 RDF extension part (self): `r-sU2200-s504-snop`
- HXL RDF attribute (self): `+rdf_s_u2200_s504`
- #adm5:
- Number: `505`
- BCP47 RDF extension part (self): `r-sU2200-s505-snop`
- HXL RDF attribute (self): `+rdf_s_u2200_s505`
- #adm6:
- Number: `506`
- BCP47 RDF extension part (self): `r-sU2200-s506-snop`
- HXL RDF attribute (self): `+rdf_s_u2200_s506`


> Note: since columns can be marked with more than one subject group,
> under this convention, if your content is not yet a final
> dataset (for example, a dataset about #adm3), you could use:
>
> - #adm3:
> - Number: `503` and `1`
> - BCP47 RDF extension part (self): `r-sU2200-s1-snop-sU2200-s503-snop`
> - HXL RDF attribute (self): `+rdf_s_u2200_s1+rdf_s_u2200_s503`
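
The convention above is regular enough to generate mechanically. The following is a hedged sketch, not part of the repository: the function names `adm_subject_group`, `bcp47_subject_part` and `hxl_subject_attrs` are hypothetical, and the string patterns are copied from the examples in this section.

```python
from typing import Iterable

# Assumption from the list above: country (adm0) is 500, #adm1..#adm6 are 501..506.
ADM0_SUBJECT_GROUP = 500


def adm_subject_group(level: int) -> int:
    """Map an administrative level (0 to 6) to its suggested subject group."""
    if not 0 <= level <= 6:
        raise ValueError(f"administrative level must be 0..6, got {level!r}")
    return ADM0_SUBJECT_GROUP + level


def bcp47_subject_part(groups: Iterable[int]) -> str:
    """Build the BCP47 RDF extension part (self) for one or more subject groups."""
    groups = list(groups)
    if not groups:
        raise ValueError("at least one subject group is required")
    return "r-" + "-".join(f"sU2200-s{group}-snop" for group in groups)


def hxl_subject_attrs(groups: Iterable[int]) -> str:
    """Build the HXL RDF attribute (self) for one or more subject groups."""
    groups = list(groups)
    if not groups:
        raise ValueError("at least one subject group is required")
    return "".join(f"+rdf_s_u2200_s{group}" for group in groups)
```

For a dataset about #adm3 that is also the focused content, `bcp47_subject_part([1, adm_subject_group(3)])` produces the `r-sU2200-s1-snop-sU2200-s503-snop` form shown in the note, and `hxl_subject_attrs([1, 503])` produces the matching HXL attribute.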

## To Dos
- https://www.w3.org/wiki/UsingSeeAlso
- maybe `-r-bVERB-bitem-bnop` ? (this would make result hardcoded to `rdfs:seeAlso`)
17 changes: 11 additions & 6 deletions officina/999999999/1568346/bcp47-to-hxl-to-rdf.sh
@@ -272,40 +272,45 @@ test_cod_ab__with_inferences_prebuild() {

# officina/999999/1568346/data

# @TODO: implement implicit aliases when sU2200 references multiple subject
# groups (like s500-s506 for administrative regions) but the user
# asks for s1 and s1 is also one of these s500-s506.

set -x
"${ROOTDIR}/999999999/0/999999999_54872.py" \
--objectivum-formato=_temp_bcp47_meta_in_json \
--rdf-namespaces-archivo="${archivum__namespace}" \
--rdf-bag=500 \
"${archivum__cod_ab_bcp47}" |
jq >"${archivum__resultata_meta_json}"

"${ROOTDIR}/999999999/0/999999999_54872.py" \
--objectivum-formato=_temp_bcp47 \
--rdf-bag=1 \
--rdf-bag=500 \
--rdf-namespaces-archivo="${archivum__namespace}" \
"${archivum__cod_ab_bcp47}" |
rapper --quiet --input=turtle --output=turtle /dev/fd/0 \
>"${archivum__resultata_bag1}"

"${ROOTDIR}/999999999/0/999999999_54872.py" \
--objectivum-formato=_temp_bcp47 \
--rdf-bag=2 \
--rdf-bag=501 \
--rdf-namespaces-archivo="${archivum__namespace}" \
"${archivum__cod_ab_bcp47}" |
rapper --quiet --input=turtle --output=turtle /dev/fd/0 \
>"${archivum__resultata_bag2}"

"${ROOTDIR}/999999999/0/999999999_54872.py" \
--objectivum-formato=_temp_bcp47 \
--rdf-bag=3 \
--rdf-bag=502 \
--rdf-namespaces-archivo="${archivum__namespace}" \
"${archivum__cod_ab_bcp47}" |
rapper --quiet --input=turtle --output=turtle /dev/fd/0 \
>"${archivum__resultata_bag3}"

"${ROOTDIR}/999999999/0/999999999_54872.py" \
--objectivum-formato=_temp_bcp47 \
--rdf-bag=4 \
--rdf-bag=503 \
--rdf-namespaces-archivo="${archivum__namespace}" \
"${archivum__cod_ab_bcp47}" |
rapper --quiet --input=turtle --output=turtle /dev/fd/0 \
@@ -530,8 +535,8 @@ bcp47_and_hxlrdf_roundtrip__drill() {

# test_unesco_thesaurus
# test_cod_ab
# test_cod_ab__with_inferences_prebuild
# exit 0
test_cod_ab__with_inferences_prebuild
exit 0

echo "bcp47_to_hxl_to_rdf__tests"
bcp47_to_hxl_to_rdf__tests
@@ -1,3 +1,3 @@
qcc-Zxxx-r-aOBO-abfo29-anop-sU2200-s1-snop-pOBO-pbfo124-ps2 qcc-Zxxx-r-aOBO-abfo29-anop-sU2200-s2-snop-pOBO-pbfo124-ps3-pOBO-pbfo171-ps1 qcc-Zxxx-r-aOBO-abfo29-anop-sU2200-s3-snop-pOBO-pbfo124-ps3-pOBO-pbfo171-ps2 qcc-Zxxx-r-aOBO-abfo29-anop-sU2200-s4-snop-pOBO-pbfo124-ps4-pOBO-pbfo171-ps3 por-Latn-r-pSKOS-pprefLabel-ps1 por-Latn-r-pSKOS-pprefLabel-ps2 por-Latn-r-pSKOS-pprefLabel-ps3 por-Latn-r-pSKOS-pprefLabel-ps4
qcc-Zxxx-r-aOBO-abfo29-anop-sU2200-s500-snop-sU2200-s500-snop-pOBO-pbfo124-ps501 qcc-Zxxx-r-aOBO-abfo29-anop-sU2200-s501-snop-pOBO-pbfo124-ps502-pOBO-pbfo171-ps500 qcc-Zxxx-r-aOBO-abfo29-anop-sU2200-s502-snop-pOBO-pbfo124-ps502-pOBO-pbfo171-ps501 qcc-Zxxx-r-aOBO-abfo29-anop-sU2200-s503-snop-pOBO-pbfo124-ps503-pOBO-pbfo171-ps502 por-Latn-r-pSKOS-pprefLabel-ps500 por-Latn-r-pSKOS-pprefLabel-ps501 por-Latn-r-pSKOS-pprefLabel-ps502 por-Latn-r-pSKOS-pprefLabel-ps503
1603:45:16:76:0 1603:45:16:76:1:31 1603:45:16:76:2:3106200 Brasil Minas Gerais Belo Horizonte
1603:45:16:24:0 1603:45:16:24:1:7 1603:45:16:24:2:7060 1603:45:16:24:3:7060201 Angola Cuanza Sul Sumbe (Ngangula) Kikombo
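
To check which subject groups a header column declares, the `sU2200-s<N>-snop` segments can be pulled out with a regular expression. This is an illustrative sketch only: `subject_groups_of` is a hypothetical helper, and it assumes the segments are spelled exactly as in the header row above.

```python
import re
from typing import List

# Matches the "self" subject segments, e.g. "sU2200-s501-snop" -> "501".
_SUBJECT_GROUP = re.compile(r"sU2200-s(\d+)-snop")


def subject_groups_of(tag: str) -> List[int]:
    """Return the subject group numbers declared in one BCP47 header tag."""
    return [int(number) for number in _SUBJECT_GROUP.findall(tag)]
```

Columns that only point at a group, such as `por-Latn-r-pSKOS-pprefLabel-ps500`, declare no subject of their own and yield an empty list.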
