urnresolver: Uniform Resource Names - URN Resolver #13
…tead of script (stricter control; forces it to be a Python script, not a generic system script)
Because of this topic, we will need to create some sort of local vault for permanent storage.
One idea about the namespace: I think that, unless the user forces an error, instead of returning an error the urnresolver could be allowed to return ANOTHER URN (like …).
Note: I know that …
…te a formal ABNF (like the ISO URN, RFC5141); but ANTLR seems more friendly
… build full parser to avoid regex hell; I think maybe the early versions could be a bunch of effective if-elses
…r before needing to resort to full grammar checking; this at least could help localized namespaces get what matters for them, while keeping the start of the 'urn:data' predictable
…domain names as quick namespace; full unicode support need more testing (fallbacking to GenericUrnHtype instead of DataUrnHtype)
This is the current result. As for the baseline URN processing strategy, the next part (likely the "organization" inside an already namespaced country/territory) could be either a single identifier or, since I'm not sure most people in the middle of an emergency would agree on one, a domain name itself.

fititnt@bravo:/workspace/git/EticaAI/HXL-Data-Science-file-formats$ ./tests/test_core_urn.py
DataUrnHtype(value='urn:data--i:un:locode') {'nid': 'data', 'nid_attr': 'i', 'bpgp': 'un', 'bpln': 'locode', 'nss': 'un:locode'}
DataUrnHtype(value='URN:DATA--I:UN:LOCODE') {'nid': 'data', 'nid_attr': 'i', 'bpgp': 'URN', 'bpln': 'DATA--I', 'nss': 'URN:DATA--I:UN:LOCODE'}
DataUrnHtype(value='urn:data:un:locode') {'nid': 'data', 'nid_attr': 'd', 'bpgp': 'un', 'bpln': 'locode', 'nss': 'un:locode'}
DataUrnHtype(value='urn:data--i:xz:hxlcplp:fod:bool') {'nid': 'data', 'nid_attr': 'i', 'bpgp': 'xz', 'bpln': 'hxlcplp', 'nss': 'xz:hxlcplp:fod:bool'}
DataUrnHtype(value='urn:data:br:__saude.gov.br__:covid-19-vacinacao') {'nid': 'data', 'nid_attr': 'd', 'bpgp': 'br', 'bpln': 'saude.gov.br', 'bpln_isdn': True, 'nss': 'br:__saude.gov.br__:covid-19-vacinacao'}
DataUrnHtype(value='urn:data--i:cn:__中国.icom.museum__:test') {'nid': 'data', 'nid_attr': 'i', 'bpgp': 'cn', 'bpln': '中国.icom.museum', 'bpln_isdn': True, 'nss': 'cn:__中国.icom.museum__:test'}
DataUrnHtype(value='urn:data--i:ru:__россия.иком.museum__:test') {'nid': 'data', 'nid_attr': 'i', 'bpgp': 'ru', 'bpln': 'россия.иком.museum', 'bpln_isdn': True, 'nss': 'ru:__россия.иком.museum__:test'}
DataUrnHtype(value='urn:data--i:eg:__مصر.icom.museum__:test') {'nid': 'data', 'nid_attr': 'i', 'bpgp': 'eg', 'bpln': 'مصر.icom.museum', 'bpln_isdn': True, 'nss': 'eg:__مصر.icom.museum__:test'}

The idea is that the urnresolver should be able to resolve the most common URNs (if an already prepared dataset exists on a path available on the local filesystem). Even if implementers do not create something very specific for the country or the organization, the default strategy would at least give people working with datasets some place to put the files. If the default is good enough, then even though documentation could always require humans to translate manually, the default resolver could make documentation directly usable!
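The fields in the test output above can be reproduced with the "bunch of effective if-elses" approach mentioned in the commits. What follows is only a minimal sketch, not the project's `DataUrnHtype`: the function name `parse_data_urn` is hypothetical, the field names are copied from the output, and the defaults (`nid_attr` falls back to `'d'`, a `__…__` wrapper marks a domain name) are inferred from those lines. It also assumes the `urn:data` prefix should be matched case-insensitively, which differs from what the uppercase test line above currently returns.

```python
def parse_data_urn(value: str) -> dict:
    """Split a urn:data URN into the fields shown in the test output.

    Hypothetical helper for illustration only.
    """
    parts = value.split(":")
    # Assumption: the 'urn' scheme and the NID are compared case-insensitively,
    # while the namespace-specific string (NSS) keeps its original case.
    if len(parts) < 4 or parts[0].lower() != "urn":
        raise ValueError(f"not a urn:data URN: {value!r}")

    # 'data--i' -> nid 'data', attribute 'i'; plain 'data' -> default 'd'.
    nid, _, attr = parts[1].lower().partition("--")
    if nid != "data":
        raise ValueError(f"unsupported namespace identifier: {parts[1]!r}")

    result = {
        "nid": nid,
        "nid_attr": attr or "d",
        "bpgp": parts[2],
        "bpln": parts[3],
        "nss": ":".join(parts[2:]),
    }

    # A part wrapped in double underscores is treated as a domain name.
    bpln = parts[3]
    if len(bpln) > 4 and bpln.startswith("__") and bpln.endswith("__"):
        result["bpln"] = bpln[2:-2]
        result["bpln_isdn"] = True
    return result


if __name__ == "__main__":
    for urn in (
        "urn:data--i:un:locode",
        "urn:data:br:__saude.gov.br__:covid-19-vacinacao",
    ):
        print(urn, parse_data_urn(urn))
```

Because the split is on `:` only, Unicode domain names pass through untouched; a full grammar would still be needed to validate them.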
…aming files to be just urn.csv,urn.json,urn.yml (this is more an suffix, since users could in fact search for entire paths)
…raft of get_urn_resolver_remote() & get_urn_resolver_remote_authenticated()
…isCI, removed hardcoded path; renamed urn:data:xz:hxl:std:core:hashtag (std inspired on ISO RFC) to urn:data:xz:hxl:standard:core:hashtag (maybe core is not need?)
… like when have several sources or URNs, allow urnresolver filter sources (at first just use file names)
… did not customized yet), file urnresolver-default.urn.yml
The current version of HXL-Data-Science-file-formats is v0.7.3. I think the tools for URN resolving deserve a different group from the HXL2 topic. In fact, URN resolving would often be applied to content that is not HXLated yet, or would deal with issues that are also related to HXL, like very sensitive content (for example, how to name URLs that may be protected just by randomness, not by access control?)
…ologia-anatomica initial version based on https://github.com/EticaAI/EticaAI-linguistic-datasets-pt/blob/main/semi-automated-guides/sparql-wikidata.md
…cem:sexum:binarium & urn:data:xz:eticaai:ontologia:codicem:sexum:non-binarium
…:xz:hxl:standard:core:attribute, urn:data:xz:hxl:standard:master-vocabulary, urn:data:xz:hxl:cplp:hxl2tab
I believe we should add a few more parameters on the … One change is the …

The urn.yml format
Old format
Example 1

# Trivia:
# - "fontem"
# - https://en.wiktionary.org/wiki/fons#Latin
# - "auxilium"
# - https://en.wiktionary.org/wiki/auxilium#Latin
# - "dēscrīptiōnem"
# - https://en.wiktionary.org/wiki/descriptio#Latin
# - "explānandum"
# - https://en.wiktionary.org/wiki/explano#Latin
- urn: "urn:data:xz:hxl:standard:core:hashtag"
  descriptionem:
    eng-Latn: HXL/CSV version of the HXL Standard core hashtags.
  auxilium:
    - https://data.humdata.org/dataset/hxl-core-schemas
  fontem:
    - ontologia/codicem/hxl/standard/core/hashtag.hxl.csv
    - https://proxy.hxlstandard.org/data.csv?dest=data_edit&strip-headers=on&url=https%3A%2F%2Fdocs.google.com%2Fspreadsheets%2Fd%2F1En9FlmM8PrbTWgl3UHPF_MXnJ6ziVZFhBbojSJzBdLI%2Fpub%3Fgid%3D319251406%26single%3Dtrue%26output%3Dcsv
    - https://docs.google.com/spreadsheets/d/1En9FlmM8PrbTWgl3UHPF_MXnJ6ziVZFhBbojSJzBdLI/pub?gid=319251406&single=true&output=csv

Example 2

- urn: "urn:data:xz:eticaai:ontologia:codicem:anatomiam:terminologia-anatomica"
  descriptionem:
    eng-Latn: >
      Table with code references for body parts, in special
      Terminologia Anatomica (TA). Can be used with other ontologies and
      to transform for a few natural languages descriptions.
  explanandum:
    # Good references:
    - +v_fipat_ta2
    - +v_fipat_ta98_id
    - +v_fipat_ta98_latin
    # Generic references:
    - +v_wikidata
    - +v_fi_yso
    - +v_fr_universalis
    - +v_it_bncf
    - +v_jp_ndl
    - +v_uberon
    - +v_uk_britannica
    - +v_us_jstor
    - +v_us_mag
    - +v_us_mesh
    - +v_us_umls_cui
  auxilium:
    - https://github.com/HXL-CPLP/forum/issues/44
    - https://www4.unifr.ch/ifaa/Public/EntryPage/TA98%20Tree/HelpPage/TA98%20Latin%20Page%20Help.pdf
  exemplum:
    # Since terminologia-anatomica.hxl.csv 1,8mb, we only deploy a sample
    - ontologia/codicem/anatomiam/terminologia-anatomica-EXEMPLUM.hxl.csv
  fontem:
    # run ontologia/codicem/anatomiam/make.sh to get terminologia-anatomica.hxl.csv
    # or let the urnresolver download from live URNs
    - ontologia/codicem/anatomiam/terminologia-anatomica.hxl.csv
    - https://proxy.hxlstandard.org/data/b02a5f/download/HXL_CPLP-FOD_medicinae-legalis_humana-corpus.csv
    - https://docs.google.com/spreadsheets/d/10axnLpDNtAc8Bh921dz5XPXCwo0FUXRcKS6-ermiu5w/edit#gid=1622293684

Old format

# URNResolver v1.2.1
# hdp-toolchain v0.8.7.2
# @see https://data.humdata.org/dataset/hxl-core-schemas
- urn: "urn:data:xz:hxl:standard:core:hashtag"
  source:
    - ontologia/codicem/hxl/standard/core/hashtag.hxl.csv
    - https://proxy.hxlstandard.org/data.csv?dest=data_edit&strip-headers=on&url=https%3A%2F%2Fdocs.google.com%2Fspreadsheets%2Fd%2F1En9FlmM8PrbTWgl3UHPF_MXnJ6ziVZFhBbojSJzBdLI%2Fpub%3Fgid%3D319251406%26single%3Dtrue%26output%3Dcsv
    - https://docs.google.com/spreadsheets/d/1En9FlmM8PrbTWgl3UHPF_MXnJ6ziVZFhBbojSJzBdLI/pub?gid=319251406&single=true&output=csv
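
To make the difference between the two formats concrete, here is a rough sketch of how a resolver could pick a source from entries like the ones above. It is not the urnresolver implementation: the function names (`sources_of`, `resolve`) are hypothetical, the entries are hard-coded dicts standing in for a loaded `urn.yml`, and the "prefer an existing local file, otherwise fall back to the first remote URL" order is an assumption based on the comments in Example 2.

```python
import os
from typing import Optional

# Stand-ins for entries loaded from a urn.yml file. The first uses the new
# 'fontem' key, the second the old 'source' key.
NEW_FORMAT = {
    "urn": "urn:data:xz:hxl:standard:core:hashtag",
    "fontem": [
        "ontologia/codicem/hxl/standard/core/hashtag.hxl.csv",
        "https://docs.google.com/spreadsheets/d/1En9FlmM8PrbTWgl3UHPF_MXnJ6ziVZFhBbojSJzBdLI/pub?gid=319251406&single=true&output=csv",
    ],
}
OLD_FORMAT = {
    "urn": "urn:data:xz:hxl:standard:core:hashtag",
    "source": list(NEW_FORMAT["fontem"]),
}


def sources_of(entry: dict) -> list:
    """Return the source list, accepting both the new and the old key."""
    return entry.get("fontem") or entry.get("source") or []


def is_remote(item: str) -> bool:
    return item.startswith(("http://", "https://"))


def resolve(urn: str, entries: list, base_dir: str = ".") -> Optional[str]:
    """Return a local path if one exists, else the first remote URL, else None."""
    for entry in entries:
        if entry.get("urn") != urn:
            continue
        candidates = sources_of(entry)
        # Assumption: an already prepared local file wins over any download.
        for item in candidates:
            path = os.path.join(base_dir, item)
            if not is_remote(item) and os.path.exists(path):
                return path
        for item in candidates:
            if is_remote(item):
                return item
    return None


if __name__ == "__main__":
    print(resolve("urn:data:xz:hxl:standard:core:hashtag", [NEW_FORMAT]))
```

Reading `source` as a fallback for `fontem` would let old files keep working while the new optional keys (`descriptionem`, `auxilium`, `explanandum`, `exemplum`) are simply ignored by the lookup.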
…ions descriptionem, auxilium, explanandum, exemplum
Added reverse search, like … I believe we will need to build some table that could give a hint that some codes, like …

# fititnt@bravo:/workspace/git/EticaAI/HXL-Data-Science-file-formats$ urnresolver -?? +v_iso15924
urn:data:xz:eticaai:ontologia:codicem:linguam
# fititnt@bravo:/workspace/git/EticaAI/HXL-Data-Science-file-formats$ urnresolver -?? country+code+v_iso2
urn:data:xz:eticaai:ontologia:codicem:locum
# fititnt@bravo:/workspace/git/EticaAI/HXL-Data-Science-file-formats$ urnresolver --urn-explanandum-list
urn:data:xz:eticaai:ontologia:codicem:anatomiam:terminologia-anatomica +v_fipat_ta2
urn:data:xz:eticaai:ontologia:codicem:anatomiam:terminologia-anatomica +v_fipat_ta98_id
urn:data:xz:eticaai:ontologia:codicem:anatomiam:terminologia-anatomica +v_fipat_ta98_latin
urn:data:xz:eticaai:ontologia:codicem:anatomiam:terminologia-anatomica +v_wikidata
urn:data:xz:eticaai:ontologia:codicem:anatomiam:terminologia-anatomica +v_fi_yso
urn:data:xz:eticaai:ontologia:codicem:anatomiam:terminologia-anatomica +v_fr_universalis
urn:data:xz:eticaai:ontologia:codicem:anatomiam:terminologia-anatomica +v_it_bncf
urn:data:xz:eticaai:ontologia:codicem:anatomiam:terminologia-anatomica +v_jp_ndl
urn:data:xz:eticaai:ontologia:codicem:anatomiam:terminologia-anatomica +v_uberon
urn:data:xz:eticaai:ontologia:codicem:anatomiam:terminologia-anatomica +v_uk_britannica
urn:data:xz:eticaai:ontologia:codicem:anatomiam:terminologia-anatomica +v_us_jstor
urn:data:xz:eticaai:ontologia:codicem:anatomiam:terminologia-anatomica +v_us_mag
urn:data:xz:eticaai:ontologia:codicem:anatomiam:terminologia-anatomica +v_us_mesh
urn:data:xz:eticaai:ontologia:codicem:anatomiam:terminologia-anatomica +v_us_umls_cui
urn:data:xz:eticaai:ontologia:codicem:sexum:binarium +v_iso5218
urn:data:xz:eticaai:ontologia:codicem:sexum:binarium +v_iso5218_extended
urn:data:xz:eticaai:ontologia:codicem:sexum:binarium +v_fipat_ta98_latin
urn:data:xz:eticaai:ontologia:codicem:sexum:hl7 +v_iso5218
urn:data:xz:eticaai:ontologia:codicem:sexum:hl7 +v_iso5218_extended
urn:data:xz:eticaai:ontologia:codicem:sexum:hl7 +v_us_cdc_sex
urn:data:xz:eticaai:ontologia:codicem:sexum:hl7 +v_un_icao_sex
urn:data:xz:eticaai:ontologia:codicem:sexum:hl7 +v_us_NAACCR
urn:data:xz:eticaai:ontologia:codicem:sexum:hl7 +v_us_census_sex
urn:data:xz:eticaai:ontologia:codicem:sexum:non-binarium +lat_codices_anonyma
urn:data:xz:eticaai:ontologia:codicem:sexum:non-binarium +v_iso5218_extended
urn:data:xz:eticaai:ontologia:codicem:linguam +v_iso15924
urn:data:xz:eticaai:ontologia:codicem:locum country+code+v_iso2
urn:data:xz:eticaai:ontologia:codicem:locum country+code+v_iso3
urn:data:xz:eticaai:ontologia:codicem:locum +v_hrinfo_country
urn:data:xz:eticaai:ontologia:codicem:locum +v_reliefweb
urn:data:xz:eticaai:ontologia:codicem:locum country+code+v_reliefweb
# fititnt@bravo:/workspace/git/EticaAI/HXL-Data-Science-file-formats$ urnresolver -? urn:data:xz:hxl:standard:core:attribute
[
    {
        "urn": "urn:data:xz:hxl:standard:core:attribute",
        "descriptionem": {
            "eng-Latn": "HXL/CSV version of the HXL Standard core attributes."
        },
        "auxilium": [
            "https://data.humdata.org/dataset/hxl-core-schemas"
        ],
        "fontem": [
            "ontologia/codicem/hxl/standard/core/hashtag.hxl.csv",
            "https://proxy.hxlstandard.org/data.csv?dest=data_view&url=https%3A%2F%2Fdocs.google.com%2Fspreadsheets%2Fd%2F1En9FlmM8PrbTWgl3UHPF_MXnJ6ziVZFhBbojSJzBdLI%2Fpub%3Fgid%3D1810309357%26single%3Dtrue%26output%3Dcsv&strip-headers=on",
            "https://docs.google.com/spreadsheets/d/1En9FlmM8PrbTWgl3UHPF_MXnJ6ziVZFhBbojSJzBdLI/pub?gid=1810309357&single=true&output=csv"
        ],
        "urnref": "urnresolver-default.urn.yml"
    }
]
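
The reverse search shown in the terminal session above can be thought of as a lookup over the `explanandum` lists. The sketch below is an illustration only: `reverse_search` and `explanandum_list` are hypothetical names, the entries are hard-coded from the listing above, and exact string matching is an assumption (the real tool may match HXL patterns more loosely).

```python
# Entries reduced to the two keys the reverse search needs; the values are
# copied from the --urn-explanandum-list output above.
ENTRIES = [
    {
        "urn": "urn:data:xz:eticaai:ontologia:codicem:linguam",
        "explanandum": ["+v_iso15924"],
    },
    {
        "urn": "urn:data:xz:eticaai:ontologia:codicem:locum",
        "explanandum": [
            "country+code+v_iso2",
            "country+code+v_iso3",
            "+v_hrinfo_country",
            "+v_reliefweb",
            "country+code+v_reliefweb",
        ],
    },
    # Entries without 'explanandum' are simply never matched.
    {"urn": "urn:data:xz:hxl:standard:core:attribute"},
]


def reverse_search(needle: str, entries: list) -> list:
    """Return every URN whose explanandum list contains the exact needle."""
    return [e["urn"] for e in entries if needle in e.get("explanandum", [])]


def explanandum_list(entries: list) -> list:
    """Return one 'URN pattern' line per explanandum item."""
    return [
        f'{e["urn"]} {item}'
        for e in entries
        for item in e.get("explanandum", [])
    ]


if __name__ == "__main__":
    print(reverse_search("+v_iso15924", ENTRIES))
    print("\n".join(explanandum_list(ENTRIES)))
```

A flat list like the one `explanandum_list` returns is one possible shape for the hint table mentioned above, since it maps each code pattern to the URN that explains it.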
👋 Thank you!
Quick links
As part of referencing the datasets (temporary internal name: hdataset) from different groups (temporary internal name: hsilo), it makes sense to have some way to standardize naming. And URNs, even if they are complicated to implement in practice, could at least serve as a hint so that humans avoid using whatever creative idea they have at the moment. (This is actually more important if we're implementing localized translations as part of the [meta issue] hxlm #11, with equivalence between translations.)