Update to new UCUM version v2.2 (June-2024)
dalito committed Aug 24, 2024
1 parent 3dd0b73 commit b845a76
Showing 11 changed files with 65 additions and 48 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -16,7 +16,7 @@ Note that UCUM does non provide a canonical representation, e.g. `m/s` and `m.s-

- Parser for UCUM unit strings that implements the full grammar.
- Converter for creating [pint](https://pypi.org/project/pint/) units from UCUM unit strings.
-- A pint unit definition file [pint_ucum_defs.txt](https://github.com/dalito/ucumvert/blob/main/src/ucumvert/pint_ucum_defs.txt) that extends pint´s default units with UCUM units. All UCUM units from Version 2.1 of the specification are included.
+- A pint unit definition file [pint_ucum_defs.txt](https://github.com/dalito/ucumvert/blob/main/src/ucumvert/pint_ucum_defs.txt) that extends pint´s default units with UCUM units. All UCUM units from the new version 2.2 of the specification (June 2024) are included.

**ucumvert** generates the UCUM grammar by filling a template with unit codes, prefixes etc. from the official [ucum-essence.xml](https://github.com/ucum-org/ucum/blob/main/ucum-essence.xml) file (a copy is included in this repo).
So updating the parser for new UCUM releases is straight forward.
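
The template-filling approach described above can be illustrated with a minimal sketch. It is not the code in `parser.py`: the inline XML is a simplified stand-in for `ucum-essence.xml`, and `GRAMMAR_TEMPLATE`, `as_alternatives` and `build_grammar` are hypothetical names chosen for this example. It only assumes that codes live in a `Code` attribute and that units carry an `isMetric` flag.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for ucum-essence.xml (assumption: the real file is far
# larger and carries more attributes and child elements per entry).
SAMPLE_XML = """\
<root xmlns="http://unitsofmeasure.org/ucum-essence">
  <prefix Code="k"><name>kilo</name></prefix>
  <base-unit Code="m"><name>meter</name></base-unit>
  <unit Code="Hz" isMetric="yes"><name>hertz</name></unit>
  <unit Code="[NTU]" isMetric="no"><name>Nephelometric Turbidity Unit</name></unit>
  <unit Code="[FNU]" isMetric="no"><name>Formazin Nephelometric Unit</name></unit>
</root>
"""

NS = {"u": "http://unitsofmeasure.org/ucum-essence"}

# Hypothetical template; the terminal names mirror those visible in the
# grammar file, but the layout here is illustrative only.
GRAMMAR_TEMPLATE = """\
PREFIX: {prefixes}
UNIT_METRIC: {metric}
UNIT_NON_METRIC: {non_metric}
"""


def as_alternatives(codes):
    """Render codes as a lark-style alternation of quoted string literals."""
    return " | ".join(f'"{code}"' for code in codes)


def build_grammar(xml_text):
    """Fill the template with the codes found in the XML text."""
    root = ET.fromstring(xml_text)
    prefixes = [p.get("Code") for p in root.findall("u:prefix", NS)]
    # Base units are treated as metric in this sketch.
    metric = [b.get("Code") for b in root.findall("u:base-unit", NS)]
    non_metric = []
    for unit in root.findall("u:unit", NS):
        target = metric if unit.get("isMetric") == "yes" else non_metric
        target.append(unit.get("Code"))
    return GRAMMAR_TEMPLATE.format(
        prefixes=as_alternatives(prefixes),
        metric=as_alternatives(metric),
        non_metric=as_alternatives(non_metric),
    )


if __name__ == "__main__":
    print(build_grammar(SAMPLE_XML))
```

With a scheme like this, new codes such as `[NTU]` and `[FNU]` reach the grammar simply by regenerating it from the updated XML file.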
@@ -126,7 +126,7 @@ To (re)generate this tsv-file from the official xlsx-file in the [UCUM repositor

```bash
pip install openpyxl
-python src/src/ucumvert/vendor/get_ucum_example_as_tsv.py
+python src/ucumvert/vendor/get_ucum_example_as_tsv.py
```

## Useful links
2 changes: 1 addition & 1 deletion src/ucumvert/parser.py
@@ -85,7 +85,7 @@
# instead of deca-r which does not exist.

UCUM_GRAMMAR = """
-# Based on UCUM specification (Version 2.1, 2017-11-21)
+# Based on UCUM specification (Version 2.2, 2024-06-28)
# Includes ucumvert-specific fixes to handle all common UCUM units
# and some edge cases not present in the official examples.
# This file is auto-created by parser.update_lark_ucum_grammar_file
5 changes: 4 additions & 1 deletion src/ucumvert/pint_ucum_defs.txt
@@ -41,7 +41,7 @@ homeopathic_potency_of_quintamillesimal_korsakovian_series = 1 = _ = kp_Q
high_power_field = 1 = _ = HPF
low_power_field = 1 = _ = LPF
international_unit = 1 = _ = i.U. = IU = iU
-arbitary_unit = 1 = _ = arb_U
+arbitrary_unit = 1 = _ = arb_U
US_pharmacopeia_unit = 1 = _ = USP_U
GPL_unit = 1 = _ = GPL_U
MPL_unit = 1 = _ = MPL_U
@@ -88,6 +88,9 @@ diopter = 1 / meter = _ = diop
slope = tan(1 rad)
prism_diopter = 100 * tan(1 rad) = _ = p_diop

+nephelometric_turbidity_unit = 1 = _ = NTU
+formazin_nephelometric_unit = 1 = _ = FNU

mil_i = inch / 1000
cml_i = π/4 * mil_i**2
hd_i = 4 * inch
30 changes: 16 additions & 14 deletions src/ucumvert/pint_ucum_defs_mapping_report.txt
@@ -27,7 +27,7 @@
# Ti --> tebi (default registry)

# === metric ===
-# mol --> mole (default registry) # mol = 6.0221367 * 10*23 # METRIC, mole, amount of substance (si)
+# mol --> mole (default registry) # mol = 6.02214076 * 10*23 # METRIC, mole, amount of substance (si)
# sr --> steradian (default registry) # sr = 1 * rad2 # METRIC, steradian, solid angle (si)
# Hz --> hertz (default registry) # Hz = 1 * s-1 # METRIC, hertz, frequency (si)
# N --> newton (default registry) # N = 1 * kg.m/s2 # METRIC, newton, force (si)
@@ -53,18 +53,18 @@
# ar --> are (ucumvert registry) # ar = 100 * m2 # METRIC, are, area (iso1000)
# t --> metric_ton (default registry) # t = 1e3 * kg # METRIC, tonne, mass (iso1000)
# bar --> bar (default registry) # bar = 1e5 * Pa # METRIC, bar, pressure (iso1000)
-# u --> unified_atomic_mass_unit (default registry) # u = 1.6605402e-24 * g # METRIC, unified atomic mass unit, mass (iso1000)
+# u --> unified_atomic_mass_unit (default registry) # u = 1.66053906660e-24 * g # METRIC, unified atomic mass unit, mass (iso1000)
# eV --> electron_volt (default registry) # eV = 1 * [e].V # METRIC, electronvolt, energy (iso1000)
# pc --> parsec (default registry) # pc = 3.085678e16 * m # METRIC, parsec, length (iso1000)
# [c] --> speed_of_light (default registry) # [c] = 299792458 * m/s # METRIC, velocity of light, velocity (const)
-# [h] --> planck_constant (default registry) # [h] = 6.6260755e-34 * J.s # METRIC, Planck constant, action (const)
-# [k] --> boltzmann_constant (default registry) # [k] = 1.380658e-23 * J/K # METRIC, Boltzmann constant, (unclassified) (const)
+# [h] --> planck_constant (default registry) # [h] = 6.62607015e-34 * J.s # METRIC, Planck constant, action (const)
+# [k] --> boltzmann_constant (default registry) # [k] = 1.380649e-23 * J/K # METRIC, Boltzmann constant, (unclassified) (const)
# [eps_0] --> vacuum_permittivity (default registry) # [eps_0] = 8.854187817e-12 * F/m # METRIC, permittivity of vacuum, electric permittivity (const)
# [mu_0] --> vacuum_permeability (default registry) # [mu_0] = 1 * 4.[pi].10*-7.N/A2 # METRIC, permeability of vacuum, magnetic permeability (const)
-# [e] --> elementary_charge (default registry) # [e] = 1.60217733e-19 * C # METRIC, elementary charge, electric charge (const)
-# [m_e] --> electron_mass (default registry) # [m_e] = 9.1093897e-28 * g # METRIC, electron mass, mass (const)
-# [m_p] --> proton_mass (default registry) # [m_p] = 1.6726231e-24 * g # METRIC, proton mass, mass (const)
-# [G] --> newtonian_constant_of_gravitation (default registry) # [G] = 6.67259e-11 * m3.kg-1.s-2 # METRIC, Newtonian constant of gravitation, (unclassified) (const)
+# [e] --> elementary_charge (default registry) # [e] = 1.602176634e-19 * C # METRIC, elementary charge, electric charge (const)
+# [m_e] --> electron_mass (default registry) # [m_e] = 9.1093837139e-31 * kg # METRIC, electron mass, mass (const)
+# [m_p] --> proton_mass (default registry) # [m_p] = 1.67262192595e-27 * kg # METRIC, proton mass, mass (const)
+# [G] --> newtonian_constant_of_gravitation (default registry) # [G] = 6.67430e-11 * m3.kg-1.s-2 # METRIC, Newtonian constant of gravitation, (unclassified) (const)
# [g] --> standard_gravity (default registry) # [g] = 980665e-5 * m/s2 # METRIC, standard acceleration of free fall, acceleration (const)
# [ly] --> light_year (default registry) # [ly] = 1 * [c].a_j # METRIC, light-year, length (const)
# gf --> force_gram (default registry) # gf = 1 * g.[g] # METRIC, gram-force, force (const)
@@ -290,7 +290,7 @@
# [S] --> svedberg (default registry) # [S] = 1 * 10*-13.s # NON_METRIC, Svedberg unit, sedimentation coefficient (chemical)
# [HPF] --> high_power_field (ucumvert registry) # [HPF] = 1 * 1 # NON_METRIC, high power field, view area in microscope (chemical)
# [LPF] --> low_power_field (ucumvert registry) # [LPF] = 100 * 1 # NON_METRIC, low power field, view area in microscope (chemical)
-# [arb'U] --> arbitary_unit (ucumvert registry) # [arb'U] = 1 * 1 # NON_METRIC, arbitary unit, arbitrary (chemical)
+# [arb'U] --> arbitrary_unit (ucumvert registry) # [arb'U] = 1 * 1 # NON_METRIC, arbitrary unit, arbitrary (chemical)
# [USP'U] --> US_pharmacopeia_unit (ucumvert registry) # [USP'U] = 1 * 1 # NON_METRIC, United States Pharmacopeia unit, arbitrary (chemical)
# [GPL'U] --> GPL_unit (ucumvert registry) # [GPL'U] = 1 * 1 # NON_METRIC, GPL unit, biologic activity of anticardiolipin IgG (chemical)
# [MPL'U] --> MPL_unit (ucumvert registry) # [MPL'U] = 1 * 1 # NON_METRIC, MPL unit, biologic activity of anticardiolipin IgM (chemical)
@@ -311,10 +311,10 @@
# [PFU] --> plaque_forming_unit (ucumvert registry) # [PFU] = 1 * 1 # NON_METRIC, plaque forming units, amount of an infectious agent (chemical)
# [FFU] --> focus_forming_units (ucumvert registry) # [FFU] = 1 * 1 # NON_METRIC, focus forming units, amount of an infectious agent (chemical)
# [CFU] --> colony_forming_unit (ucumvert registry) # [CFU] = 1 * 1 # NON_METRIC, colony forming units, amount of a proliferating organism (chemical)
-# [IR] --> allergene_index_of_reactivity (ucumvert registry) # [IR] = 1 * 1 # NON_METRIC, index of reactivity, amount of an allergen callibrated through in-vivo testing using the Stallergenes® method. (chemical)
-# [BAU] --> bioequivalent_allergen_unit (ucumvert registry) # [BAU] = 1 * 1 # NON_METRIC, bioequivalent allergen unit, amount of an allergen callibrated through in-vivo testing based on the ID50EAL method of (intradermal dilution for 50mm sum of erythema diameters (chemical)
+# [IR] --> allergene_index_of_reactivity (ucumvert registry) # [IR] = 1 * 1 # NON_METRIC, index of reactivity, amount of an allergen calibrated through in-vivo testing using the Stallergenes® method (chemical)
+# [BAU] --> bioequivalent_allergen_unit (ucumvert registry) # [BAU] = 1 * 1 # NON_METRIC, bioequivalent allergen unit, amount of an allergen calibrated through in-vivo testing based on the ID50EAL method of (intradermal dilution for 50mm sum of erythema diameters (chemical)
# [AU] --> allergen_unit (ucumvert registry) # [AU] = 1 * 1 # NON_METRIC, allergen unit, procedure defined amount of an allergen using some reference standard (chemical)
-# [Amb'a'1'U] --> allergen_unit_for_Ambrosia_artemisiifolia (ucumvert registry) # [Amb'a'1'U] = 1 * 1 # NON_METRIC, allergen unit for Ambrosia artemisiifolia, procedure defined amount of the major allergen of ragweed. (chemical)
+# [Amb'a'1'U] --> allergen_unit_for_Ambrosia_artemisiifolia (ucumvert registry) # [Amb'a'1'U] = 1 * 1 # NON_METRIC, allergen unit for Ambrosia artemisiifolia, procedure defined amount of the major allergen of ragweed (chemical)
# [PNU] --> protein_nitrogen_unit (ucumvert registry) # [PNU] = 1 * 1 # NON_METRIC, protein nitrogen unit, procedure defined amount of a protein substance (chemical)
# [Lf] --> limit_of_flocculation (ucumvert registry) # [Lf] = 1 * 1 # NON_METRIC, Limit of flocculation, procedure defined amount of an antigen substance (chemical)
# [D'ag'U] --> D_antigen_unit (ucumvert registry) # [D'ag'U] = 1 * 1 # NON_METRIC, D-antigen unit, procedure defined amount of a poliomyelitis d-antigen substance (chemical)
@@ -324,11 +324,13 @@
# Ao --> angstrom (ucumvert registry) # Ao = 0.1 * nm # NON_METRIC, Ångström, length (misc)
# b --> barn (default registry) # b = 100 * fm2 # NON_METRIC, barn, action area (misc)
# att --> technical_atmosphere (ucumvert registry) # att = 1 * kgf/cm2 # NON_METRIC, technical atmosphere, pressure (misc)
-# [psi] --> pound_force_per_square_inch (default registry) # [psi] = 1 * [lbf_av]/[in_i]2 # NON_METRIC, pound per sqare inch, pressure (misc)
+# [psi] --> pound_force_per_square_inch (default registry) # [psi] = 1 * [lbf_av]/[in_i]2 # NON_METRIC, pound per square inch, pressure (misc)
# circ --> turn (ucumvert registry) # circ = 2 * [pi].rad # NON_METRIC, circle, plane angle (misc)
-# sph --> sphere (ucumvert registry) # sph = 4 * [pi].sr # NON_METRIC, spere, solid angle (misc)
+# sph --> sphere (ucumvert registry) # sph = 4 * [pi].sr # NON_METRIC, sphere, solid angle (misc)
# [car_m] --> carat (ucumvert registry) # [car_m] = 2e-1 * g # NON_METRIC, metric carat, mass (misc)
# [car_Au] --> carat_of_gold_alloys (ucumvert registry) # [car_Au] = 1/24 # NON_METRIC, carat of gold alloys, mass fraction (misc)
# [smoot] --> smoot (ucumvert registry) # [smoot] = 67 * [in_i] # NON_METRIC, Smoot, length (misc)
# [m/s2/Hz^(1/2)] --> meter_per_square_second_per_square_root_of_hertz (ucumvert registry) # [m/s2/Hz^(1/2)] = 1 * sqrt(1 m2/s4/Hz) # NON_METRIC, meter per square seconds per square root of hertz, amplitude spectral density (misc)
+# [NTU] --> nephelometric_turbidity_unit (ucumvert registry) # [NTU] = 1 * 1 # NON_METRIC, Nephelometric Turbidity Unit, turbidity (misc)
+# [FNU] --> formazin_nephelometric_unit (ucumvert registry) # [FNU] = 1 * 1 # NON_METRIC, Formazin Nephelometric Unit, turbidity (misc)
# bit_s --> bit (ucumvert registry) # bit_s = 1 * ld(1 1) # NON_METRIC, bit, amount of information (infotech)
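
To get a feel for how much the constants in this report moved between UCUM 2.1 and 2.2, the following sketch computes the relative change for a few of them. The values are copied from the old and new report lines above; the dictionary names and the `relative_change` helper are hypothetical and exist only for this illustration. Only constants whose unit stayed the same in both versions are included.

```python
# Values copied from the old (v2.1) and new (v2.2) lines of the mapping report.
OLD = {
    "mol": 6.0221367e23,
    "[h]": 6.6260755e-34,
    "[k]": 1.380658e-23,
    "[e]": 1.60217733e-19,
    "[G]": 6.67259e-11,
}
NEW = {
    "mol": 6.02214076e23,
    "[h]": 6.62607015e-34,
    "[k]": 1.380649e-23,
    "[e]": 1.602176634e-19,
    "[G]": 6.67430e-11,
}


def relative_change(old, new):
    """Return the signed relative change from old to new."""
    return (new - old) / old


changes = {code: relative_change(OLD[code], NEW[code]) for code in OLD}

if __name__ == "__main__":
    # List the constants, largest absolute relative change first.
    for code, change in sorted(changes.items(), key=lambda kv: -abs(kv[1])):
        print(f"{code:>5}: {change:+.2e}")
```

All of these changes are well below 0.1 %, so they matter only for conversions that need many significant digits.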
5 changes: 3 additions & 2 deletions src/ucumvert/ucum_grammar.lark
@@ -1,4 +1,4 @@
-# Based on UCUM specification (Version 2.1, 2017-11-21)
+# Based on UCUM specification (Version 2.2, 2024-06-28)
# Includes ucumvert-specific fixes to handle all common UCUM units
# and some edge cases not present in the official examples.
# This file is auto-created by parser.update_lark_ucum_grammar_file
@@ -74,7 +74,8 @@ UNIT_NON_METRIC: "10*" |"10^" |"[pi]" |"%" |"[ppth]" |"[ppm]" |"[ppb]"
|"[CCID_50]" |"[TCID_50]" |"[EID_50]" |"[PFU]" |"[FFU]" |"[CFU]"
|"[IR]" |"[BAU]" |"[AU]" |"[Amb'a'1'U]" |"[PNU]" |"[Lf]" |"[D'ag'U]"
|"[FEU]" |"[ELU]" |"[EU]" |"Ao" |"b" |"att" |"[psi]" |"circ" |"sph"
-|"[car_m]" |"[car_Au]" |"[smoot]" |"[m/s2/Hz^(1/2)]" |"bit_s"
+|"[car_m]" |"[car_Au]" |"[smoot]" |"[m/s2/Hz^(1/2)]" |"[NTU]"
+|"[FNU]" |"bit_s"

EXPONENT : ["+"|"-"] NON_ZERO_DIGITS
FACTOR: NON_ZERO_DIGITS
5 changes: 3 additions & 2 deletions src/ucumvert/ucum_pint.py
@@ -385,5 +385,6 @@ def run_examples(): # pragma: no cover


if __name__ == "__main__":
-    run_examples()
-    # find_ucum_codes_that_need_mapping()
+    # run_examples()
+    find_matching_pint_definitions()
+    find_ucum_codes_that_need_mapping()
2 changes: 1 addition & 1 deletion src/ucumvert/vendor/README.md
@@ -2,7 +2,7 @@

This directory contain copies of files from the [UCUM repository](https://github.com/ucum-org/ucum) to enable running the code without internet access. The copied files fall under the [UCUM Copyright Notice and License](https://github.com/ucum-org/ucum/blob/main/LICENSE.md) (Version 1.0).

-* `ucum-essence.xml` - Version 2.1 (revision date: 2017-11-21 19:04:52 -0500).
+* `ucum-essence.xml` - Version 2.2 (revision date: 2024-06-17).
* Used to build the terminals of the lark parser.
* `ucum_examples.tsv` - Extracted from [TableOfExampleUcumCodesForElectronicMessaging.xlsx](https://github.com/ucum-org/ucum/blob/main/common-units/TableOfExampleUcumCodesForElectronicMessaging.xlsx), Version 1.5, released 06/2020
* Used in unit tests. The tsv was created with the script `get_ucum_examples_as_tsv.py`.