Remove duplicates from languages exemplar_chars #18

Merged
moyogo merged 3 commits into googlefonts:main from remove-duplicates-exemplar-chars on Oct 31, 2022

@moyogo moyogo commented Sep 12, 2022

There are duplicate entries in the languages' exemplar_chars. This PR removes them.

The script used is snippets/fix-exemplars-duplicates.py. It produced the following output:

Changed Lib/gflanguages/data/languages/agq_Latn.textproto base exemplar:

  • from 304 (64 as set) to 64 elements
  • removing 240 duplicate(s):
    [('{ɛ̀}', 15), ('{ɛ̂}', 15), ('{ɛ̌}', 15), ('{ɛ̄}', 15), ('{ɨ̀}', 15), ('{ɨ̂}', 15), ('{ɨ̌}', 15), ('{ɨ̄}', 15), ('{ɔ̀}', 15), ('{ɔ̂}', 15), ('{ɔ̌}', 15), ('{ɔ̄}', 15), ('{ʉ̀}', 15), ('{ʉ̂}', 15), ('{ʉ̌}', 15), ('{ʉ̄}', 15)]

Changed Lib/gflanguages/data/languages/as_Beng.textproto base exemplar:

  • from 123 (63 as set) to 63 elements
  • removing 60 duplicate(s):
    [('{ড়}', 15), ('{ঢ়}', 15), ('{য়}', 15), ('{ক্ষ}', 15)]

Changed Lib/gflanguages/data/languages/bas_Latn.textproto base exemplar:

  • from 432 (86 as set) to 86 elements
  • removing 346 duplicate(s):
    [('{ɛ́}', 15), ('{ɛ̀}', 15), ('{ɛ̂}', 15), ('{ɛ̌}', 15), ('{ɛ̄}', 15), ('{ɔ́}', 15), ('{ɔ̀}', 15), ('{ɔ̂}', 15), ('{ɔ̌}', 15), ('{ɔ̄}', 15), ('{a᷆}', 14), ('{a᷇}', 14), ('{e᷆}', 14), ('{e᷇}', 14), ('{ɛ᷆}', 14), ('{ɛ᷇}', 14), ('{i᷆}', 14), ('{i᷇}', 14), ('{o᷆}', 14), ('{o᷇}', 14), ('{ɔ᷆}', 14), ('{ɔ᷇}', 14), ('{u᷆}', 14), ('{u᷇}', 14)]

Changed Lib/gflanguages/data/languages/be_Cyrl.textproto punctuation exemplar:

  • from 46 (16 as set) to 16 elements
  • removing 30 duplicate(s):
    [('{', 15), ('}', 15)]

Changed Lib/gflanguages/data/languages/bem_Latn.textproto index exemplar:

  • from 35 (20 as set) to 20 elements
  • removing 15 duplicate(s):
    [('{SH}', 15)]

Changed Lib/gflanguages/data/languages/bg_Cyrl.textproto auxiliary exemplar:

  • from 103 (13 as set) to 13 elements
  • removing 90 duplicate(s):
    [('{а̀}', 15), ('{о̀}', 15), ('{у̀}', 15), ('{ъ̀}', 15), ('{ю̀}', 15), ('{я̀}', 15)]

Changed Lib/gflanguages/data/languages/bn_Beng.textproto index exemplar:

  • from 59 (44 as set) to 44 elements
  • removing 15 duplicate(s):
    [('{ক্ষ}', 15)]

Changed Lib/gflanguages/data/languages/bo_Tibt.textproto base exemplar:

  • from 358 (103 as set) to 103 elements
  • removing 255 duplicate(s):
    [('{ཀྵ}', 15), ('{ྐྵ}', 15), ('{གྷ}', 15), ('{ྒྷ}', 15), ('{ཌྷ}', 15), ('{ྜྷ}', 15), ('{དྷ}', 15), ('{ྡྷ}', 15), ('{བྷ}', 15), ('{ྦྷ}', 15), ('{ཛྷ}', 15), ('{ྫྷ}', 15), ('{ཱི}', 15), ('{ཱྀ}', 15), ('{ཱུ}', 15), ('{ྲྀ}', 15), ('{ླྀ}', 15)]

Changed Lib/gflanguages/data/languages/br_Latn.textproto punctuation exemplar:

  • from 44 (14 as set) to 14 elements
  • removing 30 duplicate(s):
    [('{', 15), ('}', 15)]

Changed Lib/gflanguages/data/languages/brx_Deva.textproto index exemplar:

  • from 61 (46 as set) to 46 elements
  • removing 15 duplicate(s):
    [('{ड़}', 15)]

Changed Lib/gflanguages/data/languages/bs_Latn.textproto index exemplar:

  • from 78 (33 as set) to 33 elements
  • removing 45 duplicate(s):
    [('{DŽ}', 15), ('{LJ}', 15), ('{NJ}', 15)]

Changed Lib/gflanguages/data/languages/ce_Cyrl.textproto punctuation exemplar:

  • from 62 (32 as set) to 32 elements
  • removing 30 duplicate(s):
    [('{', 15), ('}', 15)]

Changed Lib/gflanguages/data/languages/cs_Latn.textproto index exemplar:

  • from 46 (31 as set) to 31 elements
  • removing 15 duplicate(s):
    [('{CH}', 15)]

Changed Lib/gflanguages/data/languages/cy_Latn.textproto index exemplar:

  • from 154 (34 as set) to 34 elements
  • removing 120 duplicate(s):
    [('{CH}', 15), ('{DD}', 15), ('{FF}', 15), ('{NG}', 15), ('{LL}', 15), ('{PH}', 15), ('{RH}', 15), ('{TH}', 15)]

Changed Lib/gflanguages/data/languages/de_Latn.textproto punctuation exemplar:

  • from 62 (32 as set) to 32 elements
  • removing 30 duplicate(s):
    [('{', 15), ('}', 15)]

Changed Lib/gflanguages/data/languages/doi_Deva.textproto base exemplar:

  • from 117 (72 as set) to 72 elements
  • removing 45 duplicate(s):
    [('{क्ष}', 15), ('{ड़}', 15), ('{ढ़}', 15)]

Changed Lib/gflanguages/data/languages/dsb_Latn.textproto index exemplar:

  • from 49 (34 as set) to 34 elements
  • removing 15 duplicate(s):
    [('{Ch}', 15)]

Changed Lib/gflanguages/data/languages/dua_Latn.textproto base exemplar:

  • from 80 (35 as set) to 35 elements
  • removing 45 duplicate(s):
    [('{ɛ́}', 15), ('{ny}', 15), ('{ɔ́}', 15)]

Changed Lib/gflanguages/data/languages/ee_Latn.textproto punctuation exemplar:

  • from 64 (34 as set) to 34 elements
  • removing 30 duplicate(s):
    [('{', 15), ('}', 15)]

Changed Lib/gflanguages/data/languages/eo_Latn.textproto punctuation exemplar:

  • from 55 (25 as set) to 25 elements
  • removing 30 duplicate(s):
    [('{', 15), ('}', 15)]

Changed Lib/gflanguages/data/languages/et_Latn.textproto punctuation exemplar:

  • from 48 (18 as set) to 18 elements
  • removing 30 duplicate(s):
    [('{', 15), ('}', 15)]

Changed Lib/gflanguages/data/languages/ewo_Latn.textproto base exemplar:

  • from 320 (65 as set) to 65 elements
  • removing 255 duplicate(s):
    [('{dz}', 15), ('{ə́}', 15), ('{ə̀}', 15), ('{ə̂}', 15), ('{ə̌}', 15), ('{ɛ́}', 15), ('{ɛ̀}', 15), ('{ɛ̂}', 15), ('{ɛ̌}', 15), ('{kp}', 15), ('{ng}', 15), ('{nk}', 15), ('{ɔ́}', 15), ('{ɔ̀}', 15), ('{ɔ̂}', 15), ('{ɔ̌}', 15), ('{ts}', 15)]

Changed Lib/gflanguages/data/languages/fil_Latn.textproto index exemplar:

  • from 43 (28 as set) to 28 elements
  • removing 15 duplicate(s):
    [('{Ng}', 15)]

Changed Lib/gflanguages/data/languages/fy_Latn.textproto base exemplar:

  • from 73 (43 as set) to 43 elements
  • removing 30 duplicate(s):
    [('{ij}', 15), ('{íj́}', 15)]

Changed Lib/gflanguages/data/languages/gd_Latn.textproto punctuation exemplar:

  • from 72 (42 as set) to 42 elements
  • removing 30 duplicate(s):
    [('{', 15), ('}', 15)]

Changed Lib/gflanguages/data/languages/gu_Gujr.textproto index exemplar:

  • from 127 (52 as set) to 52 elements
  • removing 75 duplicate(s):
    [('{અં}', 15), ('{અઃ}', 15), ('{ક્ષ}', 15), ('{જ્ઞ}', 15), ('{ત્ર}', 15)]

Changed Lib/gflanguages/data/languages/ha_Latn.textproto punctuation exemplar:

  • from 52 (22 as set) to 22 elements
  • removing 30 duplicate(s):
    [('{', 15), ('}', 15)]

Changed Lib/gflanguages/data/languages/hi_Deva.textproto punctuation exemplar:

  • from 49 (19 as set) to 19 elements
  • removing 30 duplicate(s):
    [('{', 15), ('}', 15)]

Changed Lib/gflanguages/data/languages/hr_Latn.textproto index exemplar:

  • from 79 (34 as set) to 34 elements
  • removing 45 duplicate(s):
    [('{DŽ}', 15), ('{LJ}', 15), ('{NJ}', 15)]

Changed Lib/gflanguages/data/languages/hsb_Latn.textproto index exemplar:

  • from 63 (33 as set) to 33 elements
  • removing 30 duplicate(s):
    [('{DŹ}', 15), ('{CH}', 15)]

Changed Lib/gflanguages/data/languages/hu_Latn.textproto index exemplar:

  • from 179 (44 as set) to 44 elements
  • removing 135 duplicate(s):
    [('{CS}', 15), ('{DZ}', 15), ('{DZS}', 15), ('{GY}', 15), ('{LY}', 15), ('{NY}', 15), ('{SZ}', 15), ('{TY}', 15), ('{ZS}', 15)]

Changed Lib/gflanguages/data/languages/ia_Latn.textproto base exemplar:

  • from 58 (28 as set) to 28 elements
  • removing 30 duplicate(s):
    [('{ch}', 15), ('{ph}', 15)]

Changed Lib/gflanguages/data/languages/ig_Latn.textproto punctuation exemplar:

  • from 48 (18 as set) to 18 elements
  • removing 30 duplicate(s):
    [('{', 15), ('}', 15)]

Changed Lib/gflanguages/data/languages/it_Latn.textproto punctuation exemplar:

  • from 55 (25 as set) to 25 elements
  • removing 30 duplicate(s):
    [('{', 15), ('}', 15)]

Changed Lib/gflanguages/data/languages/ja_Jpan.textproto punctuation exemplar:

  • from 91 (62 as set) to 62 elements
  • removing 29 duplicate(s):
    [('{{', 15), ('}', 14)]

Changed Lib/gflanguages/data/languages/jgo_Latn.textproto index exemplar:

  • from 91 (31 as set) to 31 elements
  • removing 60 duplicate(s):
    [('{Pf}', 15), ('{Sh}', 15), ('{Ts}', 15), ('{Ʉ̈}', 15)]

Changed Lib/gflanguages/data/languages/ka_Geor.textproto punctuation exemplar:

  • from 67 (37 as set) to 37 elements
  • removing 30 duplicate(s):
    [('{', 15), ('}', 15)]

Changed Lib/gflanguages/data/languages/kea_Latn.textproto auxiliary exemplar:

  • from 78 (48 as set) to 48 elements
  • removing 30 duplicate(s):
    [('{n̈}', 15), ('{rr}', 15)]

Changed Lib/gflanguages/data/languages/kk_Cyrl.textproto punctuation exemplar:

  • from 62 (32 as set) to 32 elements
  • removing 30 duplicate(s):
    [('{', 15), ('}', 15)]

Changed Lib/gflanguages/data/languages/kkj_Latn.textproto index exemplar:

  • from 253 (43 as set) to 43 elements
  • removing 210 duplicate(s):
    [('{Ɗy}', 15), ('{Gb}', 15), ('{Gw}', 15), ('{I̧}', 15), ('{Kp}', 15), ('{Kw}', 15), ('{Mb}', 15), ('{Nd}', 15), ('{Ny}', 15), ('{Ŋg}', 15), ('{Ŋgb}', 15), ('{Ŋgw}', 15), ('{Ɔ̧}', 15), ('{U̧}', 15)]

Changed Lib/gflanguages/data/languages/km_Khmr.textproto punctuation exemplar:

  • from 52 (22 as set) to 22 elements
  • removing 30 duplicate(s):
    [('{', 15), ('}', 15)]

Changed Lib/gflanguages/data/languages/ko_Kore.textproto punctuation exemplar:

  • from 129 (84 as set) to 84 elements
  • removing 45 duplicate(s):
    [('{', 15), ('{', 15), ('}', 15)]

Changed Lib/gflanguages/data/languages/kok_Deva.textproto base exemplar:

  • from 206 (86 as set) to 86 elements
  • removing 120 duplicate(s):
    [('{क़}', 15), ('{ख़}', 15), ('{ग़}', 15), ('{ज़}', 15), ('{ड़}', 15), ('{ढ़}', 15), ('{फ़}', 15), ('{य़}', 15)]

Changed Lib/gflanguages/data/languages/ksf_Latn.textproto base exemplar:

  • from 81 (36 as set) to 36 elements
  • removing 45 duplicate(s):
    [('{ǝ́}', 15), ('{ɛ́}', 15), ('{ɔ́}', 15)]

Changed Lib/gflanguages/data/languages/ksh_Latn.textproto punctuation exemplar:

  • from 68 (38 as set) to 38 elements
  • removing 30 duplicate(s):
    [('{', 15), ('}', 15)]

Changed Lib/gflanguages/data/languages/ky_Cyrl.textproto punctuation exemplar:

  • from 62 (32 as set) to 32 elements
  • removing 30 duplicate(s):
    [('{', 15), ('}', 15)]

Changed Lib/gflanguages/data/languages/lb_Latn.textproto punctuation exemplar:

  • from 62 (32 as set) to 32 elements
  • removing 30 duplicate(s):
    [('{', 15), ('}', 15)]

Changed Lib/gflanguages/data/languages/lg_Latn.textproto base exemplar:

  • from 40 (25 as set) to 25 elements
  • removing 15 duplicate(s):
    [('{ny}', 15)]

Changed Lib/gflanguages/data/languages/lkt_Latn.textproto auxiliary exemplar:

  • from 56 (11 as set) to 11 elements
  • removing 45 duplicate(s):
    [('{ȟʼ}', 15), ('{sʼ}', 15), ('{šʼ}', 15)]

Changed Lib/gflanguages/data/languages/ln_Latn.textproto index exemplar:

  • from 185 (35 as set) to 35 elements
  • removing 150 duplicate(s):
    [('{Gb}', 15), ('{Mb}', 15), ('{Mp}', 15), ('{Nd}', 15), ('{Ng}', 15), ('{Nk}', 15), ('{Ns}', 15), ('{Nt}', 15), ('{Ny}', 15), ('{Nz}', 15)]

Changed Lib/gflanguages/data/languages/lo_Laoo.textproto index exemplar:

  • from 123 (33 as set) to 33 elements
  • removing 90 duplicate(s):
    [('{ຫງ}', 15), ('{ຫຍ}', 15), ('{ຫນ}', 15), ('{ຫມ}', 15), ('{ຫລ}', 15), ('{ຫວ}', 15)]

Changed Lib/gflanguages/data/languages/lt_Latn.textproto punctuation exemplar:

  • from 50 (20 as set) to 20 elements
  • removing 30 duplicate(s):
    [('{', 15), ('}', 15)]

Changed Lib/gflanguages/data/languages/lu_Latn.textproto base exemplar:

  • from 163 (43 as set) to 43 elements
  • removing 120 duplicate(s):
    [('{ɛ́}', 15), ('{ɛ̀}', 15), ('{ng}', 15), ('{ny}', 15), ('{ɔ́}', 15), ('{ɔ̀}', 15), ('{ph}', 15), ('{shi}', 15)]

Changed Lib/gflanguages/data/languages/mai_Deva.textproto index exemplar:

  • from 158 (53 as set) to 53 elements
  • removing 105 duplicate(s):
    [('{अं}', 15), ('{अः}', 15), ('{क्ष}', 15), ('{ज्ञ}', 15), ('{डं}', 15), ('{त्र}', 15), ('{श्र}', 15)]

Changed Lib/gflanguages/data/languages/mas_Latn.textproto base exemplar:

  • from 142 (52 as set) to 52 elements
  • removing 90 duplicate(s):
    [('{ny}', 15), ('{rr}', 15), ('{sh}', 15), ('{ʉ́}', 15), ('{wu}', 15), ('{yi}', 15)]

Changed Lib/gflanguages/data/languages/mgo_Latn.textproto index exemplar:

  • from 56 (26 as set) to 26 elements
  • removing 30 duplicate(s):
    [('{CH}', 15), ('{GH}', 15)]

Changed Lib/gflanguages/data/languages/mi_Latn.textproto base exemplar:

  • from 50 (20 as set) to 20 elements
  • removing 30 duplicate(s):
    [('{ng}', 15), ('{wh}', 15)]

Changed Lib/gflanguages/data/languages/mk_Cyrl.textproto punctuation exemplar:

  • from 52 (22 as set) to 22 elements
  • removing 30 duplicate(s):
    [('{', 15), ('}', 15)]

Changed Lib/gflanguages/data/languages/mni_Beng.textproto base exemplar:

  • from 107 (62 as set) to 62 elements
  • removing 45 duplicate(s):
    [('{ড়}', 15), ('{ঢ়}', 15), ('{য়}', 15)]

Changed Lib/gflanguages/data/languages/mt_Latn.textproto index exemplar:

  • from 62 (32 as set) to 32 elements
  • removing 30 duplicate(s):
    [('{GĦ}', 15), ('{IE*}', 15)]

Changed Lib/gflanguages/data/languages/nds_Latn.textproto punctuation exemplar:

  • from 62 (32 as set) to 32 elements
  • removing 30 duplicate(s):
    [('{', 15), ('}', 15)]

Changed Lib/gflanguages/data/languages/ne_Deva.textproto punctuation exemplar:

  • from 50 (20 as set) to 20 elements
  • removing 30 duplicate(s):
    [('{', 15), ('}', 15)]

Changed Lib/gflanguages/data/languages/nl_Latn.textproto base exemplar:

  • from 68 (38 as set) to 38 elements
  • removing 30 duplicate(s):
    [('{ij}', 15), ('{íj́}', 15)]

Changed Lib/gflanguages/data/languages/nmg_Latn.textproto base exemplar:

  • from 245 (65 as set) to 65 elements
  • removing 180 duplicate(s):
    [('{ǝ́}', 15), ('{ǝ̂}', 15), ('{ǝ̌}', 15), ('{ǝ̄}', 15), ('{ɛ́}', 15), ('{ɛ̂}', 15), ('{ɛ̌}', 15), ('{ɛ̄}', 15), ('{ɔ́}', 15), ('{ɔ̂}', 15), ('{ɔ̌}', 15), ('{ɔ̄}', 15)]

Changed Lib/gflanguages/data/languages/nnh_Latn.textproto index exemplar:

  • from 79 (34 as set) to 34 elements
  • removing 45 duplicate(s):
    [('{Pf}', 15), ('{Sh}', 15), ('{Ts}', 15)]

Changed Lib/gflanguages/data/languages/no_Latn.textproto punctuation exemplar:

  • from 54 (24 as set) to 24 elements
  • removing 30 duplicate(s):
    [('{', 15), ('}', 15)]

Changed Lib/gflanguages/data/languages/nus_Latn.textproto base exemplar:

  • from 178 (43 as set) to 43 elements
  • removing 135 duplicate(s):
    [('{a̱}', 15), ('{e̱}', 15), ('{ɛ̈}', 15), ('{ɛ̱}', 15), ('{ɛ̱̈}', 15), ('{i̱}', 15), ('{o̱}', 15), ('{ɔ̈}', 15), ('{ɔ̱}', 15)]

Changed Lib/gflanguages/data/languages/or_Orya.textproto index exemplar:

  • from 60 (45 as set) to 45 elements
  • removing 15 duplicate(s):
    [('{କ୍ଷ}', 15)]

Changed Lib/gflanguages/data/languages/os_Cyrl.textproto index exemplar:

  • from 176 (41 as set) to 41 elements
  • removing 135 duplicate(s):
    [('{Гъ}', 15), ('{Дж}', 15), ('{Дз}', 15), ('{Къ}', 15), ('{Пъ}', 15), ('{Тъ}', 15), ('{Хъ}', 15), ('{Цъ}', 15), ('{Чъ}', 15)]

Changed Lib/gflanguages/data/languages/pa_Guru.textproto auxiliary exemplar:

  • from 20 (5 as set) to 5 elements
  • removing 15 duplicate(s):
    [('{ਲ਼}', 15)]

Changed Lib/gflanguages/data/languages/pcm_Latn.textproto index exemplar:

  • from 39 (24 as set) to 24 elements
  • removing 15 duplicate(s):
    [('{CH}', 15)]

Changed Lib/gflanguages/data/languages/pl_Latn.textproto punctuation exemplar:

  • from 67 (37 as set) to 37 elements
  • removing 30 duplicate(s):
    [('{', 15), ('}', 15)]

Changed Lib/gflanguages/data/languages/ps_Arab.textproto punctuation exemplar:

  • from 44 (14 as set) to 14 elements
  • removing 30 duplicate(s):
    [('{', 15), ('}', 15)]

Changed Lib/gflanguages/data/languages/qu_Latn.textproto index exemplar:

  • from 47 (17 as set) to 17 elements
  • removing 30 duplicate(s):
    [('{Ch}', 15), ('{Ll}', 15)]

Changed Lib/gflanguages/data/languages/ru_Cyrl.textproto punctuation exemplar:

  • from 62 (32 as set) to 32 elements
  • removing 30 duplicate(s):
    [('{', 15), ('}', 15)]

Changed Lib/gflanguages/data/languages/sa_Deva.textproto punctuation exemplar:

  • from 67 (37 as set) to 37 elements
  • removing 30 duplicate(s):
    [('{', 15), ('}', 15)]

Changed Lib/gflanguages/data/languages/sah_Cyrl.textproto index exemplar:

  • from 57 (27 as set) to 27 elements
  • removing 30 duplicate(s):
    [('{Дь}', 15), ('{Нь}', 15)]

Changed Lib/gflanguages/data/languages/sd_Arab.textproto index exemplar:

  • from 82 (52 as set) to 52 elements
  • removing 30 duplicate(s):
    [('{جھ}', 15), ('{گھ}', 15)]

Changed Lib/gflanguages/data/languages/shi_Arab.textproto base exemplar:

  • from 63 (33 as set) to 33 elements
  • removing 30 duplicate(s):
    [('{ⴳⵯ}', 15), ('{ⴽⵯ}', 15)]

Changed Lib/gflanguages/data/languages/shi_Latn.textproto index exemplar:

  • from 62 (32 as set) to 32 elements
  • removing 30 duplicate(s):
    [('{Gʷ}', 15), ('{Kʷ}', 15)]

Changed Lib/gflanguages/data/languages/sk_Latn.textproto index exemplar:

  • from 50 (35 as set) to 35 elements
  • removing 15 duplicate(s):
    [('{CH}', 15)]

Changed Lib/gflanguages/data/languages/sl_Latn.textproto punctuation exemplar:

  • from 54 (24 as set) to 24 elements
  • removing 30 duplicate(s):
    [('{', 15), ('}', 15)]

Changed Lib/gflanguages/data/languages/sq_Latn.textproto index exemplar:

  • from 171 (36 as set) to 36 elements
  • removing 135 duplicate(s):
    [('{DH}', 15), ('{GJ}', 15), ('{LL}', 15), ('{NJ}', 15), ('{RR}', 15), ('{SH}', 15), ('{TH}', 15), ('{XH}', 15), ('{ZH}', 15)]

Changed Lib/gflanguages/data/languages/sr_Cyrl.textproto punctuation exemplar:

  • from 53 (23 as set) to 23 elements
  • removing 30 duplicate(s):
    [('{', 15), ('}', 15)]

Changed Lib/gflanguages/data/languages/sr_Latn.textproto index exemplar:

  • from 78 (33 as set) to 33 elements
  • removing 45 duplicate(s):
    [('{DŽ}', 15), ('{LJ}', 15), ('{NJ}', 15)]

Changed Lib/gflanguages/data/languages/sw_Latn.textproto index exemplar:

  • from 39 (24 as set) to 24 elements
  • removing 15 duplicate(s):
    [('{CH}', 15)]

Changed Lib/gflanguages/data/languages/te_Telu.textproto punctuation exemplar:

  • from 50 (20 as set) to 20 elements
  • removing 30 duplicate(s):
    [('{', 15), ('}', 15)]

Changed Lib/gflanguages/data/languages/tk_Arab.textproto punctuation exemplar:

  • from 54 (24 as set) to 24 elements
  • removing 30 duplicate(s):
    [('{', 15), ('}', 15)]

Changed Lib/gflanguages/data/languages/to_Latn.textproto index exemplar:

  • from 32 (17 as set) to 17 elements
  • removing 15 duplicate(s):
    [('{NG}', 15)]

Changed Lib/gflanguages/data/languages/tzm_Latn.textproto base exemplar:

  • from 62 (32 as set) to 32 elements
  • removing 30 duplicate(s):
    [('{gʷ}', 15), ('{kʷ}', 15)]

Changed Lib/gflanguages/data/languages/ug_Arab.textproto index exemplar:

  • from 160 (40 as set) to 40 elements
  • removing 120 duplicate(s):
    [('{ئا}', 15), ('{ئه}', 15), ('{ئو}', 15), ('{ئۇ}', 15), ('{ئۆ}', 15), ('{ئۈ}', 15), ('{ئې}', 15), ('{ئى}', 15)]

Changed Lib/gflanguages/data/languages/uk_Cyrl.textproto punctuation exemplar:

  • from 58 (28 as set) to 28 elements
  • removing 30 duplicate(s):
    [('{', 15), ('}', 15)]

Changed Lib/gflanguages/data/languages/uz_Latn.textproto index exemplar:

  • from 88 (28 as set) to 28 elements
  • removing 60 duplicate(s):
    [('{Oʻ}', 15), ('{Gʻ}', 15), ('{Sh}', 15), ('{Ch}', 15)]

Changed Lib/gflanguages/data/languages/vai_Latn.textproto base exemplar:

  • from 105 (45 as set) to 45 elements
  • removing 60 duplicate(s):
    [('{ɛ́}', 15), ('{ɛ̃}', 15), ('{ɔ́}', 15), ('{ɔ̃}', 15)]

Changed Lib/gflanguages/data/languages/wo_Latn.textproto punctuation exemplar:

  • from 44 (14 as set) to 14 elements
  • removing 30 duplicate(s):
    [('{', 15), ('}', 15)]

Changed Lib/gflanguages/data/languages/yav_Latn.textproto base exemplar:

  • from 156 (51 as set) to 51 elements
  • removing 105 duplicate(s):
    [('{ɛ́}', 15), ('{ɛ̀}', 15), ('{mb}', 15), ('{ny}', 15), ('{ŋg}', 15), ('{ɔ́}', 15), ('{ɔ̀}', 15)]

Changed Lib/gflanguages/data/languages/yi_Hebr.textproto base exemplar:

  • from 298 (43 as set) to 43 elements
  • removing 255 duplicate(s):
    [('{אַ}', 15), ('{אָ}', 15), ('{בֿ}', 15), ('{דזש}', 15), ('{וּ}', 15), ('{וו}', 15), ('{וי}', 15), ('{זש}', 15), ('{טש}', 15), ('{יִ}', 15), ('{יי}', 15), ('{ײַ}', 15), ('{כּ}', 15), ('{פּ}', 15), ('{פֿ}', 15), ('{שׂ}', 15), ('{תּ}', 15)]

Changed Lib/gflanguages/data/languages/yo_Latn.textproto base exemplar:

  • from 133 (43 as set) to 43 elements
  • removing 90 duplicate(s):
    [('{ẹ́}', 15), ('{ẹ̀}', 15), ('{gb}', 15), ('{m̀}', 15), ('{ọ́}', 15), ('{ọ̀}', 15)]

Changed Lib/gflanguages/data/languages/yue_Hans.textproto punctuation exemplar:

  • from 199 (124 as set) to 124 elements
  • removing 75 duplicate(s):
    [('{', 15), ('{', 15), ('﹛', 15), ('︷', 15), ('}', 15)]

Changed Lib/gflanguages/data/languages/yue_Hant.textproto punctuation exemplar:

  • from 201 (126 as set) to 126 elements
  • removing 75 duplicate(s):
    [('{', 15), ('{', 15), ('﹛', 15), ('︷', 15), ('}', 15)]

Changed Lib/gflanguages/data/languages/zgh_Tfng.textproto base exemplar:

  • from 63 (33 as set) to 33 elements
  • removing 30 duplicate(s):
    [('{ⴳⵯ}', 15), ('{ⴽⵯ}', 15)]

Changed Lib/gflanguages/data/languages/zh_Hans.textproto punctuation exemplar:

  • from 199 (124 as set) to 124 elements
  • removing 75 duplicate(s):
    [('{', 15), ('{', 15), ('﹛', 15), ('︷', 15), ('}', 15)]

Changed Lib/gflanguages/data/languages/zh_Hant.textproto punctuation exemplar:

  • from 201 (126 as set) to 126 elements
  • removing 75 duplicate(s):
    [('{', 15), ('{', 15), ('﹛', 15), ('︷', 15), ('}', 15)]

Changed Lib/gflanguages/data/languages/zu_Latn.textproto punctuation exemplar:

  • from 44 (14 as set) to 14 elements
  • removing 30 duplicate(s):
    [('{', 15), ('}', 15)]
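
For readers who want to reproduce the kind of numbers reported in the log above, here is a minimal sketch of an order-preserving de-duplication. It is not the actual snippets/fix-exemplars-duplicates.py: the function name `dedupe_exemplars` is hypothetical, and the sketch assumes an exemplar field is a whitespace-separated string of tokens, which may not match how the real script splits its input.

```python
from collections import Counter


def dedupe_exemplars(exemplar: str):
    """Return (deduplicated string, Counter of removed extra occurrences).

    Hypothetical helper; assumes tokens are separated by whitespace.
    """
    tokens = exemplar.split()
    # dict.fromkeys keeps the first occurrence of each token, in order.
    unique = list(dict.fromkeys(tokens))
    # Record how many extra copies of each token are dropped.
    removed = Counter({t: n - 1 for t, n in Counter(tokens).items() if n > 1})
    return " ".join(unique), removed


# "{\u025b\u0300}" is the sequence {ɛ̀} written with explicit code points.
sample = "a b {\u025b\u0300} c {\u025b\u0300} {\u025b\u0300} b"
deduped, removed = dedupe_exemplars(sample)

tokens = sample.split()
print(f"from {len(tokens)} ({len(set(tokens))} as set) to {len(deduped.split())} elements")
print(f"removing {sum(removed.values())} duplicate(s): {removed.most_common()}")
```

The reported figures can be cross-checked the same way: for agq_Latn, 16 sequences with 15 extra copies each give 16 × 15 = 240 removed elements, which matches 304 − 64.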

@simoncozens
Contributor

Please wait a second on this one. For some languages, the exemplar characters contain both pre-composed characters and decomposed character sequences. That's quite useful; see discussion on googlefonts/shaperglot#7.

@NeilSureshPatel
Contributor

A related issue is posted here as well: googlefonts/lang/issues/32

@moyogo moyogo force-pushed the remove-duplicates-exemplar-chars branch from c0dca75 to 9e481e6 Compare October 31, 2022 16:33
@moyogo moyogo force-pushed the remove-duplicates-exemplar-chars branch from 9e481e6 to 85e8e84 Compare October 31, 2022 17:01
@moyogo
Contributor Author

moyogo commented Oct 31, 2022

> Please wait a second on this one. For some languages, the exemplar characters contain both pre-composed characters and decomposed character sequences. That's quite useful; see discussion on simoncozens/shaperglot#7.

This doesn’t touch pre-composed vs decomposed characters as exemplars are not normalized before checking for duplicates.
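
To illustrate the point, here is a small sketch (not taken from the PR's script) showing that a pre-composed character and its decomposed sequence are distinct strings unless they are normalized first. The token list is made up for the example; `unicodedata.normalize` is from the Python standard library.

```python
import unicodedata

precomposed = "\u00e9"  # é as a single code point
decomposed = "e\u0301"  # e followed by COMBINING ACUTE ACCENT

tokens = [precomposed, decomposed, precomposed]

# Plain de-duplication compares code point sequences, so both forms survive
# and only the exact repeat is dropped.
plain = list(dict.fromkeys(tokens))

# Normalizing first (NFC here) would collapse the two forms into one.
normalized = list(dict.fromkeys(unicodedata.normalize("NFC", t) for t in tokens))

print(len(plain), len(normalized))
```

Because the de-duplication in this PR skips the normalization step, it behaves like `plain` here: exact repeats are removed, while pre-composed and decomposed variants are both kept.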

@moyogo moyogo merged commit 74645d9 into googlefonts:main Oct 31, 2022
@moyogo moyogo deleted the remove-duplicates-exemplar-chars branch November 11, 2022 14:18

3 participants