Converter misses opportunity to detect identical glyphs, stores them as separate images #120

pavmick · 2024-08-24T06:14:38Z

As the title says. I am converting ASCII and Cyrillic ranges. The letter A, for example, is present in both ranges and it is being stored twice. Interestingly, the stored images are slightly different. Same for other identical glyphs. It should not be too troublesome to detect duplicate glyphs and store one copy only.

kisvegabor · 2024-08-30T07:29:14Z

How can we know if the ASCII A is the same as the Cyrillic A? By checking the rasterized image?

pavmick · 2024-08-30T10:29:40Z

I believe font files have facilities that allow different Unicode code points to reference the same glyph. For example, you can go to https://fontdrop.info/ , load arial.ttf, scroll down to Unicode 0410 (Cyrillic letter A), click on it, and observe "This composite glyph is a combination of: glyph 36". If you click on the letter A from the ASCII range (close to the top of the table), you'll see the same index 36.
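To make the idea concrete, here is a minimal C sketch of a character map in which two code points resolve to one glyph index. The `cmap_entry_t` type, the `sample_cmap` table, and `count_shared_glyphs` are hypothetical names for illustration. Only the pairing of ASCII A and Cyrillic A with glyph 36 comes from the observation above; the other glyph indices are made up. Since fontdrop reports U+0410 as a composite built from glyph 36, the real table may instead point to a separate composite glyph, so treat the direct mapping as an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One character-map entry: a Unicode code point and the glyph index it uses. */
typedef struct {
    uint32_t codepoint;
    uint16_t glyph_id;
} cmap_entry_t;

/* Hypothetical excerpt. Only the U+0041 / U+0410 -> 36 pairing is from the thread. */
static const cmap_entry_t sample_cmap[] = {
    { 0x0041, 36 },   /* LATIN CAPITAL LETTER A */
    { 0x0042, 37 },   /* LATIN CAPITAL LETTER B (glyph index made up) */
    { 0x0410, 36 },   /* CYRILLIC CAPITAL LETTER A (assumed to share glyph 36) */
    { 0x0411, 572 },  /* CYRILLIC CAPITAL LETTER BE (glyph index made up) */
};

/* Counts entries whose glyph index was already used by an earlier entry. */
static size_t count_shared_glyphs(const cmap_entry_t *map, size_t n)
{
    size_t shared = 0;

    assert(map != NULL || n == 0);
    for (size_t i = 1; i < n; i++) {
        for (size_t j = 0; j < i; j++) {
            if (map[j].glyph_id == map[i].glyph_id) {
                shared++;
                break;  /* count each repeated entry once */
            }
        }
    }
    return shared;
}
```

With this table, `count_shared_glyphs(sample_cmap, 4)` reports one entry that could reuse an already stored image.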

kisvegabor · 2024-09-02T16:21:20Z

How many glyphs can be affected by that? I estimate at most 1% (but probably closer to 0.1%). What do you think?

pavmick · 2024-09-02T16:44:46Z

Let's see. For the Russian alphabet, I would say 11 uppercase and 8 lowercase letters share glyphs with ASCII. That would be 15% of ASCII range.

kisvegabor · 2024-09-04T10:01:24Z

Okay, it's really significant in this case.

So the task is to make the duplicated glyphs point to the same bitmap, right? If so, I'm okay with this feature. However I'm not a JS developer and can't work on the implementation.

Do you have time and interest to implement it?

cc @puzrin

puzrin · 2024-09-04T11:01:46Z

Guys, before discussing any changes, it's worth providing proof that the source font has multiple character codes mapped to the same image. If source images are different, that's the intent of the font authors, not a converter issue.

The TTF format has different tables for "images" and "char codes." AFAIK if an image has multiple references from char codes, the converter should preserve them (but I'm not sure and don't remember the details).

puzrin · 2024-09-04T11:33:09Z

It's also worth using the binary format as the reference. The "lvgl" one is less optimal and is focused on a text representation of the source. Binary is a close subset of TTF, with minor local changes for raster/compression instead of vectors.

pavmick · 2024-09-04T12:28:50Z

So I looked closer at arial.ttf using fontdrop.info online tool. I can confirm that Russian letters АВЕМНОРТХаенорсух share glyphs with regular ASCII letters. That's 17 glyphs. This set could vary slightly from font to font, but I don't expect major variations. I am mostly an embedded C developer with some knowledge of JS. But I'll see if I can dive into the code and suggest patches.

puzrin · 2024-09-04T12:50:28Z

> So I looked closer at arial.ttf using fontdrop.info online tool. I can confirm that Russian letters АВЕМНОРТХаенорсух share glyphs with regular ASCII letters. That's 17 glyphs.

And did you use the same font in the converter when you found the duplicated images? Is the same problem present in the binary format?

pavmick · 2024-09-04T13:04:08Z

> And did you use the same font in the converter when you found the duplicated images? Is the same problem present in the binary format?

Just ran the converter on arial.ttf. Yes, the glyphs in question are duplicated. This time exact copies, to the last bit. I am not using the binary font format in my applications, so I can't confirm this behavior with it.
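Because the copies are identical to the last bit, one way to catch them is a plain byte comparison of the rasterized bitmaps at packing time. The following is a minimal sketch of that idea, not the converter's actual code. `glyph_bitmap_t` and `pack_unique_bitmaps` are hypothetical names, and the quadratic search is chosen for brevity; a hash of the bytes would scale better for large fonts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A rasterized glyph: packed bitmap bytes (non-NULL) and their length. */
typedef struct {
    const uint8_t *data;
    size_t len;
} glyph_bitmap_t;

/*
 * Copies glyph bitmaps into `out`, storing each distinct byte sequence once.
 * offsets[i] receives the position of glyph i's data inside `out`.
 * Returns the number of bytes written to `out`.
 */
static size_t pack_unique_bitmaps(const glyph_bitmap_t *glyphs, size_t n,
                                  uint8_t *out, size_t out_cap,
                                  size_t *offsets)
{
    size_t used = 0;

    for (size_t i = 0; i < n; i++) {
        size_t j;

        /* Look for an earlier glyph with exactly the same bytes. */
        for (j = 0; j < i; j++) {
            if (glyphs[j].len == glyphs[i].len &&
                memcmp(glyphs[j].data, glyphs[i].data, glyphs[i].len) == 0) {
                break;
            }
        }

        if (j < i) {
            offsets[i] = offsets[j];        /* duplicate: reuse stored bytes */
        } else {
            assert(used + glyphs[i].len <= out_cap);
            memcpy(out + used, glyphs[i].data, glyphs[i].len);
            offsets[i] = used;              /* first occurrence: append */
            used += glyphs[i].len;
        }
    }
    return used;
}
```

Comparing the output bytes also covers fonts where two code points map to different glyph records that happen to rasterize identically.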

puzrin · 2024-09-04T13:55:05Z

There is a chance we skipped deduplication to save time. But that's 100% not a restriction of the internal [binary] format (I don't remember about lvgl).

kisvegabor · 2024-09-05T09:36:32Z

In LVGL we can also reference any bitmap_index for a glyph. See

 {.bitmap_index = 1307, .adv_w = 128, .box_w = 8, .box_h = 8, .ofs_x = 0, .ofs_y = -1},
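As an illustration of that point, two descriptors can carry the same `bitmap_index` so that both glyphs read the same bytes. The sketch below uses a simplified stand-in struct, `glyph_dsc_t`, whose field names follow the descriptor line above; it is not LVGL's real type, and the bitmap bytes, the second descriptor, and the `glyph_pixels` helper are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a glyph descriptor (not LVGL's actual definition). */
typedef struct {
    uint32_t bitmap_index;  /* offset of the glyph's data in glyph_bitmap[] */
    uint16_t adv_w;
    uint8_t  box_w;
    uint8_t  box_h;
    int8_t   ofs_x;
    int8_t   ofs_y;
} glyph_dsc_t;

/* One 8x8 image at 1 bit per pixel; the byte values are illustrative only. */
static const uint8_t glyph_bitmap[] = {
    0x18, 0x24, 0x42, 0x42, 0x7E, 0x42, 0x42, 0x00,
};

/* Two glyphs (for example Latin A and Cyrillic A) sharing one image. */
static const glyph_dsc_t glyph_dsc[] = {
    {.bitmap_index = 0, .adv_w = 128, .box_w = 8, .box_h = 8, .ofs_x = 0, .ofs_y = -1},
    {.bitmap_index = 0, .adv_w = 128, .box_w = 8, .box_h = 8, .ofs_x = 0, .ofs_y = -1},
};

/* Returns a pointer to the first bitmap byte of the given descriptor. */
static const uint8_t *glyph_pixels(size_t glyph_no)
{
    assert(glyph_no < sizeof(glyph_dsc) / sizeof(glyph_dsc[0]));
    return &glyph_bitmap[glyph_dsc[glyph_no].bitmap_index];
}
```

Each glyph still has its own descriptor, so metrics such as `adv_w` could differ even when the image bytes are shared.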

pavmick added a commit to pavmick/lv_font_conv that referenced this issue Sep 5, 2024

See issue lvgl#120

6ac4a4c

In LVGL export format, avoid storing duplicate bitmap data for identical glyphs. Instead, reference existing bitmap data.
