Converter misses opportunity to detect identical glyphs, stores them as separate images #120
Comments
How can we know if the ASCII |
I believe font files have facilities that allow different Unicode code points to reference the same glyph. For example, you can go to https://fontdrop.info/ , load arial.ttf, scroll down to unicode 0410 (Cyrillic letter A) click on it and observe "This composite glyph is a combination of: glyph 36". If you click on the letter A from ASCII range (close to top of table), you'll see same index 36. |
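As a rough illustration of the point above, here is a minimal C sketch that scans a code-point-to-glyph mapping and reports entries that reuse a glyph index already referenced earlier. The `cmap_entry_t` type and both function names are hypothetical stand-ins for illustration; they are not part of the converter or of any font library, and a real cmap table is more involved than a flat array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one cmap entry:
 * a Unicode code point and the glyph index it maps to. */
typedef struct {
    uint32_t codepoint;
    uint16_t glyph_id;
} cmap_entry_t;

/* Returns the index of the first entry before `i` that maps to the
 * same glyph as entry `i`, or -1 if entry `i` is the first user. */
static int find_shared_glyph(const cmap_entry_t *map, size_t i)
{
    assert(map != NULL);
    for (size_t k = 0; k < i; k++) {
        if (map[k].glyph_id == map[i].glyph_id) {
            return (int)k;
        }
    }
    return -1;
}

/* Counts entries whose glyph is already referenced by an earlier entry. */
static size_t count_shared(const cmap_entry_t *map, size_t n)
{
    size_t shared = 0;
    for (size_t i = 0; i < n; i++) {
        if (find_shared_glyph(map, i) >= 0) {
            shared++;
        }
    }
    return shared;
}
```

With entries such as `{0x0041, 36}` and `{0x0410, 36}` (the arial.ttf case described above), the second entry would be reported as sharing its glyph with the first. The quadratic scan is only for clarity; a lookup table keyed by glyph index would do the same job faster.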
How many glyphs can be affected by that? I estimate it to max. 1% (but probably closer to 0.1%). What do you think? |
Let's see. For the Russian alphabet, I would say 11 uppercase and 8 lowercase letters share glyphs with ASCII. That would be 15% of ASCII range. |
Okay, it's really significant in this case. So the task is to make the duplicated glyphs point to the same bitmap, right? If so, I'm okay with this feature. However, I'm not a JS developer and can't work on the implementation. Do you have the time and interest to implement it? cc @puzrin |
Guys, before discussing any changes, it's worth providing proof that the source font has multiple character codes mapped to the same image. If the source images are different, that's the intent of the font authors, not a converter issue. The TTF format has different tables for "images" and "char codes". AFAIK, if an image has multiple references from char codes, the converter should preserve them (but I'm not sure and don't remember the details). |
It's also worth using the binary format as the baseline. The "lvgl" one is less optimal and is focused on a text representation of the source. Binary is close to a subset of TTF, with minor local changes for raster/compression instead of vectors. |
So I looked closer at arial.ttf using fontdrop.info online tool. I can confirm that Russian letters АВЕМНОРТХаенорсух share glyphs with regular ASCII letters. That's 17 glyphs. This set could vary slightly from font to font, but I don't expect major variations. I am mostly an embedded C developer with some knowledge of JS. But I'll see if I can dive into the code and suggest patches. |
And did you use the same font in the converter when you found the duplicated images? And does the same problem occur in the binary format? |
Just ran the converter on arial.ttf. Yes, the glyphs in question are duplicated. This time exact copies, to the last bit. I am not using the binary font format in my applications, so I can't confirm this behavior with it. |
There is a chance we skipped deduplication to save time. But it's 100% not a restriction of the internal [binary] format (I don't remember about lvgl). |
In LVGL we can also reference any {.bitmap_index = 1307, .adv_w = 128, .box_w = 8, .box_h = 8, .ofs_x = 0, .ofs_y = -1}, |
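One way the deduplication discussed in this thread could work is sketched below in C: byte-identical bitmaps are stored once, and every glyph that uses them gets the same `bitmap_index`. This is only a sketch under assumptions. `glyph_ref_t` and `dedup_bitmaps` are hypothetical names, the record is deliberately smaller than a real LVGL glyph descriptor, and the converter itself is written in JS and may need a different approach.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical glyph record: where the glyph's bitmap starts and how long it is. */
typedef struct {
    uint32_t bitmap_index; /* offset into the bitmap array */
    uint32_t size;         /* bitmap length in bytes */
} glyph_ref_t;

/*
 * Copies bitmaps from `src` into `dst`, storing byte-identical bitmaps once,
 * and rewrites each glyph's bitmap_index to point into `dst`.
 * `dst` must be at least as large as `src`. Returns the bytes used in `dst`.
 */
static size_t dedup_bitmaps(const uint8_t *src, uint8_t *dst,
                            glyph_ref_t *glyphs, size_t count)
{
    assert(src != NULL && dst != NULL && glyphs != NULL);
    size_t used = 0;
    for (size_t i = 0; i < count; i++) {
        /* Read the source offset before it is overwritten below. */
        const uint8_t *bmp = src + glyphs[i].bitmap_index;
        int found = 0;
        /* Earlier glyphs already point into `dst`; compare against those. */
        for (size_t k = 0; k < i && !found; k++) {
            if (glyphs[k].size == glyphs[i].size &&
                memcmp(dst + glyphs[k].bitmap_index, bmp, glyphs[i].size) == 0) {
                glyphs[i].bitmap_index = glyphs[k].bitmap_index;
                found = 1;
            }
        }
        if (!found) {
            /* First occurrence: append the bitmap and record its new offset. */
            memcpy(dst + used, bmp, glyphs[i].size);
            glyphs[i].bitmap_index = (uint32_t)used;
            used += glyphs[i].size;
        }
    }
    return used;
}
```

Only exact byte matches are merged here, which fits the "exact copies, to the last bit" observation. Each glyph would still keep its own metrics (advance width, box size, offsets) in its descriptor, so only the bitmap storage is shared.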
As the title says. I am converting ASCII and Cyrillic ranges. The letter A, for example, is present in both ranges and it is being stored twice. Interestingly, the stored images are slightly different. Same for other identical glyphs. It should not be too troublesome to detect duplicate glyphs and store one copy only.