Why use UTF16ToUTF8() ? #70

Closed
git-host-admin opened this issue Apr 22, 2018 · 3 comments · Fixed by #124

git-host-admin commented Apr 22, 2018

Hi:

I'm from China, where there are many Chinese fonts. When I use getFontName() or similar functions, the return value is not valid, but if I remove the UTF16ToUTF8() call, I get the expected result.

Issue.zip

C4MS commented Jan 18, 2020

The library assumes all name strings are encoded as UTF-16 by default, without taking the provided PlatformID into consideration.

https://github.com/opentypejs/opentype.js/blob/342fac9e81a34ef08b69c5f08a0ec71727e0b832/src/tables/name.js#L644

I have overcome this issue by subclassing the name table class and overriding the _parse() function:

// Usage: swap in the custom name table before reading any names
$font = \FontLib\Font::load($path);

// Replace the old table
$tables = $font->getTable();
$table = new \Additions\Table\Type\nameEncoding($tables['name']);
$table->parse();
$font->setTableObject('name', $table);

$font->getFontPostscriptName();

// Subclass definition (separate file; namespace inferred from the usage above)
namespace Additions\Table\Type;

use FontLib\Table\Type\name;

class nameEncoding extends name {
    private static $header_format = array(
        "format"       => self::uint16,
        "count"        => self::uint16,
        "stringOffset" => self::uint16,
    );

    protected function _parse() {
        // override here
    }
}

bsweeney added this to the 0.5.5 milestone Feb 24, 2023
bsweeney added a commit that referenced this issue Dec 12, 2023
Most name strings should be encoded with UTF-16BE per the spec, but there are situations where other encodings are required or acceptable. This change only addresses a subset of potential encodings.

fixes #70

bsweeney commented Dec 12, 2023

The global conversion from UTF-16 was mostly in accordance with the spec.

Relating to platform ID 0 (Unicode):

All Unicode-based names must be in UTF-16BE (big-endian, two-byte encoding). UTF-8 and UTF-32 (one- and four-byte encodings) are not allowed.

Relating to platform ID 3 (Windows):

Encoding IDs for platform 3 'name' entries should match the encoding IDs used for platform 3 subtables in the 'cmap' table. When building a Unicode font for Windows, the platform ID should be 3 and the encoding ID should be 1. When building a symbol font for Windows, the platform ID should be 3 and the encoding ID should be 0. All string data for platform 3 must be encoded in UTF-16BE.

However, it is also true that other encodings may be used (as seen in the supplied font). While I haven't completely addressed the underlying deficiency in how the library handles string encoding, the changes implemented for the next release should be sufficient for most cases. Expanded encoding support will be built out as needed based on user feedback.
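As a rough illustration of the idea discussed above (choosing the source encoding from the name record's platform ID and encoding ID instead of always assuming UTF-16), here is a minimal sketch. It is not the library's actual implementation: `decodeNameString()` and its lookup map are hypothetical, the map covers a single legacy case chosen for illustration (Macintosh Simplified Chinese, approximated here as EUC-CN), and the code assumes PHP's mbstring extension is available.

```php
<?php
// Hypothetical helper, not part of php-font-lib: pick a source encoding
// from a name record's platform ID and encoding ID, then convert to UTF-8.
function decodeNameString(string $bytes, int $platformID, int $encodingID): string
{
    // Illustrative and deliberately incomplete map of
    // "platformID:encodingID" => mbstring encoding name.
    $legacy = [
        '1:25' => 'EUC-CN', // Macintosh, Simplified Chinese (approximation)
    ];

    // Anything not listed falls back to UTF-16BE, which the spec requires
    // for platform 0 (Unicode) and platform 3 (Windows).
    $key = $platformID . ':' . $encodingID;
    $source = $legacy[$key] ?? 'UTF-16BE';

    return mb_convert_encoding($bytes, 'UTF-8', $source);
}

// The same font name stored two different ways.
$utf16 = "\x5B\x8B\x4F\x53"; // UTF-16BE bytes (U+5B8B U+4F53)
$eucCn = "\xCB\xCE\xCC\xE5"; // EUC-CN bytes for the same two characters

echo decodeNameString($utf16, 3, 1), PHP_EOL;  // Windows, Unicode BMP
echo decodeNameString($eucCn, 1, 25), PHP_EOL; // Macintosh, Simplified Chinese
```

Both calls yield the same UTF-8 string. Decoding the EUC-CN bytes as UTF-16BE instead produces unrelated characters, which is the kind of invalid return value reported at the top of this issue.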

bsweeney added a commit that referenced this issue Dec 29, 2023

bsweeney commented Dec 29, 2023

I noticed that the sample font provided uses cmap subtable format 2, which isn't yet supported. I added support for that format and improved encoding support in other areas of the library so that the next release will correctly re-encode this font.

The re-encoded font now loads correctly in browsers that do not load the original font due to spec compliance issues.

bsweeney added a commit that referenced this issue Dec 30, 2023
bsweeney added a commit that referenced this issue Jan 6, 2024