You state "The character encoding in PDF is similar to HTML, in that Unicode characters are used to represent glyphs in the document.", which is incorrect. I think a better statement would be something like: "Unlike HTML, PDF is a fully typeset and precisely kerned page description language that specifies exact glyph selection and exact glyph positioning via glyph IDs that directly select glyphs from fonts. This is why embedding fonts in a PDF is critically important, and it is also what gives PDFs a consistent and reliable display without relying on the text-shaping, text-layout, or reflow implementation of each viewer. In PDF, each glyph may be mapped to one or more Unicode code points for text extraction (if Unicode code points exist for that glyph). Multiple Unicode code points for a single glyph occur with complex typesetting such as ligatures (e.g. "ffi") and diacritics."
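To illustrate the one-glyph-to-many-code-points point, here is a minimal Python sketch. It is not how any particular PDF library works: the glyph IDs, the `GLYPH_TO_UNICODE` table, and the `extract_text` helper are all hypothetical, standing in for whatever glyph-to-Unicode mapping a real PDF carries for an embedded font.

```python
# Hypothetical glyph IDs for one embedded font; real values depend on the font.
GLYPH_TO_UNICODE = {
    17: "o",
    42: "ffi",        # one ligature glyph maps to three code points
    58: "c",
    61: "e",
    203: "e\u0301",   # one accented glyph maps to base letter + combining acute
}


def extract_text(glyph_ids):
    """Return the Unicode text for a sequence of glyph IDs.

    Glyphs with no mapping become U+FFFD, mirroring the case where no
    Unicode value exists for a glyph.
    """
    return "".join(GLYPH_TO_UNICODE.get(gid, "\ufffd") for gid in glyph_ids)


# The page shows four positioned glyphs, but extraction yields six code points.
shown_glyphs = [17, 42, 58, 61]
text = extract_text(shown_glyphs)
print(len(shown_glyphs), len(text), text)
```

The page content only ever refers to the four glyphs; the Unicode string exists purely for extraction and search, which is why the two counts differ.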
From @petervwyatt