This repository has been archived by the owner on Dec 9, 2018. It is now read-only.

Missing letters where ligatures could be (MS Word 2010 using Calibri from Windows 7) #673

Open
timretout opened this issue Sep 5, 2016 · 1 comment

@timretout

This PDF was created with MS Word 2010 on Windows 7:
butter.pdf

After conversion with pdf2htmlEX, however, most (but not all) of the "tt" letter pairs are missing from the rendered HTML output. The letters are present in the HTML source, but they are not displayed when the page is viewed. Here's what it looks like:
[image: butter]

By default MS Word 2010 does not enable ligatures; note that there are no ligatures in the above file. The flag "--decompose-ligature 1" has no effect on the behaviour.

The subsetted Calibri embedded in the file does not contain the glyph named "t_t.liga". However, the generated woff file still has a GSUB rule copied from the embedded font. As a result, every place in the rendered document where a ligature could be substituted ends up being rendered as an empty glyph.
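
For anyone wanting to check a generated font for this, here is a rough sketch of the comparison described above: list the GSUB ligature rules and flag those whose target glyph is not usable. It assumes the third-party fontTools library, and the function names are made up for illustration. Whether the target glyph is absent or merely empty in the generated woff is not established here, so the set of "usable" glyphs is left to the caller; `check_font_file` simply uses the font's glyph names as a first guess.

```python
from typing import Iterable, List, Set, Tuple

# One rule = (input glyph sequence, ligature glyph),
# e.g. (("t", "t"), "t_t.liga").
LigatureRule = Tuple[Tuple[str, ...], str]


def dangling_ligatures(
    rules: Iterable[LigatureRule], usable_glyphs: Set[str]
) -> List[LigatureRule]:
    """Return the rules whose ligature glyph is not in usable_glyphs."""
    return [rule for rule in rules if rule[1] not in usable_glyphs]


def read_ligature_rules(font) -> List[LigatureRule]:
    """Collect ligature substitutions from a fontTools TTFont's GSUB table."""
    rules: List[LigatureRule] = []
    if "GSUB" not in font or font["GSUB"].table.LookupList is None:
        return rules
    for lookup in font["GSUB"].table.LookupList.Lookup:
        for subtable in lookup.SubTable:
            # Extension lookups (type 7) wrap the real subtable.
            if lookup.LookupType == 7:
                subtable = subtable.ExtSubTable
            # Only ligature subtables expose a .ligatures mapping.
            if not hasattr(subtable, "ligatures"):
                continue
            for first, ligs in subtable.ligatures.items():
                for lig in ligs:
                    sequence = (first, *lig.Component)
                    rules.append((sequence, lig.LigGlyph))
    return rules


def check_font_file(path: str) -> List[LigatureRule]:
    """Assumption: 'usable' means the glyph name exists in the font file."""
    from fontTools.ttLib import TTFont  # third-party: pip install fonttools

    font = TTFont(path)
    return dangling_ligatures(read_ligature_rules(font), set(font.getGlyphOrder()))
```

Calling `check_font_file` on one of the generated woff files should return the rules that point at glyphs with no matching name in the font. If the glyph slot exists but has no outline, a stricter notion of "usable" would have to be passed to `dangling_ligatures` instead.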

Note that this bug is less obvious when the PDF is exported from MS Word 2010 on Windows 10: in that scenario, the "t_t.liga" glyphs do get included in the subsetted font (which is strange, because they're not in fact used in the PDF). So ligatures are rendered in that case, when they ought not to be.

Vaguely related to #659, but interesting enough that I thought it deserved its own issue!

@timretout
Author

I checked whether this was the same as #670, but that appears to require "--correct-text-visibility 1", whereas this bug does not.
