Word splitting in non-Latin text, and over ligatures #3

Open
PhilterPaper opened this issue Dec 10, 2020 · 2 comments
Labels
enhancement New feature or request

Comments

@PhilterPaper
Owner

The PDF::Builder package can typeset using HarfBuzz::Shaper to substitute a font's ligatures for sequences of lowercase letters. It does not currently call Text::KnuthPlass natively, but I plan to add this in the near future. Some potential problems arise when HarfBuzz::Shaper is used and decides to substitute ligatures: Text::KnuthPlass will then have to accept not just plain text, but also HarfBuzz's arrays of processed glyphs, which may include ligatures. How this will interact with word splitting (whose hyphenation patterns and exceptions assume no ligatures) remains to be seen. We also need to think about word splitting in connected cursive scripts such as Arabic, in heavily processed complex scripts such as Devanagari or Khmer, in bidirectional (RTL) text, and in mixtures of different script types.

@PhilterPaper PhilterPaper added the enhancement New feature or request label Dec 10, 2020
@PhilterPaper
Owner Author

PhilterPaper commented Jan 21, 2021

Any ligatures would likely have to be backed out, at least for word-splitting purposes, and then put back in if the word wasn't split through a ligature. If it was, each fragment might still contain a shorter ligature, so HarfBuzz::Shaper would have to be called again on both fragments. On the bright side, it's unlikely that decomposing a ligature into letters, or vice versa, will change the width of the word enough to require another pass with Text::KnuthPlass. The change should be small enough that the line's adjustment ratio (which sets the glue widths) could simply be updated.
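One way to tell whether a proposed break falls "through a ligature" is to look at glyph cluster values. HarfBuzz reports, for each output glyph, the index of the first input character it came from, so a multi-character ligature shows up as a gap between consecutive cluster values. The sketch below assumes the shaped word is available as a list of (name, cluster) records; the `Glyph` type, the glyph names, and both functions are hypothetical illustrations, not the HarfBuzz::Shaper or Text::KnuthPlass API, and only left-to-right text is handled.

```python
from dataclasses import dataclass

@dataclass
class Glyph:
    name: str     # hypothetical glyph name, e.g. "f_f_l" for the ffl ligature
    cluster: int  # index of the first input character this glyph came from

def splits_ligature(glyphs, text_len, k):
    """Return True if a break before character k lands inside a multi-character glyph.

    Assumes left-to-right text whose cluster values never decrease
    (RTL runs and reordered clusters are not handled here).
    """
    boundaries = {g.cluster for g in glyphs}
    boundaries.add(text_len)
    return k not in boundaries

def fragments_to_reshape(text, glyphs, k):
    """Return the character fragments that must be shaped again, or [] if none."""
    if splits_ligature(glyphs, len(text), k):
        return [text[:k], text[k:]]
    return []

word = "stiffly"
# Hypothetical shaper output: s t i <ffl ligature covering chars 3-5> y
shaped = [Glyph("s", 0), Glyph("t", 1), Glyph("i", 2), Glyph("f_f_l", 3), Glyph("y", 6)]
print(fragments_to_reshape(word, shaped, 5))  # ['stiff', 'ly']
print(fragments_to_reshape(word, shaped, 3))  # []
```

A break after "stiff" falls inside the ffl ligature, so both "stiff" and "ly" would go back to the shaper. A break after "sti" lands on a glyph boundary, so the existing glyphs can be kept.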

```
he stiffly brushed aside            original text
he sti|ffl|y brushed aside          H::S decides it wants to use the 'ffl' ligature
he stiff-ly brushed aside           T::KP decides it wants to split the line between 'stiff' and 'ly'
                                    update glue sizing (ratio)
he sti|ff|-ly brushed aside         H::S now uses the 'ff' ligature
                                    readjust glue sizing
```
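The "update glue sizing (ratio)" steps can be illustrated with the standard Knuth-Plass adjustment ratio: r = (target − natural) / stretch when the line is short, (target − natural) / shrink when it is long, and each glue item is then set to its width plus r times its stretch (r ≥ 0) or shrink (r < 0). This is a minimal sketch assuming the line's total stretch and shrink are known; the widths and the 0.75pt ligature saving are made-up numbers, and the function names are hypothetical rather than part of Text::KnuthPlass.

```python
def adjustment_ratio(target, natural, stretch, shrink):
    """Knuth-Plass adjustment ratio r for a single line.

    target  -- desired line width
    natural -- sum of box widths plus natural glue widths
    stretch -- total stretchability of the line's glue
    shrink  -- total shrinkability of the line's glue
    """
    if natural < target:
        return (target - natural) / stretch if stretch > 0 else float("inf")
    if natural > target:
        return (target - natural) / shrink if shrink > 0 else float("-inf")
    return 0.0

def set_glue(glues, r):
    """Actual width of each (width, stretch, shrink) glue item at ratio r."""
    return [w + r * (y if r >= 0 else z) for (w, y, z) in glues]

# Made-up numbers: a 100pt line with two interword glues of 3pt (+1.5pt/-1pt).
glues = [(3.0, 1.5, 1.0), (3.0, 1.5, 1.0)]
stretch = sum(y for _, y, _ in glues)
shrink = sum(z for _, _, z in glues)

natural = 94.0                      # width when the line breaker chose this line
r_before = adjustment_ratio(100.0, natural, stretch, shrink)

natural -= 0.75                     # hypothetical saving from the 'ff' ligature
r_after = adjustment_ratio(100.0, natural, stretch, shrink)
print(r_before, r_after)            # 2.0 2.25
print(set_glue(glues, r_after))     # [6.375, 6.375]
```

Only the ratio of the affected line is recomputed; the break positions chosen by the line breaker are left alone, which is the point of treating the ligature change as a small correction.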

What if HarfBuzz::Shaper were called after Text::KnuthPlass? This might be feasible if ligatures are the only thing in play (no changes of direction, script, font size, etc.). Presumably, substituting ligatures after the lines have already been split would entail only a small update to each line's ratio to restore the desired alignment. That might not hold for complex scripts such as Arabic or the Indic scripts, where glyph substitutions for various kinds of ligatures could change word widths substantially.
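If shaping is done after line breaking, each line can be checked to see whether its glue can still absorb the width change. In Knuth-Plass terms, a line is acceptable when its adjustment ratio r satisfies −1 ≤ r ≤ some tolerance, which is equivalent to requiring natural − shrink ≤ target ≤ natural + tolerance × stretch. The sketch below assumes per-line width deltas from shaping are available; `Line`, `max_ratio`, and the deltas are hypothetical, and `max_ratio` is an assumed tolerance, not a Text::KnuthPlass setting.

```python
from dataclasses import dataclass

@dataclass
class Line:
    target: float    # desired line width
    natural: float   # natural width (boxes + natural glue)
    stretch: float   # total glue stretchability
    shrink: float    # total glue shrinkability

def still_fits(line, max_ratio=2.0):
    """True if glue alone can bring this line to its target width.

    Equivalent to -1 <= r <= max_ratio for the line's adjustment ratio r;
    max_ratio is an assumed tolerance chosen for illustration.
    """
    lo = line.natural - line.shrink                # fully shrunk width
    hi = line.natural + max_ratio * line.stretch   # maximally stretched width
    return lo <= line.target <= hi

def lines_needing_rebreak(lines, deltas, max_ratio=2.0):
    """Apply per-line width changes from post-break shaping; return indices that no longer fit."""
    bad = []
    for i, (line, d) in enumerate(zip(lines, deltas)):
        shaped = Line(line.target, line.natural + d, line.stretch, line.shrink)
        if not still_fits(shaped, max_ratio):
            bad.append(i)
    return bad

lines = [Line(100.0, 99.0, 3.0, 2.0), Line(100.0, 98.5, 3.0, 2.0)]
# Hypothetical deltas: a Latin ligature shaves a little; complex-script shaping may add a lot.
print(lines_needing_rebreak(lines, [-0.5, 4.0]))  # [1]
```

Lines that still fit only need their ratio updated; any line reported here would need another line-breaking pass, which is the case the comment above expects for scripts such as Arabic or Devanagari.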

@PhilterPaper
Owner Author

Note that #2 is concerned more with word splitting of Latin-script text in languages other than English, but much of it still applies to this ticket's area of interest, so be sure to look at both tickets when working on word and line splitting. Keep in mind that the only reason to split a word at all is that a line needs to be split, and the best fit may fall within a word (hyphenation, etc.).
