[Indic] Dotted circle placement in broken syllables #76
Theoretically, a dotted circle glyph should be inserted (wherever the base glyph is expected to be; after GSUB so it does not mess with real characters, but before GPOS so mark positioning is active) for every dependent sign that is formed on its own (without an encoded base)—that is:
None of the four figures looks right.
The dotted circle should be inserted only after a dependent sign (here a repha) is formed on its own (without an encoded base).
The standalone syllable regex seems to have overlooked ZWJ’s participation in repha formation. Generally speaking, ZWJ’s effect should always be analyzed alongside a neighboring virama. I’m aware that my recommendation is different from HarfBuzz’s current practice for sequences like “ে্”. But I prefer the USE spec’s more predictable behavior for inserted dotted circles.
Thanks heaps for your feedback, @lianghai! The example sequences are really valuable in allowing me to follow your reasoning.
I agree HB's insertion logic is inadequate. Should jump over possible reph as well. And yes, the grammar should also allow ZWJ for explicit reph... We cannot modify the Indic grammar based on the script though. That said, it feels to me like Ra,Halant by itself is a complete syllable. No? Would be nice if we can agree on something and adjust HB as well. cc @dscorbett
Thanks for weighing in, @behdad.
I would say so too. From observation, Uniscribe does seem to treat them as such. Using the same two examples in the original post above: (Judging by these examples and some others, I assume that Uniscribe's current dotted circle insertion strategy is the one described in the USE spec.)
<ra, virama> is a perfectly valid akshara on its own (and is used in real text). It should not be interfered with by an inserted dotted circle that makes it join the following characters’ cluster. (Note there’s a reason why a dotted circle character is not in the string in the first place: a base for repha is not intended to be there.)
A note that what I said earlier—
—is likely wrong. See MicrosoftDocs/typography-issues#281 for a discussion on this matter. |
If one were to implement dotted circle insertion per USE's recommendations, before reordering but after normalisation, would double dotted-circles on decomposed matras be considered acceptable? Take, for example, <U+09CB ো BENGALI VOWEL SIGN O>, which decomposes into <U+09C7 ে BENGALI VOWEL SIGN E, U+09BE া BENGALI VOWEL SIGN AA>. If I take the spec's recommendation to "form separate clusters for each mark", this would mean I'd get <Sign E, Dotted Circle> and <Dotted Circle, Sign Aa> post-reorder.
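The double dotted circle described above can be reproduced with a small Python sketch. It is only an illustration at the code point level, not shaper code: `baseless_mark_clusters` and `PRE_BASE_MATRAS` are hypothetical names, the input is assumed to consist solely of marks with no encoded base, and the reordering step is reduced to moving a pre-base vowel sign in front of its dotted circle.

```python
import unicodedata

DOTTED_CIRCLE = "\u25CC"
# Bengali vowel signs rendered before the base (assumed subset for this sketch).
PRE_BASE_MATRAS = {"\u09BF", "\u09C7", "\u09C8"}


def baseless_mark_clusters(marks: str) -> list[str]:
    """Hypothetical helper: one dotted-circle cluster per decomposed mark.

    Assumes every character in `marks` is a dependent sign without a base.
    """
    clusters = []
    for mark in unicodedata.normalize("NFD", marks):
        if mark in PRE_BASE_MATRAS:
            # Simplified reordering: a pre-base sign ends up before its circle.
            clusters.append(mark + DOTTED_CIRCLE)
        else:
            clusters.append(DOTTED_CIRCLE + mark)
    return clusters


# U+09CB BENGALI VOWEL SIGN O decomposes canonically into U+09C7 + U+09BE.
clusters = baseless_mark_clusters("\u09CB")
print([[f"U+{ord(c):04X}" for c in cluster] for cluster in clusters])
# [['U+09C7', 'U+25CC'], ['U+25CC', 'U+09BE']]
```

Under these assumptions, a single encoded vowel sign yields two clusters and therefore two dotted circles, which is the outcome the question is asking about.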
I think it's going to be necessary to insert an explicit discussion of the U+25CC issue. If I can finish up merging some of the remaining WIP changes I'll do that and it will be easier to judge the wording in context.
There is some in-progress work to sort out this issue in PR #121 for those who want to take a look. Very much expect it to change; at present it's only a framework, but it does attempt to call attention to some of the issues mentioned in this thread. Mostly, I just want to know if anybody thinks it is the wrong place to start mentioning the dotted-circle insertion process; script-specific stuff is still to come.
On encountering a broken syllable, HarfBuzz inserts a dotted circle at the start of the syllable (or after a "Repha") and shapes it as if it were a standalone syllable.
However, on analysing the regex, we can see that a standalone syllable's dotted circle comes after a possible "Reph":
Should the dotted circle always be inserted after a possible "Reph" such that it is by definition a standalone syllable? ***
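To make the two candidate positions concrete, here is a minimal Python sketch that works on code points rather than glyphs. It is not HarfBuzz's implementation; the function names are hypothetical, and a "possible Reph" is assumed to be a leading Bengali Ra (U+09B0) followed by Halant (U+09CD).

```python
DOTTED_CIRCLE = "\u25CC"
RA = "\u09B0"      # BENGALI LETTER RA
HALANT = "\u09CD"  # BENGALI SIGN VIRAMA


def insert_at_start(syllable: str) -> str:
    """Place the dotted circle before everything in the broken syllable."""
    return DOTTED_CIRCLE + syllable


def insert_after_reph(syllable: str) -> str:
    """Place the dotted circle after a leading <Ra, Halant>, if there is one."""
    prefix = RA + HALANT
    if syllable.startswith(prefix):
        return prefix + DOTTED_CIRCLE + syllable[len(prefix):]
    return DOTTED_CIRCLE + syllable


def show(text: str) -> str:
    """Format a string as a list of code points for inspection."""
    return " ".join(f"U+{ord(c):04X}" for c in text)


# Broken syllable: Ra, Halant, Sign E (no base consonant for the vowel sign).
broken = RA + HALANT + "\u09C7"
print(show(insert_at_start(broken)))    # U+25CC U+09B0 U+09CD U+09C7
print(show(insert_after_reph(broken)))  # U+09B0 U+09CD U+25CC U+09C7
```

With the second placement, the dotted circle sits where the missing base consonant would be, so the sequence has the shape of a standalone syllable with a leading "Reph".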
Also, when shaping syllables that begin with a possible "Reph", inserting a circle before the sequence yields output that looks a bit peculiar (to my eyes at least). Here are two examples:
"Ra, Halant, Halant, Ka" (Lohit Bengali)
Would the dotted circle in HarfBuzz's current output:
convey the missing base consonant better if it looked like this instead? (This was achieved by inserting the dotted circle after the "Ra, Halant".)
"Ra, Halant, Sign E" (Lohit Bengali)
Here, HarfBuzz marks the "Ra, Halant" sequence as post-base, resulting in the "Ra" taking on its subjoined form. While this only happens with Indic1 fonts, it might be misleading:
If the dotted circle is inserted after the "Ra, Halant":
*** Where a change like this would fall short is that it doesn't handle:
- REPH_MODE_EXPLICIT scripts. Inserting a dotted circle in between a "Halant" and "ZWJ" would inhibit the formation of "Reph".
- REPH_MODE_IMPLICIT scripts. A "ZWJ" in this context prohibits "Reph" formation, but by inserting the dotted circle in between the "Halant" and "ZWJ", we are permitting it.

Both cases can be resolved by inserting the dotted circle after the "ZWJ", but that would mean once again deviating from the definition of a standalone syllable. Is the standalone syllable regex overly restrictive perhaps?
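As a rough illustration of the "insert after the ZWJ" option, the Python sketch below skips a leading Ra, Halant and an optional ZWJ before placing the dotted circle. The function name is hypothetical and the Bengali code points are used only as an example; whether a "Reph" actually forms afterwards depends on the script's reph mode and the font, which this sketch does not model.

```python
DOTTED_CIRCLE = "\u25CC"
RA = "\u09B0"      # BENGALI LETTER RA
HALANT = "\u09CD"  # BENGALI SIGN VIRAMA
ZWJ = "\u200D"     # ZERO WIDTH JOINER


def insert_after_reph_and_zwj(syllable: str) -> str:
    """Hypothetical placement rule: keep <Ra, Halant, ZWJ?> together."""
    prefix = RA + HALANT
    if not syllable.startswith(prefix):
        return DOTTED_CIRCLE + syllable
    end = len(prefix)
    # Keep a ZWJ adjacent to the Halant so the circle never separates them.
    if syllable[end:end + 1] == ZWJ:
        end += 1
    return syllable[:end] + DOTTED_CIRCLE + syllable[end:]


with_zwj = RA + HALANT + ZWJ + "\u09C7"
result = insert_after_reph_and_zwj(with_zwj)
print(" ".join(f"U+{ord(c):04X}" for c in result))
# U+09B0 U+09CD U+200D U+25CC U+09C7
```

Because the ZWJ stays next to the Halant, its effect on "Reph" formation is left as encoded in both reph modes; the cost, as noted above, is that the result no longer matches the standalone syllable definition.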