[USE] Inserted dotted circle should be treated as base of new cluster #289

Closed
NorbertLindenberg opened this issue Aug 23, 2019 · 2 comments


NorbertLindenberg commented Aug 23, 2019

The section “Defective clusters” of the USE documentation says:

When a cluster starts with any character that has UGC=Mc or UGC=Mn, USE inserts a dotted circle glyph (U+25CC) to indicate a broken cluster. Defective clusters do not form extended clusters themselves. A sequence of marks without a valid base forms separate clusters for each mark. Note that an explicit character U+25CC is a valid generic base (GB, BASE_OTHER) and so can form extended clusters.

Similarly, the section “Handling invalid combining marks” says:

Combining marks and signs that do not occur in conjunction with a valid base are considered invalid. USE treats an invalid mark as a separate cluster and displays the stand-alone mark positioned on a dotted circle (U+25CC). If multiple marks are required to position on a dotted circle, the dotted circle can be explicitly inserted into the text stream followed by any marks in accordance with the standard clustering rules.

Essentially, when the USE finds that it needs to insert a dotted circle, it also terminates the new cluster right after the mark that required the insertion.

This is quite unfortunate from a user point of view. There are many situations where the user has written a valid multi-character cluster with a dotted circle as a base, and the dotted circle then gets separated from the rest of the cluster as a side effect of script segmentation (see OpenType/opentype-layout#13) or font fallbacks. It may well happen right here for ◌꧀ꦏ or for ◌ꦺꦴ. In this situation the shaping engine should not compound the issue by inserting yet more dotted circles or by separating a virama from the consonant with which it should combine to create a conjunct form.

My proposal: When the USE finds that it needs to insert a dotted circle, it treats this inserted dotted circle as the base of a new cluster, and proceeds as if the dotted circle had been in the input character sequence.

It appears that HarfBuzz already works this way, while CoreText and DirectWrite/Uniscribe would have to be updated.
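The difference between the documented behavior and the proposal can be sketched with a toy model. This is only an illustration under stated assumptions: the class table `TOY_CLASS`, the grammar `B M* (H B M*)* H?`, and the function names `parse_cluster` and `segment` are hypothetical and far simpler than the real USE cluster grammar. They are not taken from the USE specification or from any shaping engine.

```python
DOTTED_CIRCLE = "\u25CC"

# Hypothetical, heavily simplified classes for a handful of characters.
# B = base, M = mark, H = virama-like sign. Not the real USE categories.
TOY_CLASS = {
    DOTTED_CIRCLE: "B",  # explicit dotted circle acts as a generic base
    "\uA98F": "B",       # JAVANESE LETTER KA
    "\uA9BA": "M",       # JAVANESE VOWEL SIGN TALING
    "\uA9B4": "M",       # JAVANESE VOWEL SIGN TARUNG
    "\uA9C0": "H",       # JAVANESE PANGKON
}


def parse_cluster(text, start):
    """Return the end index of a toy cluster B M* (H B M*)* H? at `start`.

    Assumes text[start] is a base and every character is in TOY_CLASS.
    """
    i = start + 1  # consume the base
    while True:
        while i < len(text) and TOY_CLASS[text[i]] == "M":
            i += 1
        # A virama-like sign followed by a base continues the cluster.
        if (i + 1 < len(text)
                and TOY_CLASS[text[i]] == "H"
                and TOY_CLASS[text[i + 1]] == "B"):
            i += 2
        else:
            break
    # Allow one trailing virama-like sign with no following base.
    if i < len(text) and TOY_CLASS[text[i]] == "H":
        i += 1
    return i


def segment(text, inserted_circle_is_base):
    """Split `text` into clusters, inserting U+25CC before orphaned marks.

    inserted_circle_is_base=False: the cluster ends right after the single
    mark that triggered the insertion (the documented behavior).
    inserted_circle_is_base=True: parsing continues as if the dotted circle
    had been in the input (the behavior proposed in this issue).
    """
    clusters = []
    i = 0
    while i < len(text):
        if TOY_CLASS[text[i]] == "B":
            end = parse_cluster(text, i)
            clusters.append(text[i:end])
            i = end
        elif inserted_circle_is_base:
            patched = DOTTED_CIRCLE + text[i:]
            end = parse_cluster(patched, 0)
            clusters.append(patched[:end])
            i += end - 1  # the inserted circle is not part of the input
        else:
            clusters.append(DOTTED_CIRCLE + text[i])
            i += 1
    return clusters


# Inputs: the two clusters from this issue after losing their dotted circle.
for text in ("\uA9BA\uA9B4", "\uA9C0\uA98F"):
    print(len(segment(text, False)), len(segment(text, True)))
# prints "2 1" for both inputs
```

In this sketch the documented behavior yields two clusters for each input (two dotted circles for the vowel signs, and a pangkon split from the following consonant), while the proposed behavior yields a single cluster with one dotted circle.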



@NorbertLindenberg NorbertLindenberg changed the title [USE] [USE] Inserted dotted circle should be treated as base of new cluster Aug 23, 2019
@NorbertLindenberg
Author

Correct rendering of the Javanese clusters ◌꧀ꦏ and ◌ꦺꦴ in the comment above:
[Image: Screen Shot 2019-08-23 at 14 52 46]

Actual rendering in different browsers:

Safari: [Image: Screen Shot 2019-08-23 at 14 56 42]
Firefox: [Image: firefox]
Chrome: [Image: chrome]
Edge: [Image: edge]

Collaborator

xadxura commented Nov 16, 2019

Since the cluster is defective, the aesthetics of the broken cluster are of lesser significance. Having extra dotted circles simply flags the issue for greater attention. I don't think this is a critical issue to drive consistency on.
USE does allow U+25CC to act as a base when inserted explicitly by the user. The recent update that overrides the ISC for U+25CC to consonant will also allow it to appear in subjoined forms.

@xadxura xadxura closed this as completed Nov 16, 2019