The section “Defective clusters” of the USE documentation says:
When a cluster starts with any character that has UGC=Mc or UGC=Mn, USE inserts a dotted circle glyph (U+25CC) to indicate a broken cluster. Defective clusters do not form extended clusters themselves. A sequence of marks without a valid base forms separate clusters for each mark. Note that an explicit character U+25CC is a valid generic base (GB, BASE_OTHER) and so can form extended clusters.
Similarly, the section “Handling invalid combining marks” says:
Combining marks and signs that do not occur in conjunction with a valid base are considered invalid. USE treats an invalid mark as a separate cluster and displays the stand-alone mark positioned on a dotted circle (U+25CC). If multiple marks are required to position on a dotted circle, the dotted circle can be explicitly inserted into the text stream followed by any marks in accordance with the standard clustering rules.
Essentially, when the USE finds that it needs to insert a dotted circle, it also terminates the new cluster right after the mark that required the insertion.
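To make the documented behaviour concrete, here is a minimal toy model of it in Python. It is a sketch under stated assumptions, not the USE algorithm: `TOY_CLASS` is a hypothetical three-class table (B = base, H = virama, M = other mark) covering only the Javanese characters used in this issue, and `cluster_current` is a hypothetical name. It shows an orphan mark getting its own dotted circle, with the cluster ending right after that mark.

```python
DOTTED_CIRCLE = "\u25CC"

# Hypothetical toy classes (NOT the real USE category tables):
# B = base, H = virama, M = other combining mark.
TOY_CLASS = {
    "\u25CC": "B",  # DOTTED CIRCLE, a valid generic base when explicit
    "\uA98F": "B",  # JAVANESE LETTER KA
    "\uA9C0": "H",  # JAVANESE PANGKON (virama)
    "\uA9BA": "M",  # JAVANESE VOWEL SIGN TALING
    "\uA9B4": "M",  # JAVANESE VOWEL SIGN TARUNG
}


def cls(ch):
    # Unknown characters are treated as bases in this toy model.
    return TOY_CLASS.get(ch, "B")


def cluster_current(text):
    """Sketch of the documented rule: an orphan mark is shown on its own
    inserted dotted circle and the cluster ends right after that mark."""
    clusters = []
    i = 0
    while i < len(text):
        if cls(text[i]) != "B":
            # Defective cluster: inserted dotted circle plus this one mark only.
            clusters.append(DOTTED_CIRCLE + text[i])
            i += 1
            continue
        # Valid base: absorb marks, and a virama together with the base after it.
        cluster = text[i]
        i += 1
        while i < len(text):
            c = cls(text[i])
            if c == "M":
                cluster += text[i]
                i += 1
            elif c == "H":
                cluster += text[i]
                i += 1
                if i < len(text) and cls(text[i]) == "B":
                    cluster += text[i]
                    i += 1
            else:
                break
        clusters.append(cluster)
    return clusters


# The Javanese runs left over once a leading U+25CC is split off:
print(cluster_current("\uA9C0\uA98F"))  # virama separated from KA
print(cluster_current("\uA9BA\uA9B4"))  # two inserted dotted circles
```

With an explicit U+25CC in front (`"\u25CC\uA9BA\uA9B4"`), the same function yields a single cluster, matching the quoted note that an explicit dotted circle is a valid base.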
This is quite unfortunate from a user's point of view. There are many situations where the user has written a valid multi-character cluster with a dotted circle as a base, and the dotted circle then gets separated from the rest of the cluster as a side effect of script segmentation (see OpenType/opentype-layout#13) or font fallbacks. It may well happen right here, on this page, for ◌꧀ꦏ or for ◌ꦺꦴ. In this situation the shaping engine should not compound the issue by inserting yet more dotted circles or by separating a virama from the consonant with which it should combine to create a conjunct form.
My proposal: When the USE finds that it needs to insert a dotted circle, it treats this inserted dotted circle as the base of a new cluster, and proceeds as if the dotted circle had been in the input character sequence.
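The proposal can be sketched with the same kind of toy model. This is an illustration under assumptions, not HarfBuzz's code or the USE specification: `TOY_CLASS` is a hypothetical three-class table (B = base, H = virama, M = other mark) for the Javanese characters in this issue, and `cluster_proposed` is a hypothetical name. The only change from the documented rule is that the inserted dotted circle becomes the base and normal cluster extension continues from there.

```python
DOTTED_CIRCLE = "\u25CC"

# Hypothetical toy classes (NOT the real USE category tables):
# B = base, H = virama, M = other combining mark.
TOY_CLASS = {
    "\u25CC": "B",  # DOTTED CIRCLE
    "\uA98F": "B",  # JAVANESE LETTER KA
    "\uA9C0": "H",  # JAVANESE PANGKON (virama)
    "\uA9BA": "M",  # JAVANESE VOWEL SIGN TALING
    "\uA9B4": "M",  # JAVANESE VOWEL SIGN TARUNG
}


def cls(ch):
    # Unknown characters are treated as bases in this toy model.
    return TOY_CLASS.get(ch, "B")


def cluster_proposed(text):
    """Sketch of the proposal: an inserted dotted circle acts exactly like
    one that was present in the input, so the cluster keeps extending."""
    clusters = []
    i = 0
    while i < len(text):
        if cls(text[i]) == "B":
            cluster = text[i]
            i += 1
        else:
            # Insert a dotted circle as the base; the orphan mark at i is
            # then consumed by the ordinary extension loop below.
            cluster = DOTTED_CIRCLE
        while i < len(text):
            c = cls(text[i])
            if c == "M":
                cluster += text[i]
                i += 1
            elif c == "H":
                # A virama also takes the following base (conjunct form).
                cluster += text[i]
                i += 1
                if i < len(text) and cls(text[i]) == "B":
                    cluster += text[i]
                    i += 1
            else:
                break
        clusters.append(cluster)
    return clusters


print(cluster_proposed("\uA9C0\uA98F"))  # one cluster: dotted circle + virama + KA
print(cluster_proposed("\uA9BA\uA9B4"))  # one cluster with a single dotted circle
```

Under this rule, the runs left behind after script segmentation splits off the dotted circle render as one cluster each, the same result the user would get if U+25CC had stayed in the run, instead of extra dotted circles and a virama cut off from its consonant.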
It appears that HarfBuzz already works this way, while CoreText and DirectWrite/Uniscribe would have to be updated.
Document Details
⚠ Do not edit this section. It is required for docs.microsoft.com ➟ GitHub issue linking.
ID: 194a6d3c-4137-46e9-3a4b-44b990200986
Version Independent ID: a0c8e788-5228-aa28-670e-3ba1ac3faecd
Since the cluster is defective, the aesthetics of the broken cluster are of lesser significance. Having extra dotted circles simply flags the issue for greater attention. I don't think this is a critical issue to drive consistency on.
USE does allow U+25CC to act as a base when it is inserted explicitly by the user. The recent update that overrides the Indic Syllabic Category (ISC) of U+25CC to Consonant will also allow it to appear in subjoined forms.