Sixteen quranic arabic characters #877

Draft
wants to merge 21 commits into
base: main
Changes from 1 commit
Some comparison characters from Roozbeh
eggrobin committed Dec 19, 2024
commit a32fc537a2cf31d03b63b430f3778376fce7be47
@@ -21,35 +21,40 @@
end Ignoring;

Propertywise [
\N{ARABIC NORTHEAST POINTING ARROWHEAD ABOVE}
\N{ARABIC SMALL CIRCLE ABOVE}
\N{ARABIC LARGE CIRCLE ABOVE}
۠ \N{ARABIC SMALL HIGH UPRIGHT RECTANGULAR ZERO}
\N{ARABIC NORTHEAST POINTING ARROWHEAD ABOVE}
\x{0657} \N{ARABIC INVERTED DAMMA}
\N{ARABIC SMALL CIRCLE ABOVE}
\N{ARABIC LARGE CIRCLE ABOVE}
\x{06E0} \N{ARABIC SMALL HIGH UPRIGHT RECTANGULAR ZERO}
] AreAlike

Check failure on line 29 in unicodetools/src/main/resources/org/unicode/text/UCD/AdditionComparisons/138.txt

GitHub Actions / Check UCD consistency, invariants, smoke-test generators

Invariant test failure

Alphabetic(ٗ) = Yes ≠ No = Alphabetic(۠)
Alphabetic(ٗ) = Yes ≠ No = Alphabetic(𐻋)
Alphabetic(ٗ) = Yes ≠ No = Alphabetic(𐻎)
Alphabetic(ٗ) = Yes ≠ No = Alphabetic(𐻏)
Other_Alphabetic(ٗ) = Yes ≠ No = Other_Alphabetic(۠)
Other_Alphabetic(ٗ) = Yes ≠ No = Other_Alphabetic(𐻋)
Other_Alphabetic(ٗ) = Yes ≠ No = Other_Alphabetic(𐻎)
Other_Alphabetic(ٗ) = Yes ≠ No = Other_Alphabetic(𐻏)
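
The failure lines above report, property by property, where one character in a `Propertywise [...] AreAlike` group disagrees with the others. The following is a minimal Python sketch of that kind of comparison, not the unicodetools implementation. The function name, the `ignoring` parameter, the choice of a single reference character, and the toy table are assumptions for illustration; only the two Alphabetic and Other_Alphabetic values for U+0657 and U+06E0 are taken from the check output.

```python
# Toy property table. The values for U+0657 and U+06E0 are copied from the
# check output; the table layout itself is a hypothetical stand-in for real
# UCD data.
PROPERTIES = {
    0x0657: {"Alphabetic": "Yes", "Other_Alphabetic": "Yes"},
    0x06E0: {"Alphabetic": "No", "Other_Alphabetic": "No"},
}


def propertywise_differences(reference, others, table, ignoring=frozenset()):
    """List every property on which `reference` differs from a character in `others`.

    Properties named in `ignoring` are skipped, loosely mirroring the
    "Ignoring ... end Ignoring" wrapper in the test file (an assumption about
    its intent, not a reimplementation).
    """
    lines = []
    for prop in sorted(table[reference]):
        if prop in ignoring:
            continue
        ref_value = table[reference][prop]
        for other in others:
            other_value = table[other][prop]
            if other_value != ref_value:
                lines.append(
                    f"{prop}(U+{reference:04X}) = {ref_value} "
                    f"≠ {other_value} = {prop}(U+{other:04X})"
                )
    return lines


for line in propertywise_differences(0x0657, [0x06E0], PROPERTIES):
    print(line)
```

An empty result would mean the characters are alike on every property that is not ignored, which is what the invariant expects.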

Propertywise [
\N{ARABIC SMALL HIGH NOON WITH FATHA}
\N{ARABIC SMALL HIGH NOON WITH DAMMA}
\N{ARABIC SMALL HIGH HEH INITIAL FORM}
\N{ARABIC SMALL HIGH WORD KABBIR}
ࣕ \N{ARABIC SMALL HIGH SAD} # Not like SMALL HIGH NOON, which is MCM.
ࣞ \N{ARABIC SMALL HIGH WORD QIF}
\N{ARABIC SMALL HIGH NOON WITH FATHA}
\N{ARABIC SMALL HIGH NOON WITH DAMMA}
\N{ARABIC SMALL HIGH HEH INITIAL FORM}
\N{ARABIC SMALL HIGH WORD KABBIR}
\x{06E2} ۢ \N{ARABIC SMALL HIGH MEEM ISOLATED FORM} # Not like SMALL HIGH NOON, which is MCM.
ࣞ \N{ARABIC SMALL HIGH WORD QIF}
] AreAlike

Propertywise [
\N{ARABIC NORTHEAST POINTING ARROWHEAD BELOW}
\N{ARABIC SOUTHWEST POINTING ARROWHEAD BELOW}
\x{0656} \N{ARABIC SUBSCRIPT ALEF}
] AreAlike

Check failure on line 44 in unicodetools/src/main/resources/org/unicode/text/UCD/AdditionComparisons/138.txt

GitHub Actions / Check UCD consistency, invariants, smoke-test generators

Invariant test failure

Alphabetic(ٖ) = Yes ≠ No = Alphabetic(𐻌)
Alphabetic(ٖ) = Yes ≠ No = Alphabetic(𐻍)
Diacritic(ٖ) = No ≠ Yes = Diacritic(𐻌)
Diacritic(ٖ) = No ≠ Yes = Diacritic(𐻍)
Other_Alphabetic(ٖ) = Yes ≠ No = Other_Alphabetic(𐻌)
Other_Alphabetic(ٖ) = Yes ≠ No = Other_Alphabetic(𐻍)

Propertywise [
\N{ARABIC NORTHEAST POINTING ARROWHEAD BELOW}
\N{ARABIC SOUTHWEST POINTING ARROWHEAD BELOW}
\N{ARABIC SMALL LOW UPRIGHT RECTANGULAR ZERO}
\N{ARABIC SQUARE BELOW}
\N{ARABIC FILLED SQUARE BELOW}
\N{ARABIC LARGE CIRCLE BELOW}
࣑\N{ARABIC LARGE CIRCLE BELOW}
] AreAlike

Propertywise [
\N{ARABIC SMALL LOW NOON WITH FATHA}
\N{ARABIC SMALL LOW NOON WITH DAMMA}
ۣ \N{ARABIC SMALL LOW SEEN} # Not like SMALL LOW NOON WITH KASRA, which is not MCM.
\N{ARABIC SMALL LOW NOON WITH FATHA}
\N{ARABIC SMALL LOW NOON WITH DAMMA}
\x{08D3} ࣓ \N{ARABIC SMALL LOW WAW} # Not like SMALL LOW NOON WITH KASRA, which is not MCM.
] AreAlike

Check failure on line 57 in unicodetools/src/main/resources/org/unicode/text/UCD/AdditionComparisons/138.txt

GitHub Actions / Check UCD consistency, invariants, smoke-test generators

Invariant test failure

Alphabetic(࣓) = No ≠ Yes = Alphabetic(𐻴)
Alphabetic(࣓) = No ≠ Yes = Alphabetic(𐻶)
Other_Alphabetic(࣓) = No ≠ Yes = Other_Alphabetic(𐻴)
Other_Alphabetic(࣓) = No ≠ Yes = Other_Alphabetic(𐻶)

end Ignoring;


Unchanged files with check annotations

# https://www.unicode.org/reports/tr39/#Identifier_Status_and_Type
# “Unassigned characters, private use characters, surrogates, non-whitespace control characters.”
\p{Identifier_Type=Not_Character} = [\p{gc=Cn}\p{gc=Co}\p{gc=Cs}\p{gc=Cc}-\p{White_Space}]

Check notice on line 9 in unicodetools/src/main/resources/org/unicode/text/UCD/SecurityInvariantTest.txt

GitHub Actions / Check security data invariants

Invariant test failure

Expected empty, got: 4852
[\u088F\u09FF\u0B53\u0B54\u0C5C\u0CDC\u1ACF-\u1ADD\u1AE0-\u1AEB\u2B96\uA7CE\uA7CF\uA7D2\uA7D4\uA7F1\uFBC3-\uFBD2\uFD90\uFD91\uFDC8-\uFDCE\U00010940-\U0001095C\U00010EC5-\U00010EC7\U00010EC9-\U00010ED8\U00010EF0-\U00010EF8\U00010EFA\U00010EFB\U00011B60-\U00011B67\U00011DB0-\U00011DDB\U00011DE0-\U00011DE9\U00016D80-\U00016D9D\U00016DA0-\U00016DA9\U00016EA0-\U00016EB8\U00016EBB-\U00016ED3\U00016FF2-\U00016FF6\U000187F8-\U000187FF\U00018D09-\U00018D1E\U00018D80-\U00018DF2\U0001CCFA-\U0001CCFC\U0001CEBA-\U0001CED0\U0001CEE0-\U0001CEF0\U0001E6C0-\U0001E6DE\U0001E6E0-\U0001E6F5\U0001E6FE\U0001E6FF\U0001F6D8\U0001F777-\U0001F77A\U0001F8D0-\U0001F8D8\U0001FA54-\U0001FA57\U0001FA8A\U0001FA8E\U0001FAC8\U0001FACD\U0001FADD\U0001FAEA\U0001FAEF\U0001FBFA\U0002B73A-\U0002B73E\U000323B0-\U00033479]
In \p{Identifier_Type=Not_Character}
But Not In [\p{gc=Cn}\p{gc=Co}\p{gc=Cs}\p{gc=Cc}-\p{White_Space}]
088F # (�) ARABIC LETTER NOON WITH RING ABOVE
09FF # (�) BENGALI LETTER SANSKRIT BA
0B53..0B54 # [2] (�..�) ORIYA SIGN DOT ABOVE..ORIYA SIGN DOUBLE DOT ABOVE
0C5C # (�) TELUGU ARCHAIC SHRII
0CDC # (�) KANNADA ARCHAIC SHRII
1ACF..1ADD # [15] (�..�) COMBINING DOUBLE CARON..COMBINING DOT-AND-RING BELOW
1AE0..1AEB # [12] (�..�) COMBINING LEFT TACK ABOVE..COMBINING DOUBLE RIGHTWARDS ARROW ABOVE
2B96 # (�) EQUALS SIGN WITH INFINITY ABOVE
A7CE..A7CF # [2] (�..�) LATIN CAPITAL LETTER PHARYNGEAL VOICED FRICATIVE..LATIN SMALL LETTER PHARYNGEAL VOICED FRICATIVE
A7D2 # (�) LATIN CAPITAL LETTER DOUBLE THORN
A7D4 # (�) LATIN CAPITAL LETTER DOUBLE WYNN
A7F1 # (�) MODIFIER LETTER CAPITAL S
FBC3..FBD2 # [16] (�..�) ARABIC LIGATURE JALLA WA-ALAA..ARABIC LIGATURE ALAYHI AR-RAHMAH
FD90..FD91 # [2] (�..�) ARABIC LIGATURE RAHMATU ALLAAHI ALAYH..ARABIC LIGATURE RAHMATU ALLAAHI ALAYHAA
FDC8..FDCE # [7] (�..�) ARABIC LIGATURE RAHIMAHU ALLAAH TAAALAA..ARABIC LIGATURE KARRAMA ALLAAHU WAJHAH
10940..1095C # [29] (�..�) SIDETIC LETTER N01..SIDETIC LETTER N29
10EC5..10EC7 # [3] (�..�) ARABIC SMALL YEH BARREE WITH TWO DOTS BELOW..ARABIC LETTER YEH WITH FOUR DOTS BELOW
10EC9..10ED8 # [16] (�..�) ARABIC SMALL BASELINE FATHA..ARABIC LIGATURE NAWWARA ALLAAHU MARQADAH
10EF0..10EF8 # [9] (�..�) ARABIC SMALL LOW UPRIGHT RECTANGULAR ZERO..ARABIC SMALL HIGH WORD KABBIR
10EFA..10EFB # [2] (�..�) ARABIC DOUBLE VERTICAL BAR BELOW..ARABIC SMALL LOW NOON
11B60..11B67 # [8] (�..�) SHARADA VOWEL SIGN OE..SHARADA VOWEL SIGN CANDRA O
11DB0..11DDB # [44] (�..�) TOLONG SIKI LETTER I..TOLONG SIKI UNGGA
11DE0..11DE9 # [10] (�..�) TOLONG SIKI DIGIT ZERO..TOLONG SIKI DIGIT NINE
16D80..16D9D # [30] (�..�) CHISOI LETTER A..CHISOI SIGN SISO
16DA0..16DA9 # [10] (�..�) CHISOI DIGIT ZERO..CHISOI DIGIT NINE
16EA0..16EB8 # [25] (�..�) BERIA ERFE CAPITAL LETTER ARKAB..BERIA ERFE CAPITAL LETTER AY
16EBB..16ED3 # [25] (�..�) BERIA ERFE SMALL LETTER ARKAB..BERIA ERFE SMALL LETTER AY
16FF2..16FF6 # [5] (�..�) CHINESE SMALL SIMPLIFIED ER..YANGQIN SIGN SLOW TWO BEATS
187F8..187FF # [8] (�..�) TANGUT IDEOGRAPH-187F8..TANGUT IDEOGRAPH-187FF
18D09..18D1E # [22] (�..�) TANGUT IDEOGRAPH-18D09..TANGUT IDEOGRAPH-18D1E
18D80..18DF2 # [115] (�..�) TANGUT COMPONENT-769..TANGUT COMPONENT-883
1CCFA..1CCFC # [3] (�..�) SNAKE SYMBOL..NOSE SYMBOL
1CEBA..1CED0 # [23] (�..�) FRAGILE SYMBOL..LEUKOTHEA
1CEE0..1CEF0 # [17] (�..�) GEOMANTIC FIGURE POPULUS..MEDIUM SMALL WHITE CIRCLE WITH HORIZONTAL BAR
1E6C0..1E6DE # [31] (�..�) TAI YO LETTER LOW KO..TAI YO LETTER HIGH KVO
1E6E0..1E6F5 # [22] (�..�) TAI YO LETTER AA..TAI YO SIGN OM
1E6FE..1E6FF # [2] (�..�) TAI YO SYMBOL MUEANG..TAI YO XAM LAI
1F6D8 # (�) LANDSLIDE
1F777..1F77A # [4] (�..�) VESTA FORM TWO..PARTHENOPE FORM TWO
1F8D0..1F8D8 # [9] (�..�) LONG RIGHTWARDS ARROW OVER LONG LEFTWARDS ARROW..LONG LEFT RIGHT ARROW WITH DEPENDENT LOBE
1FA54..1FA57 # [4] (�..�) WHITE CHESS FE
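
The right-hand side of the Not_Character invariant, `[\p{gc=Cn}\p{gc=Co}\p{gc=Cs}\p{gc=Cc}-\p{White_Space}]`, can be approximated per code point with Python's standard `unicodedata` module. This is a rough sketch, not the invariant test itself: the function and constant names are made up for illustration, and the General_Category values come from whatever Unicode version the local Python ships, so characters added in this pull request will most likely still report as `Cn` there.

```python
import unicodedata

# The gc=Cc characters that also have White_Space=Yes: U+0009..U+000D and U+0085.
# Other White_Space characters are separators (Zs, Zl, Zp), so they never fall
# in the four categories below and do not need to be listed here.
WHITE_SPACE_CONTROLS = set(range(0x09, 0x0E)) | {0x85}

# Unassigned, private use, surrogate, and control categories.
EXCLUDED_CATEGORIES = {"Cn", "Co", "Cs", "Cc"}


def expected_not_character(cp):
    """Hypothetical helper: is `cp` in [Cn Co Cs Cc - White_Space]?"""
    category = unicodedata.category(chr(cp))
    return category in EXCLUDED_CATEGORIES and cp not in WHITE_SPACE_CONTROLS


for cp in (0x0000, 0x0009, 0x0041, 0xE000, 0xFFFF):
    print(f"U+{cp:04X}", expected_not_character(cp))
```

The check notice above lists code points where the Identifier_Type data says Not_Character but a set computed this way from the draft UCD no longer contains them.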
# “Multiple values are not assigned to characters with strong restrictions:
# Not_Character, Deprecated, Default_Ignorable, Not_NFKC.”
# For example, Default_Ignorable is trumped by unassigned and Deprecated.
\p{Identifier_Type=Default_Ignorable} = [\p{Default_Ignorable_Code_Point}-\p{gc=Cn}-\p{Deprecated}]
\p{Identifier_Type=Not_NFKC} = [\p{NFKC_QC=No}-\p{Deprecated}-\p{Default_Ignorable_Code_Point}]

Check notice on line 19 in unicodetools/src/main/resources/org/unicode/text/UCD/SecurityInvariantTest.txt

GitHub Actions / Check security data invariants

Invariant test failure

Expected empty, got: 1
[\uA7F1]
In [\p{NFKC_QC=No}-\p{Deprecated}-\p{Default_Ignorable_Code_Point}]
But Not In \p{Identifier_Type=Not_NFKC}
A7F1 # (�) MODIFIER LETTER CAPITAL S
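
The "In ... But Not In ..." report amounts to a set difference between the two sides of an equality invariant. A small Python sketch under stated assumptions: the helper name and the toy sets are hypothetical, and only the U+A7F1 outcome mirrors the notice above. The other code points are illustrative picks (U+00A0 and U+0149 both have compatibility decompositions, and U+0149 is deprecated).

```python
def one_sided_differences(left, right):
    """Hypothetical helper: return (in left but not right, in right but not left)."""
    return sorted(left - right), sorted(right - left)


# Toy stand-ins for the property sets; real values would come from the UCD
# and the security data files, not from these literals.
nfkc_qc_no = {0x00A0, 0x0149, 0xA7F1}
deprecated = {0x0149}
default_ignorable = set()
identifier_type_not_nfkc = {0x00A0}

# Left side of the invariant: [\p{NFKC_QC=No} - \p{Deprecated} - \p{Default_Ignorable_Code_Point}]
expected = nfkc_qc_no - deprecated - default_ignorable

only_left, only_right = one_sided_differences(expected, identifier_type_not_nfkc)
print("In expected but not in Identifier_Type=Not_NFKC:",
      [f"U+{cp:04X}" for cp in only_left])
print("In Identifier_Type=Not_NFKC but not in expected:",
      [f"U+{cp:04X}" for cp in only_right])
```

Both lists being empty corresponds to the invariant holding. Here the first list holds U+A7F1, matching the shape of the reported failure.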
Let $Strongly_Restricted := [\p{Identifier_Type=Not_Character}\p{Identifier_Type=Deprecated}\p{Identifier_Type=Default_Ignorable}\p{Identifier_Type=Not_NFKC}]
# DerivedAge-17.0.0.txt

Check warning on line 1 in unicodetools/data/ucd/dev/DerivedAge.txt

GitHub Actions / Draft unless approved

Not in the 17.0 pipeline

While the Unicode Technical Committee has provisionally assigned these characters, they have not been accepted for Unicode 17.0, nor for any specific version of Unicode. The Age property values for new characters are likely incorrect right now. They will be recomputed after the UTC accepts their encoding and this pull request is updated for the target version.
# Date: 2024-12-19, 18:50:06 GMT
# © 2024 Unicode®, Inc.
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.