Commit

Editorial: Capitalize "Unicode Standard" (#2707)
gibson042 authored and ljharb committed Mar 28, 2022
1 parent 5805715 commit 6114646
Showing 1 changed file with 7 additions and 7 deletions.
14 changes: 7 additions & 7 deletions spec.html
@@ -6158,7 +6158,7 @@ <h1>
<p>Step <emu-xref href="#step-arc-string-check"></emu-xref> differs from step <emu-xref href="#step-binary-op-string-check"></emu-xref> in the algorithm that handles the addition operator `+` (<emu-xref href="#sec-applystringornumericbinaryoperator"></emu-xref>) by using the logical-and operation instead of the logical-or operation.</p>
</emu-note>
<emu-note>
-<p>The comparison of Strings uses a simple lexicographic ordering on sequences of code unit values. There is no attempt to use the more complex, semantically oriented definitions of character or string equality and collating order defined in the Unicode specification. Therefore String values that are canonically equal according to the Unicode standard could test as unequal. In effect this algorithm assumes that both Strings are already in normalized form. Also, note that for strings containing supplementary characters, lexicographic ordering on sequences of UTF-16 code unit values differs from that on sequences of code point values.</p>
+<p>The comparison of Strings uses a simple lexicographic ordering on sequences of code unit values. There is no attempt to use the more complex, semantically oriented definitions of character or string equality and collating order defined in the Unicode specification. Therefore String values that are canonically equal according to the Unicode Standard could test as unequal. In effect this algorithm assumes that both Strings are already in normalized form. Also, note that for strings containing supplementary characters, lexicographic ordering on sequences of UTF-16 code unit values differs from that on sequences of code point values.</p>
</emu-note>
</emu-clause>
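
The note in the hunk above can be illustrated with a minimal sketch. The sample strings and variable names are chosen here for illustration and do not come from the commit.

```javascript
// U+FF5E (FULLWIDTH TILDE) is a single BMP code unit, 0xFF5E.
// U+1F600 is a supplementary character stored as the surrogate pair 0xD83D 0xDE00.
const bmp = "\uFF5E";
const supplementary = "\u{1F600}";

// The relational operators compare code units: 0xD83D < 0xFF5E.
const byCodeUnit = supplementary < bmp; // true

// Comparing code points gives the opposite order: 0x1F600 > 0xFF5E.
const byCodePoint = supplementary.codePointAt(0) < bmp.codePointAt(0); // false

// Canonically equivalent strings are ordered by their raw code units.
const composed = "\u00E9";    // "é" as one code point
const decomposed = "e\u0301"; // "e" followed by COMBINING ACUTE ACCENT
const lessThan = decomposed < composed; // true, because 0x0065 < 0x00E9

// Normalizing both sides first makes them identical.
const equalAfterNfc = composed.normalize("NFC") === decomposed.normalize("NFC"); // true
```

Code that needs linguistically meaningful ordering would normally normalize first or use a locale-aware comparison instead of `<`.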

@@ -16373,7 +16373,7 @@ <h2>Syntax</h2>

<emu-clause id="sec-names-and-keywords">
<h1>Names and Keywords</h1>
-<p>|IdentifierName| and |ReservedWord| are tokens that are interpreted according to the Default Identifier Syntax given in Unicode Standard Annex #31, Identifier and Pattern Syntax, with some small modifications. |ReservedWord| is an enumerated subset of |IdentifierName|. The syntactic grammar defines |Identifier| as an |IdentifierName| that is not a |ReservedWord|. The Unicode identifier grammar is based on character properties specified by the Unicode Standard. The Unicode code points in the specified categories in the latest version of the Unicode standard must be treated as in those categories by all conforming ECMAScript implementations. ECMAScript implementations may recognize identifier code points defined in later editions of the Unicode Standard.</p>
+<p>|IdentifierName| and |ReservedWord| are tokens that are interpreted according to the Default Identifier Syntax given in Unicode Standard Annex #31, Identifier and Pattern Syntax, with some small modifications. |ReservedWord| is an enumerated subset of |IdentifierName|. The syntactic grammar defines |Identifier| as an |IdentifierName| that is not a |ReservedWord|. The Unicode identifier grammar is based on character properties specified by the Unicode Standard. The Unicode code points in the specified categories in the latest version of the Unicode Standard must be treated as in those categories by all conforming ECMAScript implementations. ECMAScript implementations may recognize identifier code points defined in later editions of the Unicode Standard.</p>
<emu-note>
<p>This standard specifies specific code point additions: U+0024 (DOLLAR SIGN) and U+005F (LOW LINE) are permitted anywhere in an |IdentifierName|, and the code points U+200C (ZERO WIDTH NON-JOINER) and U+200D (ZERO WIDTH JOINER) are permitted anywhere after the first code point of an |IdentifierName|.</p>
</emu-note>
@@ -16422,7 +16422,7 @@ <h2>Syntax</h2>
<emu-clause id="sec-identifier-names">
<h1>Identifier Names</h1>
<p>Unicode escape sequences are permitted in an |IdentifierName|, where they contribute a single Unicode code point to the |IdentifierName|. The code point is expressed by the |CodePoint| of the |UnicodeEscapeSequence| (see <emu-xref href="#sec-literals-string-literals"></emu-xref>). The `\\` preceding the |UnicodeEscapeSequence| and the `u` and `{ }` code units, if they appear, do not contribute code points to the |IdentifierName|. A |UnicodeEscapeSequence| cannot be used to put a code point into an |IdentifierName| that would otherwise be illegal. In other words, if a `\\` |UnicodeEscapeSequence| sequence were replaced by the |SourceCharacter| it contributes, the result must still be a valid |IdentifierName| that has the exact same sequence of |SourceCharacter| elements as the original |IdentifierName|. All interpretations of |IdentifierName| within this specification are based upon their actual code points regardless of whether or not an escape sequence was used to contribute any particular code point.</p>
-<p>Two |IdentifierName|s that are canonically equivalent according to the Unicode standard are <em>not</em> equal unless, after replacement of each |UnicodeEscapeSequence|, they are represented by the exact same sequence of code points.</p>
+<p>Two |IdentifierName|s that are canonically equivalent according to the Unicode Standard are <em>not</em> equal unless, after replacement of each |UnicodeEscapeSequence|, they are represented by the exact same sequence of code points.</p>

<emu-clause id="sec-identifier-names-static-semantics-early-errors">
<h1>Static Semantics: Early Errors</h1>
@@ -20224,7 +20224,7 @@ <h1>Runtime Semantics: Evaluation</h1>
</ul>
</emu-note>
<emu-note>
-<p>Comparison of Strings uses a simple equality test on sequences of code unit values. There is no attempt to use the more complex, semantically oriented definitions of character or string equality and collating order defined in the Unicode specification. Therefore Strings values that are canonically equal according to the Unicode standard could test as unequal. In effect this algorithm assumes that both Strings are already in normalized form.</p>
+<p>Comparison of Strings uses a simple equality test on sequences of code unit values. There is no attempt to use the more complex, semantically oriented definitions of character or string equality and collating order defined in the Unicode specification. Therefore Strings values that are canonically equal according to the Unicode Standard could test as unequal. In effect this algorithm assumes that both Strings are already in normalized form.</p>
</emu-note>
</emu-clause>
</emu-clause>
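
As a rough illustration of the equality note in this hunk, the sketch below compares two canonically equivalent spellings of the same character. The strings and names are illustrative assumptions, not part of the commit.

```javascript
// Two canonically equivalent spellings of "Å".
const precomposed = "\u00C5"; // LATIN CAPITAL LETTER A WITH RING ABOVE
const combining = "A\u030A";  // "A" followed by COMBINING RING ABOVE

// Both equality operators compare code unit sequences, so these differ.
const strictlyEqual = precomposed === combining; // false
const looselyEqual = precomposed == combining;   // false

// After normalizing both sides to the same form, the sequences match.
const equalAfterNfd = precomposed.normalize("NFD") === combining.normalize("NFD"); // true
```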
@@ -35677,7 +35677,7 @@ <h1>
<pre><code class="javascript">["baaabaac", "ba", undefined, "abaac"]</code></pre>
</emu-note>
<emu-note>
-<p>In case-insignificant matches when _Unicode_ is *true*, all characters are implicitly case-folded using the simple mapping provided by the Unicode standard immediately before they are compared. The simple mapping always maps to a single code point, so it does not map, for example, `&szlig;` (U+00DF) to `SS`. It may however map a code point outside the Basic Latin range to a character within, for example, `&#x17f;` (U+017F) to `s`. Such characters are not mapped if _Unicode_ is *false*. This prevents Unicode code points such as U+017F and U+212A from matching regular expressions such as `/[a-z]/i`, but they will match `/[a-z]/ui`.</p>
+<p>In case-insignificant matches when _Unicode_ is *true*, all characters are implicitly case-folded using the simple mapping provided by the Unicode Standard immediately before they are compared. The simple mapping always maps to a single code point, so it does not map, for example, `&szlig;` (U+00DF) to `SS`. It may however map a code point outside the Basic Latin range to a character within, for example, `&#x17f;` (U+017F) to `s`. Such characters are not mapped if _Unicode_ is *false*. This prevents Unicode code points such as U+017F and U+212A from matching regular expressions such as `/[a-z]/i`, but they will match `/[a-z]/ui`.</p>
</emu-note>
</emu-clause>
</emu-clause>
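
The case-folding behaviour described in this hunk can be checked directly. This is a small sketch using the code points the note itself names; the variable names are illustrative.

```javascript
const longS = "\u017F";  // LATIN SMALL LETTER LONG S
const kelvin = "\u212A"; // KELVIN SIGN
const sharpS = "\u00DF"; // LATIN SMALL LETTER SHARP S

// Without the u flag these code points are not mapped into Basic Latin.
const longSPlain = /[a-z]/i.test(longS);   // false
const kelvinPlain = /[a-z]/i.test(kelvin); // false

// With the u flag, simple case folding maps U+017F to "s" and U+212A to "k".
const longSUnicode = /[a-z]/ui.test(longS);   // true
const kelvinUnicode = /[a-z]/ui.test(kelvin); // true

// Simple folding always yields one code point, so sharp s never matches "ss".
const sharpSMatchesSS = /^ss$/ui.test(sharpS); // false
```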
@@ -48020,7 +48020,7 @@ <h1>Additions and Changes That Introduce Incompatibilities with Prior Editions</
<p><emu-xref href="#sec-reference-record-specification-type"></emu-xref>: In ECMAScript 2015, Function calls are not allowed to return a Reference Record.</p>
<p><emu-xref href="#sec-tonumber-applied-to-the-string-type"></emu-xref>: In ECMAScript 2015, ToNumber applied to a String value now recognizes and converts |BinaryIntegerLiteral| and |OctalIntegerLiteral| numeric strings. In previous editions such strings were converted to *NaN*.</p>
<p><emu-xref href="#sec-code-realms"></emu-xref>: In ECMAScript 2018, Template objects are canonicalized based on Parse Node (source location), instead of across all occurrences of that template literal or tagged template in a Realm in previous editions.</p>
-<p><emu-xref href="#sec-white-space"></emu-xref>: In ECMAScript 2016, Unicode 8.0.0 or higher is mandated, as opposed to ECMAScript 2015 which mandated Unicode 5.1. In particular, this caused U+180E MONGOLIAN VOWEL SEPARATOR, which was in the `Space_Separator` (`Zs`) category and thus treated as whitespace in ECMAScript 2015, to be moved to the `Format` (`Cf`) category (as of Unicode 6.3.0). This causes whitespace-sensitive methods to behave differently. For example, `"\u180E".trim().length` was `0` in previous editions, but `1` in ECMAScript 2016 and later. Additionally, ECMAScript 2017 mandated always using the latest version of the Unicode standard.</p>
+<p><emu-xref href="#sec-white-space"></emu-xref>: In ECMAScript 2016, Unicode 8.0.0 or higher is mandated, as opposed to ECMAScript 2015 which mandated Unicode 5.1. In particular, this caused U+180E MONGOLIAN VOWEL SEPARATOR, which was in the `Space_Separator` (`Zs`) category and thus treated as whitespace in ECMAScript 2015, to be moved to the `Format` (`Cf`) category (as of Unicode 6.3.0). This causes whitespace-sensitive methods to behave differently. For example, `"\u180E".trim().length` was `0` in previous editions, but `1` in ECMAScript 2016 and later. Additionally, ECMAScript 2017 mandated always using the latest version of the Unicode Standard.</p>
<p><emu-xref href="#sec-names-and-keywords"></emu-xref>: In ECMAScript 2015, the valid code points for an |IdentifierName| are specified in terms of the Unicode properties &ldquo;ID_Start&rdquo; and &ldquo;ID_Continue&rdquo;. In previous editions, the valid |IdentifierName| or |Identifier| code points were specified by enumerating various Unicode code point categories.</p>
<p><emu-xref href="#sec-rules-of-automatic-semicolon-insertion"></emu-xref>: In ECMAScript 2015, Automatic Semicolon Insertion adds a semicolon at the end of a do-while statement if the semicolon is missing. This change aligns the specification with the actual behaviour of most existing implementations.</p>
<p><emu-xref href="#sec-object-initializer-static-semantics-early-errors"></emu-xref>: In ECMAScript 2015, it is no longer an early error to have duplicate property names in Object Initializers.</p>
@@ -48052,7 +48052,7 @@ <h1>Additions and Changes That Introduce Incompatibilities with Prior Editions</
<p><emu-xref href="#sec-function-instances-length"></emu-xref>: In ECMAScript 2015, the *"length"* property of function instances is configurable. In previous editions it was non-configurable.</p>
<p><emu-xref href="#sec-properties-of-the-nativeerror-constructors"></emu-xref>: In ECMAScript 2015, the [[Prototype]] internal slot of a _NativeError_ constructor is the Error constructor. In previous editions it was the Function prototype object.</p>
<p><emu-xref href="#sec-properties-of-the-date-prototype-object"></emu-xref> In ECMAScript 2015, the Date prototype object is not a Date instance. In previous editions it was a Date instance whose TimeValue was *NaN*.</p>
-<p><emu-xref href="#sec-string.prototype.localecompare"></emu-xref> In ECMAScript 2015, the `String.prototype.localeCompare` function must treat Strings that are canonically equivalent according to the Unicode standard as being identical. In previous editions implementations were permitted to ignore canonical equivalence and could instead use a bit-wise comparison.</p>
+<p><emu-xref href="#sec-string.prototype.localecompare"></emu-xref> In ECMAScript 2015, the `String.prototype.localeCompare` function must treat Strings that are canonically equivalent according to the Unicode Standard as being identical. In previous editions implementations were permitted to ignore canonical equivalence and could instead use a bit-wise comparison.</p>
<p><emu-xref href="#sec-string.prototype.tolowercase"></emu-xref> and <emu-xref href="#sec-string.prototype.touppercase"></emu-xref> In ECMAScript 2015, lowercase/upper conversion processing operates on code points. In previous editions such the conversion processing was only applied to individual code units. The only affected code points are those in the Deseret block of Unicode.</p>
<p><emu-xref href="#sec-string.prototype.trim"></emu-xref> In ECMAScript 2015, the `String.prototype.trim` method is defined to recognize white space code points that may exist outside of the Unicode BMP. However, as of Unicode 7 no such code points are defined. In previous editions such code points would not have been recognized as white space.</p>
<p><emu-xref href="#sec-regexp-pattern-flags"></emu-xref> In ECMAScript 2015, If the _pattern_ argument is a RegExp instance and the _flags_ argument is not *undefined*, a new RegExp instance is created just like _pattern_ except that _pattern_'s flags are replaced by the argument _flags_. In previous editions a *TypeError* exception was thrown when _pattern_ was a RegExp instance and _flags_ was not *undefined*.</p>
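
A few of the incompatibilities listed in the last two hunks can be observed in a current engine. The following is a sketch that assumes an implementation conforming to ECMAScript 2016 or later; the variable names are illustrative.

```javascript
// U+180E is no longer white space, so trim() leaves it in place.
const trimmedLength = "\u180E".trim().length; // 1 in ECMAScript 2016 and later

// Case conversion operates on code points, which matters for the Deseret block.
// U+10400 DESERET CAPITAL LETTER LONG I lowercases to U+10428.
const deseretLower = "\u{10400}".toLowerCase() === "\u{10428}"; // true

// localeCompare must treat canonically equivalent strings as identical,
// so a conforming implementation returns 0 here.
const canonicalCompare = "\u00E9".localeCompare("e\u0301");
```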