update String#trim #354

lewisje · 2015-06-26T06:09:38Z

Check for two commonly erroneously trimmed characters instead of one, and check for erroneously failing to trim the trimmable whitespace characters; then use aa* instead of a+, and two regexes instead of one, both because they are more performant (it is also more performant to first trim from the beginning and then trim from the end).

I would also replace if (typeof this === 'undefined' || this === null) with if (this == null) but I don't know whether that's allowed by this library's style rules.

I would also update the the minified shim file, but I suspect that's something the maintainers instead do before every proper release.

ljharb · 2015-06-26T06:13:06Z

Thanks for the contribution!

You're correct about the style and the proper time to update minified files.

Can you also provide a test case that would fail without this change?

Also, I'd prefer to separate the bug fix PR from the performance PR, if possible.

Please also rebase out typo fix commits and force push :-)

ljharb · 2015-06-26T06:32:58Z

Can you also link to the ES6 spec and show where these characters should be included?

mathiasbynens · 2015-06-26T07:16:40Z

Can you also link to the ES6 spec and show where these characters should be included?

The spec is here: http://ecma-international.org/ecma-262/6.0/#sec-white-space

Throughout the spec, WhiteSpace consists of U+0009, U+000B, U+000C, U+0020, U+00A0, and U+FEFF + all code points in the Zs category as per Unicode v5.1.0 + any additional code points in the Zs category in the Unicode version used by the engine. In regexpu, I assume the latest available Unicode version (i.e. Unicode v8.0.0 at the time of writing) for the latter.

https://github.com/mathiasbynens/regexpu/blob/ff17a00b63a017a69fd93da455a8944eb18918ce/scripts/character-class-escape-sets.js#L72-L80

The algorithm for String.prototype.trim specifies that “the definition of white space is the union of WhiteSpace and LineTerminator”. So, the regular expression is equivalent to /\s/u in a conforming ES6 engine: https://mothereff.in/regexpu#%2F%5Cs%2Fu This, in turn, is equivalent to /[\t-\r \xA0\u1680\u180E\u2000-\u200A\u2028\u2029\u202F\u205F\u3000\uFEFF]/. This regex is much simpler than the one that is currently used.

ljharb · 2015-06-26T07:25:53Z

So are the code points we're missing ones that were added in later Unicode versions? I'm still not clear on the problem here, but I do want to conform :-)

mathiasbynens · 2015-06-26T07:30:38Z

Looks like this patch doesn’t add any code points to the whitespace set. It only checks
'\u0085\u002B' instead of just '\u0085' for hasStringBug.

The patch description mentions these are commonly erroneously trimmed. I doubt the spec has info on that, @ljharb. Perhaps @lewisje can tell us which engines incorrectly trim \u002B?

lewisje · 2015-06-26T07:51:54Z

I don't have information on any engines that commonly erroneously trim either of those characters: I just know that this shim checks \x85 and es5-shim checks \u200B, so I thought both of them were erroneously trimmed widely enough to be checked.

Whitespace is defined in section 7.2 of the ES5 spec: http://es5.github.io/#x7.2

It includes \x09, \x0B, \x0C, \x20, \xA0, and \uFEFF and makes reference to the Unicode category "Zs": http://www.unicode.org/Public/8.0.0/ucd/PropList.txt

Both U+180E MONGOLIAN VOWEL SEPARATOR and U+200B ZERO WIDTH SPACE were re-classified from Zs (Space_Separator) to Cf (Format), while U+0085 NEXT LINE (NEL) is in Cc (Control); U+180E was moved in Unicode 6.3, U+200B was moved in Unicode 4.0.1, and U+0085 has been in Cc at least since Unicode 1.1.5.

Line terminators are defined in 7.3 as \x0A, \x0D, \u2028, and \u2029: http://es5.github.io/#x7.3

The spec says, in 15.5.4.20, that String#trim must remove "both leading and trailing white space" where "white space" means both whitespace and line terminators: http://es5.github.io/#x15.5.4.20

The ES6 spec has the same definitions for "whitespace" and "line terminator" as the ES5 spec, in Tables 32 and 33, respectively.

The ES5 spec also says that Unicode 3.0 or later is followed, which means that U+180E and U+200B may be considered non-trimmable, and also that U+0085, despite appearing in Unicode's "White_Space" list, must be considered non-trimmable, because it is not in the explicit list of whitespace or line terminators and is not in Zs: http://es5.github.io/#x2

However, the ES6 spec says that Unicode 5.1.0 or later is followed, which means that U+0085 and U+200B must be considered non-trimmable, while U+180E may be considered non-trimmable.

This means that the test for improperly trimmed characters in this shim must include both \x85 and \u200B, and in es-shims/es5-shim#316, it must include \x85 but not necessarily \u200B (it currently includes \u200B but not \x85); similarly, this shim may need to remove \u180E from the trim regex, depending on what version of Unicode it's targeting (and then add it to the improper-trim test, if a consistent API is desired).

ljharb · 2015-06-26T07:58:34Z

Sounds good - we still need an automated test to prevent your fix from being accidentally removed later.

mathiasbynens · 2015-06-26T07:59:29Z

However, the ES6 spec says that Unicode 5.1.0 or later is followed, which means that U+0085 and U+200B must be considered non-trimmable, while U+180E may be considered non-trimmable.

U+180E is Zs as per Unicode 5.1.0 so it must be considered trimmable in order to be ES6-compliant.

lewisje · 2015-06-26T08:01:01Z

Then this means this shim is specifically targeting 5.1.0, thanks for letting me know.

I'll get a test in later today, and then feel the frustration that comes with not being able to rebase from GitHub's Web-based interface.

ljharb · 2015-06-26T08:01:44Z

Thanks! Totally agree that it's lame you can't rebase on the web

mathiasbynens · 2015-06-26T08:49:29Z

Then this means this shim is specifically targeting 5.1.0, thanks for letting me know.

The ES6 spec is, and this shim aims to follow that.

Thanks for bringing this up! We might be able to simplify the regexp (see my earlier comment: #354 (comment)).

lewisje · 2015-06-26T13:44:42Z

The downside to using ranges in the regexp string is that it couldn't be re-purposed to check for failure to trim all trimmable whitespace (although it would save characters to use \t and \r and the like instead of the \x or \u escapes); also, I'm not sure we should rely on the u flag being available for regexes.

ljharb · 2015-06-26T15:45:11Z

It's ok to make multiple regexes rather than repurposing - the cost of that is negligible.

mathiasbynens · 2015-06-26T22:08:15Z

I'm not sure we should rely on the u flag being available for regexes.

I never said we should! That wouldn’t work in any existing browser. I was suggesting /[\t-\r \xA0\u1680\u180E\u2000-\u200A\u2028\u2029\u202F\u205F\u3000\uFEFF]/ (i.e. what /\s/u translates to).

ljharb · 2015-07-20T05:33:30Z

@lewisje Are you planning on completing this PR?

lewisje · 2015-07-20T17:13:35Z

I am, but I got distracted 😞

lewisje · 2015-08-23T06:24:16Z

I made U+200B a separate test because if I combined them, and an implementation fails the "should trim on both sides" test, it could end up passing a test involving \u0085\u200B (say, if it normally trims \u200B but doesn't trim from the right to begin with).

Maybe this means, though, that in es6-shim.js I should have also tested those two characters separately instead of in the same string.

Check for two commonly erroneously trimmed characters instead of one, and check for erroneously failing to trim the trimmable whitespace characters. I would also replace `if (typeof this === 'undefined' || this === null)` with `if (this == null)` but I don't know whether that's allowed by this library's style rules.

ljharb mentioned this pull request Jun 26, 2015

Update String#trim es-shims/es5-shim#316

Open

ljharb added Spec compliance needs followup needs rebase labels Jun 26, 2015

ljharb self-assigned this Jun 26, 2015

lewisje force-pushed the string-trim-patch branch from dcbdd5d to 53beb7c Compare August 23, 2015 06:08

lewisje force-pushed the string-trim-patch branch from 53beb7c to f337db9 Compare August 23, 2015 07:47

ljharb force-pushed the master branch from a274788 to 8159480 Compare September 6, 2015 07:44

ljharb removed the needs rebase label Sep 26, 2015

Xotic750 mentioned this pull request Nov 1, 2015

Add tests to highlight trim deficiencies. #367

Closed

ljharb mentioned this pull request Nov 2, 2015

Add tests to highlight Number constructor whitspace handling deficiencies #368

Closed

ljharb added the needs rebase label Jan 3, 2016

ljharb force-pushed the master branch 3 times, most recently from 9647bf8 to 49a96e8 Compare October 24, 2023 23:28

colinrotherham mentioned this pull request Dec 11, 2023

[SPIKE] Configure Babel to add core-js polyfills alphagov/govuk-frontend#4557

Closed

colinrotherham mentioned this pull request Jul 31, 2024

Bundle size query (and 8kB version) alphagov/accessible-autocomplete#712

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

update String#trim #354

update String#trim #354

lewisje commented Jun 26, 2015

ljharb commented Jun 26, 2015

ljharb commented Jun 26, 2015

mathiasbynens commented Jun 26, 2015

ljharb commented Jun 26, 2015

mathiasbynens commented Jun 26, 2015

lewisje commented Jun 26, 2015

ljharb commented Jun 26, 2015

mathiasbynens commented Jun 26, 2015

lewisje commented Jun 26, 2015

ljharb commented Jun 26, 2015

mathiasbynens commented Jun 26, 2015

lewisje commented Jun 26, 2015

ljharb commented Jun 26, 2015

mathiasbynens commented Jun 26, 2015

ljharb commented Jul 20, 2015

lewisje commented Jul 20, 2015

lewisje commented Aug 23, 2015

update String#trim #354

Are you sure you want to change the base?

update String#trim #354

Conversation

lewisje commented Jun 26, 2015

ljharb commented Jun 26, 2015

ljharb commented Jun 26, 2015

mathiasbynens commented Jun 26, 2015

ljharb commented Jun 26, 2015

mathiasbynens commented Jun 26, 2015

lewisje commented Jun 26, 2015

ljharb commented Jun 26, 2015

mathiasbynens commented Jun 26, 2015

lewisje commented Jun 26, 2015

ljharb commented Jun 26, 2015

mathiasbynens commented Jun 26, 2015

lewisje commented Jun 26, 2015

ljharb commented Jun 26, 2015

mathiasbynens commented Jun 26, 2015

ljharb commented Jul 20, 2015

lewisje commented Jul 20, 2015

lewisje commented Aug 23, 2015